Reverse Engineering of the Honeynet’s SOTM32 Malware Binary
Vinay A. Mahadik
This report analyzes the “Rada.exe” binary as part of the Honeynet’s “Scan of the Month #32” challenge. The analysis attempts to use reverse engineering techniques that are as generic as possible in order to predict, with a good degree of confidence, what the original binary actually does. Manual byte-by-byte disassembly-analysis towards a decompiled source code mapping is intentionally avoided wherever possible.
The Unix file utility identifies RaDa.exe as an MS-DOS/Windows executable. Hiew shows the “MZ” and “PE” signatures that mark the Portable Executable (PE) headers.
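The same signature check can be scripted. The following is a minimal sketch, not the method used by file or Hiew; the function name and the synthetic test buffer are illustrative assumptions. It relies only on the documented layout: “MZ” at offset 0, and a 32-bit pointer at offset 0x3C (e_lfanew) to the “PE\0\0” signature.

```python
import struct

def pe_signature_offsets(data: bytes):
    """Return (mz_ok, pe_offset); pe_offset is None if no PE signature is found."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False, None
    # e_lfanew: 32-bit little-endian file offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
        return True, e_lfanew
    return True, None

# Synthetic 0x100-byte buffer (not RaDa.exe) used only to exercise the function.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"
print(pe_signature_offsets(bytes(fake)))  # (True, 128)
```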
Running Sysinternals’ Strings utility (Unicode/ASCII search) on the binary yields, among a few other strings, the following leads:
All of these suggest that it is likely an MS Visual Basic 6.0 application.
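For readers without the Sysinternals tool at hand, a rough equivalent of an ASCII plus UTF-16LE string search can be sketched as follows. This is an approximation and not the tool's actual algorithm; the function name, the minimum length of 4, and the sample buffer are assumptions made for illustration.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return printable ASCII runs, then printable UTF-16LE runs, of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found

# Synthetic buffer: one ASCII string, one UTF-16LE string, and a run too short to report.
blob = b"\x01\x02msvbvm60.dll\x01" + "kernel32.dll".encode("utf-16-le") + b"\x01ab\x02"
print(extract_strings(blob))  # ['msvbvm60.dll', 'kernel32.dll']
```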
Next, we use the IDA Pro disassembler to analyze the binary. IDA confirms the binary as a PE executable. It suggests loading the binary manually since the Import Address Table (IAT) “is located in non-standard sections”. Loading it with the default options finds absolutely no imported symbols, which would make the analysis extremely difficult.
We reload the binary with the “Manual load” option selected and “Make imports segment” option unselected. We load all sections, and ask IDA to load the imports segment in its raw form.
This leads to a friendlier Names window:
The disassembly looks intimidating at first. Based on the unconventional/modified imports segment, we suspect the binary has been intentionally packed/compressed to deter reverse engineering. We quickly locate the jumps (call, retn, jmp instructions) in the start routine. We find towards the end of the start routine, 3 call instructions followed by a jmp to outside the entry (JDR1) section. This is quite typical of executable packers. This reinforces the theory that what we see at the start routine is a piece of code added by an exe packer that first decompresses the executable at run-time, and once done, jumps to the Original Entry Point of the uncompressed executable’s image in memory. That is, 0x4018A4 is our OEP, and when EIP is at 0x40FE78, the exe must’ve been uncompressed in memory ready for easier reversing:
Before we proceed to unpacking the binary, let’s take a brief look at some of its important PE header characteristics. It’s a good idea to draw out the memory (and disk) images of the exe, say on a piece of paper, to be able to visualize the unpacking process. We will not attempt any fancy artwork here in the interest of time. We run “dumpbin /all Rada.ex_” and note the following relevant details:
Aha, JDR0 is uninitialized to begin with (0 bytes on disk), yet it is marked X (executable). Besides, its memory image is much larger than that of JDR1. We also know that the start routine in JDR1 ends with a jmp into JDR0. The executable must be getting decompressed into the JDR0 section at run-time!
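The reasoning above can be written down as a small heuristic: an executable section with no bytes on disk but a non-zero memory size is a likely decompression target. The sketch below is illustrative only. The section figures are reconstructed from the header values quoted in this report (the JDR1 raw size is inferred from the quoted raw offsets), and the class and function names are our own.

```python
from dataclasses import dataclass

IMAGE_BASE = 0x400000

@dataclass
class Section:
    name: str
    rva: int
    virtual_size: int
    raw_size: int
    executable: bool

    def contains(self, va: int) -> bool:
        start = IMAGE_BASE + self.rva
        return start <= va < start + self.virtual_size

# Illustrative section table; values reconstructed from the report, not parsed from the file.
SECTIONS = [
    Section("JDR0", 0x1000, 0xB000, 0x0, True),
    Section("JDR1", 0xC000, 0x4000, 0x4000, True),
    Section(".rsrc", 0x10000, 0x1000, 0xE00, False),
]

def unpack_targets(sections):
    """Executable sections that are empty on disk but occupy memory."""
    return [s.name for s in sections
            if s.executable and s.raw_size == 0 and s.virtual_size > 0]

def section_of(va, sections=SECTIONS):
    """Name of the section containing virtual address va, or None."""
    for s in sections:
        if s.contains(va):
            return s.name
    return None

print(unpack_targets(SECTIONS))                     # ['JDR0']
print(section_of(0x40FE78), section_of(0x4018A4))   # JDR1 JDR0
```

The second line of output restates the earlier observation: the final jmp sits in JDR1 and its target, the OEP, lies in JDR0.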
Let’s unpack the binary next.
Typically, if the packer used to compress the binary is known, and the packer supports decompression, it’s a good idea (faster!) to use it to unpack the binary. We’ll intentionally avoid doing that, and look at how most compressed binaries can be unpacked generically without any prior knowledge about the packer.
We use Hiew to ‘patch’ the exe at the “jmp OEP” point with the hex bytes “EB FE”. These bytes encode a short jump with a displacement of -2, that is, a jump back to the start of the same two-byte instruction. In our case, it is interpreted as “jmp 0x40FE78” placed at 0x40FE78 itself, so the executable enters an infinite loop once it reaches this instruction. Let this modified exe be called EBFE.exe. The idea is to execute it, and thereby pause the process at the point where it has completely decompressed itself and is about to jump to the OEP. At this point, we capture a snapshot of the memory image of the process and dump it to disk as “img_dump.exe”. This dump is the uncompressed counterpart of the original packed exe, and lends itself much more easily to reversing. (The alternative would be to reverse the image inside memory itself, but the IDA analysis would be too volatile to be useful):
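To see why “EB FE” loops in place, recall that a short jmp (opcode EB) takes a signed 8-bit displacement relative to the address of the next instruction. The sketch below decodes that and shows a byte-level patch; the helper names are ours, and the file offset passed to patch() in the test is hypothetical, since the on-disk offset of the jmp is not given in this report.

```python
def short_jmp_target(address: int, code: bytes) -> int:
    """Decode 'EB rel8' located at `address` and return the jump target."""
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short jmp (opcode EB)")
    rel8 = code[1] - 0x100 if code[1] >= 0x80 else code[1]  # sign-extend the displacement
    return address + 2 + rel8                               # relative to the next instruction

def patch(image: bytes, file_offset: int, new_bytes: bytes) -> bytes:
    """Return a copy of `image` with `new_bytes` written at `file_offset`."""
    return image[:file_offset] + new_bytes + image[file_offset + len(new_bytes):]

print(hex(short_jmp_target(0x40FE78, bytes.fromhex("EBFE"))))  # 0x40fe78
```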
There are quite a few utilities out there that allow you to dump memory images to disk (procdump, icedump etc). However, in order to show how simple that is under the hood, I have provided a small Win32 utility (ReadProcMem.cpp) along with this analysis. At the heart of this tool are the Win32 API calls:
HANDLE hProcess = OpenProcess(PROCESS_VM_READ, FALSE, (DWORD)process_pid);
ReadProcessMemory(hProcess, (LPCVOID) base, buf, size, &bytes_r);
where process_pid is the PID of the process to be read (its memory is later dumped to disk with WriteFile etc.), base is the address to start reading from (the image base, 0x400000 in our case), size is the number of bytes to read (the image size, 0x11000 in our case), and bytes_r receives the number of bytes actually read.
As explained, we execute the modified (EB FE-patched) executable on an isolated test box and let it run. We use the Task Manager to find its PID. We execute the ReadProcMem tool as “ReadProcMem <pid> 400000 11000” to get “img_dump.exe” dropped into the current directory. This is the uncompressed version of RaDa.exe.
Next, we need to fix certain areas of the img_dump.exe to make it just like a regular PE executable.
We have to make sure that the loader is able to make sense out of the PE headers/fields of the img_dump.exe file, and can load the executable into its VM space fine, and get the EIP to point to the OEP (0x4018A4).
To fix the dump file, we use PE Tools’ PE Editor and teshp’s Revirgin utilities. All of the following can be done by hand with a hex editor of course, but PE Editor/Revirgin provide a nice intuitive interface that speeds this process up by orders of magnitude!
- Fixing PE Headers: Size of code = 0xB000 (JDR0, up from 0x4000), size of uninit data = 0 (JDR0, down from 0x4000), code-base RVA = 0x1000 (from 0xC000).
- Fixing Entry Point: EP RVA = 0x18A4 in JDR0 (from 0xFD20 in JDR1)
- Fixing Sections’ Info: JDR0’s raw size = 0xB000 (on disk, up from 0 bytes), raw offset = 0xB000 (from 0). JDR1’s raw offset shifts to 0xC000 (from 0x400) to make space for the new JDR0 section. .rsrc’s raw size = 0x1000 (up from 0xE00), raw offset = 0x10000 (from 0x4400). PE Editor’s ‘Dumpfixer’ right-click option is the quickest way to do just this.
Also, the Import Directory and the IAT need to be valid for the right DLLs to be loaded, and any external calls to work. The binary still has the Import Directory starting at 0x410BA4, still pointing to the kernel32.dll and msvbvm60.dll DLLs, and still located within the .rsrc section. Tracking down the right IAT is easy. We open the binary as it is now in IDA again (manual load, don’t make imports segment). Then we trace the code beginning from the new entry point 0x4018A4 (start). We reach sub_40189C which has a “jmp ds:dword_40119C” instruction. If we go to that address, we see a series of memory addresses outside the process’s sections’ address space (beyond JDR0, JDR1, .rsrc). This just has to be the IAT:
Scrolling up, we see that it starts at 0x401000. It’s bound, meaning the addresses (filled in by the loader) are the ‘actual’ virtual addresses of the functions imported from the DLL(s). To help in reversing, let’s recreate a separate .idata section with unbound addresses. This is easy with Revirgin. Execute EBFE.exe (the one that executes and pauses at jmp OEP). Then launch Revirgin and select EBFE.exe in the process list. We know the IAT start RVA is 0x1000, and the size as seen in IDA is 0x20C. Enter these in Revirgin, and let it process the bound IAT. It shows the symbol names it found in the right window. Choosing “unresolved” should return an empty list; it does in our case. Now we need to save this symbol table to disk. Click on “IAT Generator”, and specify the dumped file first (img_dump.exe). It locates the start address available for creating the .idata section (0x411000 in our case). Save the table as, say, “idata.bin”. It should now be clear why doing this manually with a hex editor would have taken a long time!
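The observation that gave the IAT away, a run of pointers that all fall outside the image, can be turned into a crude scan over a memory dump. This is a heuristic sketch under our own assumptions, not how IDA or Revirgin locate the table: zero dwords are tolerated inside a run (they separate the per-DLL thunk lists), any pointer back into the image ends the run, and the addresses in the sample are made up.

```python
import struct

def find_bound_iat(mem: bytes, image_base: int, image_size: int):
    """Return (byte_offset, byte_size) of the longest run of dwords pointing outside the image."""
    image_end = image_base + image_size
    values = struct.unpack_from("<%dI" % (len(mem) // 4), mem)
    best_start, best_len = 0, 0
    run_start = None
    for i, value in enumerate(values):
        external = value != 0 and not (image_base <= value < image_end)
        if external:
            if run_start is None:
                run_start = i
            if i - run_start + 1 > best_len:
                best_start, best_len = run_start, i - run_start + 1
        elif value != 0:
            run_start = None  # a pointer into the image ends the current run
    return best_start * 4, best_len * 4

# Hypothetical dump fragment: one internal pointer, five IAT-like slots, one internal pointer.
sample = struct.pack("<8I", 0x401234, 0x77E61234, 0x77E65678, 0,
                     0x66001000, 0x66002000, 0, 0x4018A4)
print(find_bound_iat(sample, 0x400000, 0x11000))  # (4, 20)
```

On a real dump the result would only be a candidate to confirm by eye, since long stretches of unrelated data can also look like external pointers.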
Then do a “copy /B img_dump.exe+idata.bin unpacked.exe” to get the exe with the recreated .idata section. Make sure the image size gets fixed (to 0x11B16 in our case).
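The concatenation and the image-size fix can also be scripted. The sketch below is a simplified illustration: it assumes a PE32 file, where SizeOfImage sits 56 bytes into the optional header, and it writes the combined length as-is, matching the 0x11B16 figure above. The buffers are synthetic stand-ins of the right sizes, and the e_lfanew value of 0x80 is hypothetical.

```python
import struct

# PE signature (4) + COFF file header (20) + offset of SizeOfImage in the PE32 optional header (56)
SIZE_OF_IMAGE_OFFSET = 4 + 20 + 56

def join_and_fix(dump: bytes, idata: bytes) -> bytes:
    """Append idata to dump (like 'copy /B a+b c') and set SizeOfImage to the new length."""
    out = bytearray(dump + idata)
    (e_lfanew,) = struct.unpack_from("<I", out, 0x3C)
    struct.pack_into("<I", out, e_lfanew + SIZE_OF_IMAGE_OFFSET, len(out))
    return bytes(out)

# Synthetic stand-ins: a 0x11000-byte dump with a minimal header, and a 0xB16-byte idata blob.
dump = bytearray(0x11000)
dump[0:2] = b"MZ"
struct.pack_into("<I", dump, 0x3C, 0x80)   # hypothetical e_lfanew
dump[0x80:0x84] = b"PE\x00\x00"
unpacked = join_and_fix(bytes(dump), bytes(0xB16))
print(hex(len(unpacked)))  # 0x11b16
```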
Now, to confirm that we fixed the dumped exe correctly, load the fixed version in IDA with the default options (no manual load, and with the “Make imports segment” option selected). Not only does IDA not complain about the IAT being in a non-standard section, it loads the newly created .idata section and resolves all symbol references just fine:
With an unpacked executable now, we are ready for reversing the binary to know what it does. We’ll use a combination of passive/static and active/run-time analyses. This is the way to go really. Static analysis of the disassembly is more thorough since we are looking at the code source, however run-time analysis helps answer certain questions that are not easily found via a purely passive approach.
From IDA’s Imports view, it’s clear that the exe relies heavily on the MSVBVM60.dll Microsoft Visual Basic 6.0 runtime DLL. It must be a Visual Basic 6.0 application.
The entry point begins with a call to the ThunRTMain() routine within the runtime DLL. This is the standard VB startup routine. It takes in a pointer to a structure that supposedly contains information about the process required for startup. The internals of the routine, and the structure are not documented. It’s really not that relevant to our purpose anyway. We just need to know where this routine hands-off execution back to one of our subroutines. For this, we need to first find if it’s compiled as p-code or native.
We quickly create two simple VB Projects, both with a “Sub Main” subroutine (in a separate module .bas file, not under any Form) as follows:
Set both projects to use Sub Main as the Startup Object (Project Properties > General tab). Compile one as p-code, and the other as native. Analyze the imports of each in IDA. The p-code based exe imports ProcCallEngine, the p-code interpretation engine; this import is missing from the native-compiled exe. The imports of our unpacked exe do not include ProcCallEngine either, so we conclude that it is also compiled to native code.
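The comparison boils down to a single membership test on the import list. A minimal sketch follows, assuming the import names have already been exported from IDA into a list; the two sample lists are hypothetical and far shorter than a real import table.

```python
def vb6_compile_mode(import_names):
    """Classify a VB6 exe as 'p-code' or 'native' by the presence of ProcCallEngine."""
    names = {name.lower() for name in import_names}
    return "p-code" if "proccallengine" in names else "native"

# Hypothetical, abbreviated import lists for the two test projects.
pcode_imports = ["ThunRTMain", "ProcCallEngine", "__vbaExceptHandler"]
native_imports = ["ThunRTMain", "__vbaObjSet", "rtcCreateObject2"]
print(vb6_compile_mode(pcode_imports), vb6_compile_mode(native_imports))  # p-code native
```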
Now, we take the native-code compiled sample exe, and study the structure being passed to the ThunRTMain() routine. Study the following screenshot:
We had previously found the function at 0x4015B0 to correspond with the “Sub Main” routine. We can clearly see the address of this function being passed to the ThunRTMain routine. Is this hunch correct? A more thorough confirmation would require either a run-time check (by setting breakpoints on each subroutine, and seeing if fMain is really hit first) or a thorough reversing of the “InfoStruct” structure based on whatever information is available about the runtime DLL. We choose to skip this rigor, and proceed based only on this hunch. We compare this with the structure being passed to the ThunRTMain() routine of our unpacked.exe and find the startup routine at 0x4045D0. Let’s rename this as fMain:
The IDA Navigation bar gives the size of functions’ code relative to the size of the rest of the exe. This gives an idea of the amount of reversing needed; pretty much half of the binary!:
In the first steps, we look at low hanging fruit basically, rapidly labeling as many subroutines as possible based on quick superficial inspections, commenting lines, defining data structures if required, renaming variables etc and getting acquainted with the disassembly.
While getting familiar with the disassembly, I found two really useful articles on analyzing VB based viruses/worms: “VB: Wearing the inside out” and “Generic Detection of Visual Basic Internet Worms”. Following them, we digress a little and study 3 different cases of COM object bindings: Early binding using Set-CreateObject, Early binding using New, and Late binding using Set-CreateObject. Then we analyze their disassembly for patterns.
Here’s a sample we could use (it’s an old mass mailing payload sample):
' Early Binding with Set-CreateObject
'Dim appOutlook As Outlook.Application
'Set appOutlook = CreateObject("Outlook.Application")
' Early Binding with New
'Dim appOutlook As New Outlook.Application
' Late Binding with Set-CreateObject
Dim appOutlook As Object
Set appOutlook = CreateObject("Outlook.Application")
Set mapiNameSpace = appOutlook.GetNameSpace("MAPI")
Set objMail = appOutlook.CreateItem(0)
Set objFolder = mapiNameSpace.GetDefaultFolder(10)
For Each objContact In objFolder.Items
    If objContact.Email1Address <> "" Then
        objMail.To = objMail.To & ";" & objContact.FullName
    End If
Next
objMail.Subject = "Subject of mail message"
objMail.Body = "Body of mail message"
For Early binding with Set-CreateObject, we find the closely spaced sequence “push offset aOutlook_applic ; "Outlook.Application"”, “call ds:rtcCreateObject2”, and “call ds:__vbaObjSet”.
For Late binding with Set-CreateObject, we find likewise: “push offset aOutlook_applic ; "Outlook.Application"”, “call ds:rtcCreateObject2”, and “call ds:__vbaObjSetAddref”.
Both of these have the COM object string pointer pushed onto the stack just before the rtcCreateObject2 call. This makes detecting such COM object accesses pretty trivial.
For Early binding using New, we find “call ds:__vbaNew2” without any reference to the interface type at this point. We do see a pointer to the GUID of the interface being pushed on the stack as an argument of the following “call ds:__vbaHresultCheckObj” call. In this case, “_Application” interface has a GUID of 00063001-0000-0000-C000-000000000046.
We apply these techniques to the unpacked.exe binary. We quickly find accesses to “Scripting.FileSystemObject”, “Wscript.Shell”, “InternetExplorer.Application”, “ADODB.Recordset”, “ADODB.Stream” COM objects across various functions.
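The CreateObject pattern described above is regular enough to search for in an exported text listing. The following sketch assumes IDA-style lines of the form quoted earlier; the window of 4 lines, the label aScripting_file, and the filler “push eax” lines are assumptions for illustration. The early/late labels simply mirror what we saw in our one test sample and should not be read as a general rule.

```python
import re

PUSH_RE = re.compile(r'push\s+offset\s+\w+\s*;\s*"([^"]+)"')
CALL_RE = re.compile(r'call\s+ds:(\w+)')

def find_createobject_sites(lines, window=4):
    """Return (ProgID, binding) for each string push followed closely by rtcCreateObject2."""
    hits = []
    for i, line in enumerate(lines):
        pushed = PUSH_RE.search(line)
        if not pushed:
            continue
        calls = []
        for following in lines[i + 1:i + 1 + window]:
            called = CALL_RE.search(following)
            if called:
                calls.append(called.group(1))
        if "rtcCreateObject2" not in calls:
            continue
        if "__vbaObjSetAddref" in calls:
            binding = "late"
        elif "__vbaObjSet" in calls:
            binding = "early"
        else:
            binding = "unknown"
        hits.append((pushed.group(1), binding))
    return hits

# Hypothetical listing fragment in IDA's text format.
listing = [
    'push    offset aOutlook_applic ; "Outlook.Application"',
    'call    ds:rtcCreateObject2',
    'push    eax',
    'call    ds:__vbaObjSetAddref',
    'push    offset aScripting_file ; "Scripting.FileSystemObject"',
    'call    ds:rtcCreateObject2',
    'push    eax',
    'call    ds:__vbaObjSet',
]
print(find_createobject_sites(listing))
# [('Outlook.Application', 'late'), ('Scripting.FileSystemObject', 'early')]
```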
Next, we perform a depth-first analysis of the code beginning at the root node (fMain, identified as the startup routine of this VB application). We focus mostly on key and/or unique functionalities of the binary, ignoring supporting code if any. In the interest of brevity, we’ll skip the details of this analysis. We use IDA’s integrated graphing capabilities to generate an Xrefs graph starting at fMain:
This graph gives a fairly good idea of how the various routines fit together. Let’s pick a few of the important ones and explore them in more detail.
The binary takes the following command line arguments (fCmdArgs routine):
- --verbose: A decoy, a string "Starting DDoS Smurf remote attack..." is created and freed.
- --visible: The InternetExplorer.Application objects are left visible.
- --server: The target server’s IP address.
- --commands: Specify the commands filename.
- --period: Interval between successive connections to the target server.
- --help: Internet Explorer Help popup.
- --gui: A small gui interface for the application.
- --authors: Display credits.
Here we analyze a few intriguing behaviors the binary is involved with.
Analyzing the fOpenIERCmds() function, we find that it starts off with assembly that maps roughly to:
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = 0
This means that the process creates an Internet Explorer object, makes it invisible, and navigates it to the “RaDa_commands.html” file on the 10.10.10.10 server. It then searches for a Form component in the HTML document. Once found, it looks for an item named “exe”. If that’s present, the value of this item is passed as an argument to the fComSpec function.
If we look inside fComSpec, it essentially does a “%COMSPEC% /c Value” where Value is the string passed to the function by fOpenIERCmds().
Putting these findings together, we predict that if we place an entry like:
the binary will attempt to execute ‘%COMSPEC% /c del c:\somefile.txt’ on the local system. The %COMSPEC% variable is used so that a user could substitute the command.com or cmd.exe command interpreter. On my system, this translates to “C:\WINDOWS\system32\cmd.exe /c del c:\somefile.txt”, which deletes somefile.txt, if present, from the C: drive! The commands can be chosen arbitrarily, of course.
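When examining a captured commands page, it helps to predict the command line the binary would build without running anything. The sketch below only parses a page and formats a string; it never executes the result. The HTML markup in the sample is hypothetical (this report does not reproduce the exact form layout), as are the class and function names; only the item name “exe” and the “%COMSPEC% /c Value” shape come from the analysis above.

```python
from html.parser import HTMLParser

class FormItems(HTMLParser):
    """Collect name/value pairs of <input> elements that appear inside a <form>."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.items = {}

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            attributes = dict(attrs)
            if "name" in attributes:
                self.items[attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

def predicted_command(html: str, env: dict):
    """Return the command line predicted for the 'exe' item, or None if it is absent."""
    parser = FormItems()
    parser.feed(html)
    value = parser.items.get("exe")
    if value is None:
        return None
    return "%s /c %s" % (env["COMSPEC"], value)

# Hypothetical commands page and environment.
page = r'<html><body><form><input name="exe" value="del c:\somefile.txt"></form></body></html>'
env = {"COMSPEC": r"C:\WINDOWS\system32\cmd.exe"}
print(predicted_command(page, env))  # C:\WINDOWS\system32\cmd.exe /c del c:\somefile.txt
```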
Likewise, if the “screenshot” item is present, the binary invokes the fScrnShot() routine. The routine’s purpose becomes clear when we look at the following screenshots:
The fDllKeybdEvent() routine is called with 0x2C as an argument.
The routine checks if eax is preloaded with the address of the keybd_event() function (imported from user32.dll). If yes, it calls the function directly with 0x2C as the event parameter.
Else, DllFunctionCall is used to call this function indirectly by loading user32.dll. Checking the syntax of the keybd_event() function and the Virtual-Key Code table, we find that the 0x2C code corresponds to VK_SNAPSHOT, or the “Print Screen” key. The fScrnShot() routine then saves this screenshot to the “C:\RaDa\Tmp\spy.jpg” file, where the filename was provided by the “screenshot” Form item of the “RaDa_commands.html” commands file.
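Looking up constants such as 0x2C is a recurring chore during this kind of analysis, so a small lookup table is handy. The excerpt below covers only the virtual-key codes adjacent to 0x2C, as listed in the Windows Virtual-Key Codes table; the function name is ours.

```python
# Excerpt of the Windows virtual-key code table around 0x2C.
VIRTUAL_KEYS = {
    0x2A: "VK_PRINT",
    0x2B: "VK_EXECUTE",
    0x2C: "VK_SNAPSHOT",  # the Print Screen key
    0x2D: "VK_INSERT",
    0x2E: "VK_DELETE",
}

def describe_vk(code: int) -> str:
    """Return the symbolic name of a virtual-key code, or a placeholder if not in the excerpt."""
    return VIRTUAL_KEYS.get(code, "unknown (0x%02X)" % code)

print(describe_vk(0x2C))  # VK_SNAPSHOT
```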
This screenshot could very well have been sent over the wire back to the command server. The utility could thus very easily be modified into spyware.
These confirm the trojan/spy-ware behavior of the binary.
Analysis of the fChkMAC function shows that the binary first checks whether any of the network devices has a MAC address matching 00:0C:29:XX:XX:XX, 00:05:69:XX:XX:XX, or 00:50:56:XX:XX:XX, which are the IEEE registered MAC address ranges for VMware. Likewise, it also checks the Windows registry key “HKLM\Software\VMware, Inc.\VMware Tools\Inst” to detect a VMware installation. It exits immediately if VMware is found installed.
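The MAC address part of this check is easy to reproduce, for example to find out in advance whether an analysis box would trip it. This is a minimal sketch of the prefix comparison only; it is not a transcription of fChkMAC, and the function name and the accepted separators are our assumptions.

```python
# The three VMware prefixes (OUIs) named in the analysis above.
VMWARE_OUIS = {"00:0C:29", "00:05:69", "00:50:56"}

def is_vmware_mac(mac: str) -> bool:
    """True if the MAC address starts with one of the VMware prefixes."""
    normalized = mac.strip().upper().replace("-", ":")
    return normalized[:8] in VMWARE_OUIS

print(is_vmware_mac("00:0c:29:12:34:56"), is_vmware_mac("00:1A:2B:3C:4D:5E"))  # True False
```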
We have seen a systematic reverse engineering of the given binary. From the point where we couldn’t even see the imported symbols, we have reversed most of the signature functionalities of the executable. The few that remain can similarly be reversed by manual analysis of the corresponding routines. Further, the techniques were chosen to be as generic as possible so that they could be applied to other similar malware binaries.
2. Virus Bulletin’s “VB: Wearing the inside out” Article
3. Virus Bulletin’s “Generic Detection of Visual Basic Internet Worms” Article
4. AndreaGeddon’s “Visual Basic Reversed”