Analysis of "the-binary"

May 27, 2002

Phase 1: Getting acquainted

The first thing I did after downloading the binary was to run 'strings' on it.  The output strongly suggested that this was a Linux binary (evidence: '@(#) The Linux C library 5.3.12').  Next, I loaded it into IDA Pro (on a Win2k box).  IDA confirmed that the binary was in ELF format, which sent me on a side trip reading up on the format of ELF binaries.  This being the first binary I have taken a serious look at, I wanted to do some learning along the way.  With the ELF specification at hand, I opened the binary in my favorite hex editor, UltraEdit, and poked around until I was satisfied that I understood the specification.  This greatly helped me understand everything IDA was telling me.
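
A small header dump makes a handy companion to the hex editor when checking your reading of the ELF specification. This is a minimal Python sketch, assuming a 32-bit ELF file (ELFCLASS32) and decoding only the fixed 52-byte file header; the function and table names are my own, not part of any tool.

```python
import struct

# Partial e_type and e_machine tables from the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "Intel 80386", 62: "AMD x86-64"}

# ELF32 header fields that follow the 16-byte e_ident array, in file order.
ELF32_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
                "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
                "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf32_header(data: bytes) -> dict:
    """Decode the ELF32 file header at the start of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if data[4] != 1:  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
        raise ValueError("this sketch only handles 32-bit ELF files")
    # EI_DATA: 1 = little-endian (ELFDATA2LSB), 2 = big-endian (ELFDATA2MSB)
    order = "<" if data[5] == 1 else ">"
    values = struct.unpack_from(order + "HHIIIIIHHHHHH", data, 16)
    header = dict(zip(ELF32_FIELDS, values))
    header["type_name"] = ELF_TYPES.get(header["e_type"], "unknown")
    header["machine_name"] = ELF_MACHINES.get(header["e_machine"], "unknown")
    return header
```

Usage would look like `parse_elf32_header(open("the-binary", "rb").read(52))`; for a Linux/x86 executable one would expect `machine_name` to be "Intel 80386".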

Phase 2: Understanding data flows

IDA does a fantastic job of analyzing a binary, creating labels and cross references during its initial analysis.  I elected to go after the low-hanging fruit first and tracked down all of the Linux system calls that I could find.  IDA had already labeled each one with its purpose, using comments such as 'LINUX - sys_socketcall'.  Using a Linux system call reference, I started naming functions and identifying data types.  My goals were to rename as many functions as possible according to their purpose, comment as many lines of code as I could to remind myself of what I had discovered, define data structures based on the known parameter requirements of the system calls, and lastly to rename local variables and parameters according to their purpose.  Having many named data types, variables and functions made understanding the code that much easier when I finally dug into the main function.  I generally did a depth-first search into the code to discover the purpose and data type of the data used as function parameters, as I feel the single most useful piece of the reverse engineering puzzle is knowing what kind of data is being manipulated.  It brings many other aspects of the code into proper context.
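
IDA's 'LINUX - sys_...' comments come from the value loaded into EAX before each `int 0x80`. On 32-bit x86 Linux, every socket operation goes through the single socketcall system call (number 102), with EBX selecting the operation. A lookup like the following can speed up naming the wrapper functions. It is a sketch: the tables are deliberately partial, and the helper name is hypothetical.

```python
# Partial i386 Linux system-call table (EAX value at `int 0x80`).
I386_SYSCALLS = {1: "exit", 2: "fork", 3: "read", 4: "write", 5: "open",
                 6: "close", 10: "unlink", 11: "execve", 37: "kill",
                 102: "socketcall"}

# socketcall sub-operations (EBX value), as numbered in <linux/net.h>.
SOCKETCALL_OPS = {1: "socket", 2: "bind", 3: "connect", 4: "listen",
                  5: "accept", 6: "getsockname", 7: "getpeername",
                  8: "socketpair", 9: "send", 10: "recv", 11: "sendto",
                  12: "recvfrom", 13: "shutdown", 14: "setsockopt",
                  15: "getsockopt", 16: "sendmsg", 17: "recvmsg"}


def name_syscall_wrapper(eax, ebx=None):
    """Suggest a name for a wrapper that loads EAX (and EBX) before int 0x80."""
    if eax == 102 and ebx is not None:
        return SOCKETCALL_OPS.get(ebx, f"socketcall_{ebx}")
    return I386_SYSCALLS.get(eax, f"sys_{eax}")
```

For example, a wrapper that sets EAX to 102 and EBX to 11 is a good candidate for the name `sendto`.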

Phase 3: Code Analysis

With a reference sheet for x86 assembly language at my side, I next dug into the code.  I also found it very useful to have access to the Linux man pages for all of the library functions being called; again, this was invaluable in clueing me in to exactly what types of data were being manipulated.  The inclusion of so many system calls led me to the conclusion (admittedly slow in coming) that the binary was statically linked with all of its required library functions.  I often found myself frustrated at not following the logic behind some section of code, with the gut feeling that I was attempting to reverse engineer the standard library.  The appearance of printf-style format strings in the main function certainly suggested calls to functions such as sprintf.  To keep myself from drilling too far down into library code when I wanted to be focusing on the author's code, I loaded up a copy of the sources for the libc 5.3.12 C library.  By comparing the code I was looking at in IDA with the source code, I was able to identify library functions much faster.  Grep helped me identify the gethostbyname function, for example, by searching the library sources for one of the strings used in the function.  This was a tremendous time saver and validation technique, and it allowed me to focus my efforts on the tool code.
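
The string-grepping technique is easy to script. This Python sketch first pulls printable ASCII runs out of the binary (roughly what `strings` does, although GNU strings also accepts tab characters) and then reports which library source files contain each string verbatim. The function names are hypothetical, and so is any source path you pass in.

```python
import os


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len long."""
    runs, current = [], bytearray()
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            current.append(byte)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current = bytearray()
    if len(current) >= min_len:           # flush a run that ends the buffer
        runs.append(current.decode("ascii"))
    return runs


def find_in_sources(needles, root: str, exts=(".c", ".h")) -> dict[str, list[str]]:
    """Map each needle to the source files under root that contain it verbatim."""
    hits: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="latin-1") as fh:
                text = fh.read()
            for needle in needles:
                if needle in text:
                    hits.setdefault(needle, []).append(path)
    return hits
```

A string found only in the library sources points at a library function. A string found nowhere in them is more likely to belong to the tool author's own code.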

Phase 4: Code Functionality

With many functions and data types identified, I started to focus on the tool code.  By stepping through the code in my head, it became clear that the tool attempts to hide itself by resetting argv[0] to "[mingetty]" and forking.  Next, after closing stdin, stdout, and stderr, it opens a raw IP socket to receive IP packets that use protocol 11 (NVP-II).  This caused a brief sidetrack to the RFC on NVP-II before I decided to let the binary tell me how it was using this protocol.
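
The receive side can be modeled in a few lines. In this Python sketch, `open_nvp_socket` and `nvp_payload` are hypothetical names. It assumes Linux semantics, where a raw IPv4 socket opened for protocol 11 needs root and delivers each packet with its IP header still attached. The payload helper then checks the version and protocol fields and strips the header.

```python
import socket

NVP_II = 11  # IP protocol number the agent listens on


def open_nvp_socket() -> socket.socket:
    """Open a raw IPv4 socket for protocol 11 (requires root on Linux)."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, NVP_II)


def nvp_payload(packet: bytes):
    """Return the bytes after the IP header, or None if not an IPv4 NVP-II packet."""
    if len(packet) < 20:
        return None
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5 or packet[9] != NVP_II:
        return None
    return packet[ihl * 4:]                # IHL counts 32-bit words
```

Typical use would be `nvp_payload(open_nvp_socket().recv(65535))` inside a receive loop.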

The tool then enters a loop to receive packets and perform tasks based on the received data.  IDA identified the switch table and labeled all of the cases, making life much easier.  There was only one function call between the recv and the switch.  That function was more than I could follow at first, so I moved on to examining the cases.  Always one for an easy target, I scanned the cases for things that looked familiar.  Based on the format string it references, case 3 looked like it wanted to invoke the shell to run a command.  The "rb" string that it references was a clue that it wanted to read a file, and led to the discovery of the fopen, fread, fclose and unlink functions.  A logical conclusion here was that if the tool wanted to read data following execution of a command, it was probably going to send that data back to the attacker.  This led to the discovery of the data transmission functions, and showed that some of the parameters needed to send data to the attacker were only ever modified (and thus probably initialized) in case 2.  Analysis of how data was used in this case started to reveal the structure of incoming packets.  One interesting programming flaw in this case is that results are sent back to the handler in 398-byte chunks, but random padding tacked onto the end of each packet causes transmitted packets to range in size from 400 to 600 bytes.  The content of the padding is hardly random, however.  Sloppy buffer allocation results in the padding bytes being taken from the unencoded command output, which is therefore transmitted in the clear.  This explains the appearance of plain-text command output in the sample snort log provided by the Honeynet contest team, as shown in the raw packet below:
>  nvp 583
0x0000   4500 025b 9855 0000 fa0b b3b8 ac10 b702        E..[.U..........
0x0010   e960 2616 0300 89a4 bbf2 29b0 39bf 3dc6        .`&.......).9.=.
0x0020   3ec2 f986 028b 154c d35c e26d f32a 6198        >......L.\.m.*a.
0x0030   1fa5 2eb9 da11 487f b6fe 458c d31a 6198        ......H...E...a.
0x0040   cf06 3d86 bdf4 2bb6 30b7 ee25 5c93 db23        ..=...+.0..%\..#
0x0050   6ba2 d960 e66f fa7e f67d 0480 092a 6198        k..`.o.~.}...*a.
0x0060   cf06 4e95 dc23 6ab1 e81f 568d d60d 447b        ..N..#j...V...D{
0x0070   0782 0940 77ae e52d 75bd f42b b238 c14c        ...@w..-u..+.8.L
0x0080   d048 cf56 d25b 7cb3 ea21 58a0 e72e 75be        .H.V.[|..!X...u.
0x0090   063d 74ab e22a 6198 cf5b d65d 94cb 024a        .=t..*a..[.]...J
0x00a0   91da 255c 9318 9b21 9b1d a11f a8c9 0037        ..%\...!.......7
0x00b0   6ea5 ed34 7bc2 0b53 8ac1 f82f 79b0 e71e        n..4{..S.../y...
0x00c0   aa25 ace3 1a51 99e0 2974 abe2 67ea 70ea        .%...Q..)t..g.p.
0x00d0   6cf0 6ef7 184f 86bd f43c 83ca 115a a2d9        l.n..O...<...Z..
0x00e0   1047 7ec6 fd34 6bf6 70f7 2e65 9ce4 2b74        .G~..4k.p..e..+t
0x00f0   bff6 2db2 35bb 35b7 3bb9 4263 9ad1 083f        ..-.5.5.;.Bc...?
0x0100   87ce 155c a5ed 245b 92c9 134a 81b8 43bd        ...\..$[...J..C.
0x0110   447b b2e9 3178 c10c 437a ff82 0882 0488        D{..1x..Cz......
0x0120   068f b0e7 1e55 8cd4 1b62 a9f2 3d74 abe2        .....U...b..=t..
0x0130   1961 98cf 0692 0d94 cb02 3970 c009 548b        .a........9p..T.
0x0140   c24c d74f da66 f011 487f b6ed 357c c30a        .L.O.f..H...5|..
0x0150   539e d50c 437a c2f9 3067 f26c f32a 6198        S...Cz..0g.l.*a.
0x0160   cf1f 68b5 ec23 ad38 b03b c751 7289 fd88        ..h..#.8.;.Qr...
0x0170   2ae3 b39a 98ad d91c 76e7 6f0e c491 7570        *.......v.o...up
0x0180   82ab eb42 b035 d184 4e2f 2736 5c99 ed58        ...B.5..N/'6\..X
0x0190   da73 23ea c8bd c9ec 2677 df5e f4a1 6540        .s#.....&w.^..e@
0x01a0   323b 5b92 e045 7204 0020 2070 726f 6772        2;[..Er....progr
0x01b0   616d 2076 6572 7320 7072 6f74 6f20 2020        am.vers.proto...
0x01c0   706f 7274 0a20 2020 2031 3030 3030 3020        port.....100000.
0x01d0   2020 2032 2020 2074 6370 2020 2020 3131        ...2...tcp....11
0x01e0   3120 2070 6f72 746d 6170 7065 720a 2020        1..portmapper...
0x01f0   2020 3130 3030 3030 2020 2020 3220 2020        ..100000....2...
0x0200   7564 7020 2020 2031 3131 2020 706f 7274        udp....111..port
0x0210   6d61 7070 6572 0a20 2020 2031 3030 3032        mapper.....10002
0x0220   3120 2020 2031 2020 2075 6470 2020 2031        1....1...udp...1
0x0230   3032 3420 206e 6c6f 636b 6d67 720a 2020        024..nlockmgr...
0x0240   2020 3130 3030 3231 2020 2020 3320 2020        ..100021....3...
0x0250   7564 7020 2020 3130 3234 20                    udp...1024.
The clear-text output shown above was most likely never meant to be exposed by the attacker.  Its encoded version begins at byte 0x0018 of the packet.
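
Spotting leaks like this one can be automated. This Python sketch (function names hypothetical) rebuilds the packet bytes from snort-style dump lines like those above. It assumes the offset, hex and ASCII columns are separated by two or more spaces. It then lists runs of printable ASCII together with their absolute offsets, roughly as `strings` would.

```python
import re


def parse_hexdump(text: str) -> tuple[int, bytes]:
    """Rebuild bytes from lines of the form 'OFFSET   HEX GROUPS   ASCII'."""
    base, buf = None, bytearray()
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("0x"):
            continue                      # skip headers such as ">  nvp 583"
        # Columns are separated by runs of two or more spaces.
        offset_col, hex_col = re.split(r"\s{2,}", line, maxsplit=2)[:2]
        offset = int(offset_col, 16)
        if base is None:
            base = offset
        if offset != base + len(buf):
            raise ValueError(f"non-contiguous dump at offset {offset:#06x}")
        buf += bytes.fromhex(hex_col)
    if base is None:
        raise ValueError("no dump lines found")
    return base, bytes(buf)


def find_cleartext(data: bytes, base: int = 0, min_len: int = 6) -> list[tuple[int, str]]:
    """Return (absolute offset, text) for each printable run of min_len or more."""
    runs, start = [], None
    for i, b in enumerate(data + b"\x00"):   # sentinel byte flushes a trailing run
        if 0x20 <= b <= 0x7E:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((base + start, data[start:i].decode("ascii")))
            start = None
    return runs
```

Run over the whole dump above, the encoded region yields only short, scattered runs, while the portmapper listing near the end of the packet stands out as long printable runs.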

Case 6 was the next case to fall.  Fortunately, opening a backdoor listener was something I had seen done before.  The code following the acceptance of an incoming connection revealed that the initial input, after each of its characters was upshifted by one position, had to match the string "TfOjG" (found in the strings output).  This revealed the backdoor password as "SeNiF".  Case 7 was next, a simple command execution case.  Case 8 was a bit trickier.  It was apparent that it was killing some process, but which one was the question.  Looking at the variables that it tested (which had to hold a process id in order to be passed to kill) and seeing where they were set led to the initial conclusion that case 8 was used to close the backdoor listener.  It turned out that it was also used to terminate a variety of other actions that might be in progress.
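
The password check amounts to a one-position character shift. The Python sketch below models it on the assumption that the whole input is shifted and then compared with "TfOjG". Any trimming the binary does on the input, such as dropping a trailing newline, is not modeled, and the helper names are my own.

```python
EXPECTED = "TfOjG"  # string found in the strings output


def upshift(text: str, n: int = 1) -> str:
    """Shift every character's code by n positions."""
    return "".join(chr(ord(c) + n) for c in text)


def password_accepted(candidate: str) -> bool:
    # Model of the check: shift the input up by one, then compare.
    return upshift(candidate) == EXPECTED


def recover_password(stored: str = EXPECTED) -> str:
    # Undo the shift to get the plaintext password.
    return upshift(stored, -1)
```

`recover_password()` returns "SeNiF", the same password that worked against the live agent.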

I noticed the very similar code of cases 4, 5, 9, 10, 11, and 12, and elected to start by examining how each case set itself up before calling a case-specific function.  The initial code in each case helped point out what case 8 was killing off.  When one of these six commands is received, the code checks whether some "service" is already in progress, a "service" being the actions performed by these cases as well as case 6.  If a service is already active, the command is silently ignored.  If no service is active, the case forks and the child becomes the new active service.  No more than one service process can be active at any given time.  Once activated, every service process continues indefinitely until terminated by command 8.  Following the fork, the child process in each case manipulates the data received in the command packet in order to pass parameters to a case-specific function.  Analysis of exactly how these parameters were manipulated was essential to understanding the command formats expected by the agent (see the answers section for a more detailed explanation of the format of each command).  Each of these six cases finishes with a call to a function that actually implements the case-specific service.  A total of four separate service functions are called by the six cases, meaning that four distinct services are available for activation (in addition to the backdoor service).  Cases 4 and 9 invoke the same service, as do cases 10 and 11.  While the activation commands have slightly different formats, it turns out that case 4 is simply a specialized version of case 9, so there is actually no need to include case 4 in the code.  Similarly, case 10 is a specialized version of case 11.  Cases 5 and 12 each perform unique services.
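
The one-service-at-a-time rule behaves like a single slot guarded by a process id. This is a hypothetical Python model, not the binary's code. `spawn` and `kill` are injected callables standing in for fork() and kill(), and the service labels are my own names for the four service functions plus the backdoor.

```python
from typing import Callable, Dict, Optional

# Command number -> service label (labels are mine, not the binary's).
SERVICE_FOR_COMMAND: Dict[int, str] = {
    4: "dns_reflection",   # specialized form of command 9
    9: "dns_reflection",
    5: "icmp_udp_flood",
    10: "syn_flood",       # specialized form of command 11
    11: "syn_flood",
    12: "dns_direct",
    6: "backdoor",         # counted as a service in the analysis above
}

STOP_COMMAND = 8


class ServiceSlot:
    """At most one service child process may be active at a time."""

    def __init__(self, spawn: Callable[[str], int], kill: Callable[[int], None]):
        self._spawn = spawn
        self._kill = kill
        self.active_pid: Optional[int] = None
        self.active_service: Optional[str] = None

    def handle(self, command: int) -> str:
        if command == STOP_COMMAND:
            if self.active_pid is not None:
                self._kill(self.active_pid)
            self.active_pid = None
            self.active_service = None
            return "stopped"
        service = SERVICE_FOR_COMMAND.get(command)
        if service is None:
            return "not a service"
        if self.active_pid is not None:
            return "ignored"       # a service is already running
        self.active_pid = self._spawn(service)
        self.active_service = service
        return "started"
```

Injecting `spawn` and `kill` keeps the state machine testable without creating real processes. The model also shows why a second activation command does nothing until command 8 clears the slot.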

Phase 5: The Tool reveals itself

These four services turn out to contain the "attack" capabilities of the tool.  I am not an expert, so I will do my best to describe the capabilities of each of the services.

Analysis of case 4 (and 9) showed it to perform some form of DNS response flooding against a specified target.  Unfortunately, this case also turned out to be the least straightforward to analyze, because it references static data outside the function.  I worked out what was going on by looking at the socket setup and at what data was sent in the resulting sendto call (the socket and sendto functions were known from the earlier analysis of the Linux system calls made by the program).  A raw IP socket was being created, and each outgoing packet was built in a specific message buffer.  Knowing that an IP header would be the first thing in the buffer, I defined an ip_header structure in IDA Pro and overlaid it at the start of the message buffer.  At this point, the SANS "TCP/IP and tcpdump Pocket Reference Guide" became my best friend, greatly assisting me in identifying what information was being used to construct each message.  The key piece of information was the fact that a UDP packet was being built, so I used IDA to define a UDP header structure as well.  The structure definition feature of IDA Pro proved invaluable in showing what types of data the program was manipulating.  As seen in the screen shot to the right, life is a lot easier when IDA is telling you exactly which data field is being manipulated.  The two sticky parts of this function were figuring out where the destination IP came from and where the data portion of the packet came from.  The destination IP was pulled from a block of initialized static memory, which on further analysis turned out to be a block of 11444 IP addresses, presumably all DNS servers once the rest of the function is understood.  The data portion of each packet was copied in from another initialized block of static memory that happened to contain strings like "com", "org", and "edu", among others.  This was the tip-off that these were perhaps DNS queries, and the list of IPs, DNS servers.
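
Python's ctypes offers a rough analogue of overlaying an IDA structure on a buffer. In this sketch, big-endian structures whose field names loosely follow Linux's struct iphdr and struct udphdr are laid over a packet buffer. It assumes an IPv4 packet and decodes the UDP header only when the protocol field is 17. The `ver_ihl` field and the helper names are my own.

```python
import ctypes
import ipaddress


class IPHeader(ctypes.BigEndianStructure):
    _fields_ = [
        ("ver_ihl", ctypes.c_uint8),    # version (high nibble), IHL (low nibble)
        ("tos", ctypes.c_uint8),
        ("tot_len", ctypes.c_uint16),
        ("id", ctypes.c_uint16),
        ("frag_off", ctypes.c_uint16),  # flags (3 bits) + fragment offset (13 bits)
        ("ttl", ctypes.c_uint8),
        ("protocol", ctypes.c_uint8),
        ("check", ctypes.c_uint16),
        ("saddr", ctypes.c_uint32),
        ("daddr", ctypes.c_uint32),
    ]


class UDPHeader(ctypes.BigEndianStructure):
    _fields_ = [("source", ctypes.c_uint16), ("dest", ctypes.c_uint16),
                ("len", ctypes.c_uint16), ("check", ctypes.c_uint16)]


def overlay(buf: bytes):
    """Lay the IP (and, for protocol 17, UDP) header structures over buf."""
    ip = IPHeader.from_buffer_copy(buf)
    hlen = (ip.ver_ihl & 0x0F) * 4
    if ip.protocol == 17:
        udp = UDPHeader.from_buffer_copy(buf, hlen)
        return ip, udp, buf[hlen + ctypes.sizeof(UDPHeader):]
    return ip, None, buf[hlen:]


def dotted(addr: int) -> str:
    return str(ipaddress.IPv4Address(addr))


def is_fragment(ip: IPHeader) -> bool:
    """True if the More Fragments flag or a non-zero fragment offset is set."""
    return bool(ip.frag_off & 0x3FFF)
```

As in IDA, once the structures are in place, questions like "which field does this byte feed?" become questions about named fields such as `ip.daddr` or `udp.dest`.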
The query table contained 9 queries in all.  A humorous feature of this table, discovered during the live testing phase, was that 4 of the nine queries were malformed.  Once the looping structure of the function was understood, the basic algorithm turned out to be essentially this:

repeat forever
   if necessary, resolve target host name
   for each of the 9 queries
     send a DNS lookup to all 11444 servers, 
     spoofing the target IP as the source address
The result is that the target machine is flooded with DNS responses to queries that it did not send.  This type of attack is effective because DNS responses are generally allowed through firewalls.  Stateful firewalls that match incoming responses to valid requests should drop these incoming packets, having failed to see a previous matching request.  More details of this particular feature of the tool are available here.  It seemed apparent after analysis of this case that
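
The stateful-firewall behavior described above can be modeled in a few lines. This is a simplified, hypothetical sketch (the class and method names are mine). It keys each outstanding query on server address, client port, and DNS query ID, so a reflected response to a query the target never sent finds no matching entry.

```python
class DnsStateTable:
    """Hypothetical model of how a stateful filter pairs DNS replies with queries."""

    def __init__(self) -> None:
        self._pending = set()

    def record_query(self, server_ip: str, client_port: int, query_id: int) -> None:
        """Remember an outbound query sent from client_port to server_ip:53."""
        self._pending.add((server_ip, client_port, query_id))

    def accept_response(self, server_ip: str, client_port: int, query_id: int) -> bool:
        """Admit a reply only if it answers a query still awaiting a response."""
        key = (server_ip, client_port, query_id)
        if key in self._pending:
            self._pending.remove(key)   # each query may be answered once
            return True
        return False
```

Under this model, none of the flood traffic aimed at the target would be admitted, because the target recorded no queries.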

Using similar analysis techniques, I found that command 5 can send either an icmp echo flood or a udp flood.  The sloppiness of the original author revealed itself in this function.  First, he sets the fragment field of each packet to a non-zero value, leading recipients to believe they have received a fragmented packet.  In the case of the icmp flood, which I assume was meant to be a ping flood, the target won't respond with an echo reply because it never receives all of the fragments (I learned this in the live phase).  The second sloppy feature of this function is that the author does not know how to compute the checksum of a udp header, so every udp packet sent has a bad checksum.  Again, I found it somewhat funny that even if he had computed the proper checksum, the first thing he does afterward is change one of the data bytes in the packet, which would invalidate it anyway.  This function simply loops forever sending either icmp or udp packets.  The choice between icmp and udp is made via one of the function's parameters and can be specified by the agent's handler.  For more information on this command go here.
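
For comparison, RFC 768 defines the UDP checksum as the ones'-complement of the ones'-complement sum over a pseudo-header (source address, destination address, protocol, UDP length), followed by the UDP header with its checksum field zeroed and then the data. This is a generic Python sketch, not the author's code. The final check shows why changing a data byte after computing the checksum leaves it wrong.

```python
import ipaddress


def internet_checksum(data: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum for a UDP segment (header + data); the stored checksum is ignored."""
    segment = segment[:6] + b"\x00\x00" + segment[8:]
    pseudo = (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
              + bytes([0, 17]) + len(segment).to_bytes(2, "big"))
    csum = internet_checksum(pseudo + segment)
    return csum or 0xFFFF                     # RFC 768: zero is sent as all ones


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    stored = int.from_bytes(segment[6:8], "big")
    return stored == 0 or stored == udp_checksum(src, dst, segment)  # 0 = unused
```

The same `internet_checksum` verifies the IP header of the captured NVP packet shown earlier: summing all ten header words, including the stored checksum, gives 0xFFFF, so the function returns 0.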

Command 10, perhaps the easiest of the cases to analyze, is a basic tcp SYN flooder.  Command 10 is a special case of command 11 and could have been omitted.  The analysis of this case was similar to that of commands 4, 5, and 9: understanding the data being passed to the socket and sendto functions helped define the remainder of the function.  More information on this command is available here.
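
On the receiving end, telling such a flood apart from normal traffic is mostly a matter of reading the TCP flags byte (offset 13 of the TCP header). As a hedged analysis aid that is not derived from the binary, this sketch counts bare SYN segments per destination port, meaning segments with SYN set and ACK, RST and FIN clear. It assumes each input is a TCP header with the IP header already stripped, and the function names are hypothetical.

```python
from collections import Counter

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def is_bare_syn(tcp_header: bytes) -> bool:
    """True for a connection-opening SYN (no ACK, RST or FIN)."""
    flags = tcp_header[13]
    return (flags & (SYN | ACK | RST | FIN)) == SYN


def bare_syns_by_port(segments) -> Counter:
    """Count bare SYNs per destination port across TCP headers."""
    counts = Counter()
    for seg in segments:
        if len(seg) >= 20 and is_bare_syn(seg):
            counts[int.from_bytes(seg[2:4], "big")] += 1
    return counts
```

A SYN/ACK (flags 0x12) is not counted, so replies from a server do not inflate the numbers.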

Finally, command 12 is a variation on the DNS flooder of command 4, and having already deciphered command 4 made this case easier.  The table of predefined DNS queries was referenced, while the table of DNS server IP addresses was not, so all that remained was to analyze the loops.  This function is much simpler in that it directs all of the DNS queries to a server specified by the handler in the command that activates it.  Because the agent can be told to randomize the source address from which the queries originate, this command appears to be aimed specifically at flooding a target DNS server with requests.  The function loops indefinitely, sending each of the nine canned queries on every pass through the main loop.  Command 12 is described here.
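
Checking the canned queries for well-formedness means walking the wire-format name in each DNS question. Per RFC 1035, that name is a sequence of length-prefixed labels ending in a zero byte, followed by 16-bit QTYPE and QCLASS fields. The sketch below is a strict validator of my own. It deliberately rejects compression pointers, which are legal elsewhere in DNS messages, and it does not reproduce the tool's actual malformed queries.

```python
def decode_qname(wire: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode an uncompressed wire-format name; return (name, offset after it)."""
    labels = []
    while True:
        if offset >= len(wire):
            raise ValueError("name runs past the end of the buffer")
        length = wire[offset]
        offset += 1
        if length == 0:                        # root label terminates the name
            return ".".join(labels), offset
        if length & 0xC0:
            raise ValueError("compression pointer or reserved label type")
        if offset + length > len(wire):
            raise ValueError("label length exceeds the remaining bytes")
        labels.append(wire[offset:offset + length].decode("ascii"))
        offset += length


def check_question(wire: bytes, offset: int = 0) -> tuple[str, int, int]:
    """Validate one DNS question and return (name, QTYPE, QCLASS)."""
    name, offset = decode_qname(wire, offset)
    if offset + 4 > len(wire):
        raise ValueError("question truncated before QTYPE/QCLASS")
    qtype = int.from_bytes(wire[offset:offset + 2], "big")
    qclass = int.from_bytes(wire[offset + 2:offset + 4], "big")
    return name, qtype, qclass
```

A question that starts after a standard 12-byte DNS header would be checked with `check_question(message, 12)`.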

All of the DoS attacks pause briefly after sending each packet.  The intent of this delay is not immediately apparent to me, other than to give the agent host an occasional chance to receive incoming packets, and thus to receive a command to terminate the attack.

Phase 6: Live Testing

Armed with what I felt was a thorough knowledge of the workings of the tool, I moved on to live testing on a test network.  My personal preference is to know exactly what to expect and to have a plan to probe the tool for very specific responses; I did not want to have to analyze network traffic while wondering what the tool was doing and why.  My test setup consisted of only two computers: a prober and a host for the-binary.  I wrote simple packet generator programs designed to invoke each of the behaviors of the tool.  These programs needed to use the-binary's network encoding scheme, which is described on the answers page.  I used ethereal to monitor network traffic.  At this stage of my analysis, this was merely a validation exercise: I expected to see, and did see, each of the behaviors previously described.

Testing proceeded as follows:

  1. ps -ax on agent host for reference
  2. netstat -a on agent host for reference
  3. lsof on agent host for reference
  4. launch ethereal on prober
  5. launch the-binary on agent host, no network traffic noted
  6. lsof on agent host for comparison
  7. ps -ax on agent host reveals new process [mingetty]
  8. From prober, issue command 2 to set transmit list, ethereal confirms command sent
  9. From prober, issue command 1 to query agent status, ethereal confirms 2 way communications
  10. From prober, issue command 6 to open backdoor, ethereal confirms packet
  11. ps -ax on agent host reveals new [mingetty] process
  12. netstat -a on agent host reveals new TCP listener on port 23281
  13. Using netcat: nc <agent host> 23281, root shell is obtained following entry of password "SeNiF"
  14. From prober, issue command 8 to close backdoor
  15. ps -ax on agent host shows only original [mingetty] process, nc session remains open
  16. From prober, issue command 3 to retrieve a list of active processes on the agent host.  Ethereal confirms transmission of the command and receipt of resulting data packets sent from agent to the prober.
  17. Decoder verifies that the packets newly received by the prober are in fact the results of a ps command issued at the agent host.
  18. Prober sends a kill command to be executed by the command 7 case of the-binary, ps on the agent host verifies that the command executed properly.
  19. Prober sends commands to initiate the flood attacks described above.  Ethereal confirms the transmission of these flood packets.
  20. Use command 8 to terminate each of the flooding attacks.

Things learned from live testing:

  1. The icmp and udp packets of attack 5 contained non-zero fragment numbers and the target failed to respond to the attempted ping of the ping flood.
  2. The udp packets of attack 5 are not checksummed properly.
  3. 4 of the 9 canned DNS queries for attacks 4, 9, and 12 were malformed (those for "ie", "es", "de", and "gr", all European countries?).
  4. Agent/handler communication packets (IP protocol 11 packets) sailed right through a RedHat 7.3 "high security" firewall configuration as selected during the RedHat installation process.  Full control of the agent was possible, with the single exception of connecting to the backdoor listener.  This was easily remedied by using the agent's capabilities to shut down the firewall.
  5. The agent can be detected using the lsof command:
    # lsof | grep raw
    the-binar 11299 root    0u   raw               244837 00000000:000B->00000000:0000 st=07
    # lsof -p 11299
    the-binar 11299 root  cwd    DIR    3,6   4096      2 /
    the-binar 11299 root  rtd    DIR    3,6   4096      2 /
    the-binar 11299 root  txt    REG    3,6 205108 257470 /root/honeynet/the-binary
    the-binar 11299 root    0u   raw               244837 00000000:000B->00000000:0000 st=07
  6. The agent was also detectable using two different forms of the ps command:
    # ps -x | grep mingetty
     1071 tty1     S     0:00 /sbin/mingetty tty1
     1072 tty2     S     0:00 /sbin/mingetty tty2
     1073 tty3     S     0:00 /sbin/mingetty tty3
     1074 tty4     S     0:00 /sbin/mingetty tty4
     1075 tty5     S     0:00 /sbin/mingetty tty5
     1076 tty6     S     0:00 /sbin/mingetty tty6
    11299 ?        S     0:00 [mingetty]
    # ps -e | grep 11299
    11299 ?        00:00:00 the-binary
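
Both detection tricks can also be automated by reading /proc directly instead of running lsof and ps. This Linux-only sketch assumes the standard /proc/net/raw layout, in which the "port" half of local_address holds the protocol number (000B, i.e. 11, as in the lsof output above). It also flags processes whose argv[0] disagrees with the kernel's comm name, which is what made ps -x and ps -e disagree. The heuristic is rough: interpreter-run scripts and programs that legitimately rewrite their process title will also be flagged. All function names are hypothetical.

```python
import os


def parse_proc_net_raw(text: str) -> list[tuple[int, int, int]]:
    """Return (protocol, state, inode) for each socket listed in /proc/net/raw."""
    sockets = []
    for line in text.splitlines()[1:]:        # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        # For raw sockets the "port" half of local_address is the protocol.
        protocol = int(fields[1].split(":")[1], 16)
        state = int(fields[3], 16)
        inode = int(fields[9])
        sockets.append((protocol, state, inode))
    return sockets


def looks_disguised(comm: str, cmdline: bytes) -> bool:
    """Flag a process whose argv[0] does not match its kernel-recorded name."""
    argv0 = cmdline.split(b"\0")[0].decode("latin-1")
    if not argv0:
        return False                  # kernel threads have an empty cmdline
    if argv0.startswith("[") and argv0.endswith("]"):
        return True                   # user process imitating a kernel thread
    # The kernel truncates comm to 15 characters.
    return os.path.basename(argv0)[:15] != comm[:15]


def scan() -> None:
    """Print protocol-11 raw sockets and disguised processes (Linux only)."""
    with open("/proc/net/raw") as fh:
        for protocol, state, inode in parse_proc_net_raw(fh.read()):
            if protocol == 11:
                print(f"raw socket for IP protocol 11: inode {inode}, st={state:02X}")
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as fh:
                comm = fh.read().strip()
            with open(f"/proc/{pid}/cmdline", "rb") as fh:
                cmdline = fh.read()
        except OSError:
            continue                  # process exited or access denied
        if looks_disguised(comm, cmdline):
            argv0 = cmdline.split(b"\0")[0].decode("latin-1")
            print(f"pid {pid}: comm={comm!r} but argv[0]={argv0!r}")
```

Run as root, `scan()` would be expected to report both the protocol 11 raw socket and the "[mingetty]" process whose comm is "the-binary".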

Phase 7: Just for the hell of it

A true test of understanding is to duplicate the original, so just for kicks, I wrote a functional equivalent of the-binary in C and tested it against the above procedures.  The source code for the reverse-engineered version is available here: the-binary.c.  I also wrote scanner.c, which scans a class C network for running instances of the-binary and lists any machines on which it is found, and killer.c, which connects to a running agent and kills it off.