A first look at somefile didn't show any structure. It didn't even show any printable character.
So I wrote histogram, a small Perl script that counts the occurrences of each byte value. The histogram for somefile reveals some interesting facts:
A sophisticated algorithm would distribute the encrypted characters evenly over the whole character space from 0 to 255. This one yields only 45 different characters, which is even fewer than the 52 uppercase and lowercase letters combined. That's why I expected it to be a simple substitution cipher.
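The original histogram script was written in Perl and is not reproduced here. As a rough illustration of the same idea, the following Python sketch counts byte values and reports the figures used in this analysis. The function names `byte_histogram` and `summarize` are made up for this example.

```python
from collections import Counter


def byte_histogram(data: bytes) -> Counter:
    """Count how often each byte value (0-255) occurs in data."""
    return Counter(data)


def summarize(data: bytes, top: int = 3) -> dict:
    """Report distinct byte values, the value range, and the most frequent values.

    Expects non-empty data.
    """
    hist = byte_histogram(data)
    return {
        "distinct": len(hist),   # 45 for somefile, according to the text
        "lowest": min(hist),     # smallest byte value that occurs
        "highest": max(hist),    # largest byte value that occurs
        "peaks": [value for value, _ in hist.most_common(top)],
    }


def histogram_of_file(path: str) -> Counter:
    """Read a file in binary mode and return its byte histogram."""
    with open(path, "rb") as handle:
        return byte_histogram(handle.read())
```

Calling `summarize` on the contents of a file gives a quick impression of whether the bytes are spread over the whole range or concentrated in a few values.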
Assuming somefile to be an English text, I calculated the frequencies for a README file and, out of curiosity, for a more structured configuration file (/etc/passwd) for comparison.
|English text||40-58, 65-122||10, 32|
As expected, there's no character at values of 128 or above. The most frequent character in an English text is the blank (32), separating the words. The most frequent character of /etc/passwd is the colon (58), separating the fields.
Just as I started to compare the frequencies of letters in a first attempt to build the decoding table, I noticed the peak at character 10 in the histograms of the README and passwd. Character 10, that's linefeed! At the same moment I remembered the peak at character 245 in the ciphertext's histogram. The symmetry is obvious:
|plain text||cipher text|
|uses character range 0-127||uses character range 128-255|
|peak at the 11th character, counted from the low end||peak at the 11th character, counted from the high end|
10 = 255-245 looked like the solution. I assumed that the decryption algorithm has to subtract the value in the cipher text from 255 to calculate the value in the plain text. In other words, it just has to invert every single bit. Obviously, the same algorithm can be used for encryption and decryption. Applied for the first time, it turns plain text into cipher text. Applied for the second time, it reverses the operation.
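The arithmetic behind this assumption can be checked in a few lines. The following Python sketch is only an illustration; the helper name `invert` is made up for this example.

```python
def invert(value: int) -> int:
    """Map a byte value to 255 - value, as assumed in the text."""
    return 255 - value


# The ciphertext peak at 245 maps to linefeed.
print(invert(245))  # 10

# The ciphertext range 128-255 maps onto the plain text range 0-127.
mapped = sorted(invert(c) for c in range(128, 256))
print(mapped[0], mapped[-1])  # 0 127

# Subtracting from 255 is the same as flipping all eight bits of the byte.
assert all(invert(v) == (~v & 0xFF) for v in range(256))

# Applying the operation twice restores the original value.
assert all(invert(invert(v)) == v for v in range(256))
```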
I wrote decrypt, another small Perl script, to prove my assumption:
decrypt -i somefile -o somefile.plain.txt
decrypt reads an input file byte by byte, inverts every bit and appends the byte to an output file. I used an awkward expression to invert the bits. It reflects the steps leading me to the encryption/decryption algorithm:
Well, XORing with 0xFF would do the same, saving a lot of machine cycles :)
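For illustration, here is a minimal Python sketch of the XOR variant. It is not the original Perl script, and the names `INVERT_TABLE`, `invert_bytes` and `decrypt_file` are made up for this example. Unlike the script described above, it reads the whole file at once and writes the result in one step, which should be fine for a small configuration file.

```python
# Translation table mapping every byte b to b XOR 0xFF (all bits inverted).
INVERT_TABLE = bytes(b ^ 0xFF for b in range(256))


def invert_bytes(data: bytes) -> bytes:
    """Invert every bit of every byte; the same call encrypts and decrypts."""
    return data.translate(INVERT_TABLE)


def decrypt_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the bit-inverted content to dst."""
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        outfile.write(invert_bytes(infile.read()))
```

A call such as `decrypt_file("somefile", "somefile.plain.txt")` would correspond to the command line shown above.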
The output appeared as readable text, looking like a configuration file.
The decrypted file looks like a configuration file for the Universal Root Kit (URK) by K2 (sample).
The configuration file contains the password opening the backdoor in clear text, which is reason enough to encrypt it. Another reason might be to hide the location of modified system binaries after detection of the rootkit.
URK moves some system binaries like find and ps into the rootkit directory and installs wrappers at the original location. These wrappers, which K2 calls "filters", call the saved system binaries and suppress parts of the returned information, based on the configuration file.
The configuration file uses a format known from Windows INI files. The file is divided into sections, whose names appear in angle brackets. Parameters are given in the format parameter=value.
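A file in this format is easy to read programmatically. The following Python sketch is an assumption-laden illustration, not URK's own parser: it assumes that each section header stands alone on a line as `<name>`, and the sample section and parameter names are hypothetical placeholders rather than content from the decrypted file.

```python
def parse_config(text: str) -> dict:
    """Parse '<section>' headers and 'parameter=value' lines into nested dicts.

    Assumption: a section header is a line of the form <name>.
    Lines outside any section and lines without '=' are ignored.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Start a new section (or continue an existing one).
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            # Split at the first '=' only, so values may contain '='.
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


# Hypothetical sample, made up for this example.
SAMPLE = """\
<example_section>
example_parameter=example value
another_parameter=/some/path
"""

config = parse_config(SAMPLE)
print(config["example_section"]["another_parameter"])  # /some/path
```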
A lot of parameters contain pointers to the original system binaries, e.g.
Other parameters, named program_filter, contain strings to be filtered out by the trojaned binary.
According to K2's README, the filters for find, du and ls share the same filter list, named file_filters. Therefore, files and directories containing the string "01" won't show up in ls output. Some postings to Sun newsgroups and mailing lists reported this behaviour. Files named "uconf.inv" won't show up either.
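The effect of such a filter list can be sketched as plain substring matching. This Python example is an assumption about the behaviour described above, not URK's actual implementation; the function name `apply_filters` and the sample listing are made up for illustration.

```python
def apply_filters(lines, filters):
    """Drop every line that contains any of the filter strings."""
    return [line for line in lines if not any(f in line for f in filters)]


# Hypothetical directory listing, made up for this example.
listing = ["passwd", "01", "uconf.inv", "hosts", "log01.txt"]

# Filter strings named in the text.
print(apply_filters(listing, ["01", "uconf.inv"]))  # ['passwd', 'hosts']
```

Note that an innocent name such as log01.txt disappears as well, which fits the reports of files containing "01" going missing from ls output.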
Likewise, ps_filters contains the names of processes to suppress from the process listing. Among them is sshd2, which might be a trojaned ssh daemon.
The config file contains a filter expression for lsof, but doesn't specify the location of the saved system binary. If properly installed, it would hide connections to or from ports 13000, 25000, 6668 and 6667 and the files uconf.inv (again) and psniff. Ports 6668 and 47018 will be suppressed in netstat output.
Even a trivial encryption scheme can significantly slow down the examination of a file and hinder the investigation of a break-in.
It took me about 3 hours to decrypt the file, 2 hours to search for a specimen of the rootkit (see bonus question) and 3 hours to turn my notes into this writeup.
Total: about 8 hours
As stated above, the config file format looks like the one used by K2's URK. On the other hand, URK doesn't use an encrypted config file, at least up to version 0.9.8, which is the last one I found. Also, URK doesn't contain a filter for lsof.
Dissatisfied, I used Google to search the web and newsgroups for significant strings like.
The search for "/dev/pts/01/" leads directly to CERT advisory CA-2001-05. This warning lists the files of a rootkit found on the compromised hosts, a file named crypt among them.
Another document refers to said CERT advisory. It describes a rootkit whose config file contains the strings "lsof_filters" and "ps_filters". The kit identifies itself as "X-Org SunOS Rootkit v2.5DXE".
Another report again mentions the known filter strings. There's also a pointer to a file "/dev/pts/01/uconf.inv". This rootkit identifies itself as "SunOS Rootkit v2.5 (C) 1997-2001 Tragedy/Dor".
A report on sans.org gives an in-depth description of a compromise. Again, it mentions uconf.inv. It contains hexdumps of two encrypted files that look like somefile and are encrypted by the same trivial algorithm. crypt is among the files listed:
-rwx------ root/lp 3648 2000-11-25 14:46:14 ./.config/crypt
Last but not least, there's an analysis by Bruce Ediger, posted to comp.security.unix on 2001-06-07 at 07:02:02 PST. He describes a file named "uconf.inv" encrypted by inverting all bits. He identifies the rootkit as a combination of K2's URK and the SunOS Rootkit by Tragedy/Dor.