Honeynet Project Scan of the Month - Scan 24 (October 2002)
Submission by Eloy Paris <peloy at chapus dot net>
Thu Oct 24 19:00:17 EDT 2002
Eloy Paris Computer Forensics, Inc.
peloy at chapus dot net
+23-818-343-3443

October 24, 2002

Narcotics Division
Honeynet Police Department

Dear Officers,

I am writing in regard to your request for help in the narcotics case against Mr. Joe Jacobs. I am pleased to inform you that I have finished my investigation of the floppy disk seized from the suspect's house. This investigation yielded additional details that might help the ongoing investigation and the case against Mr. Jacobs. Please find my report attached, and please do not hesitate to contact me should you need further information.

Sincerely,

Eloy Paris.-
Computer Forensics Investigator
Computer Forensics Report
I found three files in the image of the floppy that was seized from the suspect's house. The files were concealed using simple methods: deleting a file (in one case), altering the File Allocation Table (FAT) and root directory entries, and changing file names. The three files are: 1) a Microsoft Word document, 2) a Microsoft Excel spreadsheet, and 3) a JPEG image file. The Excel file was compressed and encrypted with a ZIP program but, fortunately, the password was easy to recover from the floppy disk image. These three files should give the Honeynet Police Department the information it needs to build a more solid case against the suspect and his possible accomplices.
Question 1. Who is Joe Jacob's supplier of marijuana and what is the address listed for the supplier?
Joe Jacobs' supplier seems to be a dealer called Jimmy Jungle, and his address seems to be:
626 Jungle Ave. #2 Jungle
The name of Joe Jacobs' supplier can be obtained from two documents. First, a deleted Microsoft Word document on the floppy is a letter from Joe Jacobs to his supplier. In this letter Mr. Jacobs compliments his supplier on the quality of the pot the supplier is selling to him and thanks him for the advice to target high school students. The letter is clearly addressed to Jimmy Jungle. The actual letter can be seen in Figure 1.
Figure 1. "Jimmy Jungle.doc"
The second place where we can see the name of Joe Jacobs' supplier is the JPEG file that is hidden in the floppy disk image (see Figure 2). The graphic is some kind of ad, probably used by Joe Jacobs to advertise the marijuana he is selling. In the ad we can see that Jimmy Jungle is presented as the producer and supplier of the marijuana sold by Joe Jacobs.
Figure 2. "cover page.jpg"
Question 2. What crucial data is available within the coverpage.jpg file and why is this data crucial?
There are two important things within the JPEG file. The first one was discussed in Question 1: the name of Joe Jacobs' supplier. This could be used in court as evidence against both Joe Jacobs and Jimmy Jungle.
The second thing is perhaps more important because it helps to recover strong evidence against Joe Jacobs: in the last cluster occupied by the JPEG file there is an ASCII string that reads "pw=goodtimes". As we will see in the next question, this string contains the password used to encrypt a compressed Microsoft Excel file that holds Joe Jacobs' schedule for visiting the high schools where he sells the marijuana.
Now, it is not possible to know whether the string "pw=goodtimes" is part of the JPEG file or if it is just stored in the unused part of the last cluster occupied by the file. If we believe the file size indicated in the root directory of the floppy then the string is stored in the unused part of the last cluster of the file.
However, I found out that both the FAT and the root directory entries were manipulated to conceal important information. So, it is possible that the string "pw=goodtimes" was made part of the JPEG file and then the root directory entry was changed (the file size, in particular) to make it look as if the string were not part of the file, when in reality it is.
Please note that the graphic can be seen even if this extra "garbage" (the "pw=goodtimes") is part of the file because programs that display JPEG graphics seem to ignore the extra garbage at the end of JPEG files.
Question 3. What (if any) other high schools besides Smith Hill does Joe Jacobs frequent?
The file "Scheduled Visits.xls" (a Microsoft Excel spreadsheet) contains what seems to be a list of high schools visited by Joe Jacobs. There are six high schools in the list:
The file also contains, as one can guess from its name, a schedule of visits to each of the schools. The schedule is pretty simple: high schools are visited in succession, one after the other, one per day. Visits take place from Monday to Friday. The schedule in the file covers the months of April, May and June.
Figure 3 shows what the file looks like.
Figure 3. "Scheduled Visits.xls"
Question 4. For each file, what processes were taken by the suspect to mask them from others?
As I previously mentioned, I found three files in the floppy disk image. The files were concealed so that none of them could be read without first applying some recovery techniques, which I will explain in the next section. I will now explain what was done to each file to conceal it and make it unreadable:
This file is a Microsoft Word document. Its contents can be seen in Figure 1. Two things were done to this file to prevent anyone from seeing its contents:
Other than this, nothing else was done to the root directory entry of this file.
This file is a JPEG image, and it can be seen in Figure 2. The things that were done to this file to conceal it were:
This file is, as I previously mentioned, a Microsoft Excel spreadsheet. The file was compressed and encrypted with the Zip utility. This file had the most obfuscation artifacts:
In the next section I will explain what counter-measures were applied to the FAT and the root directory to bring the files back to the world of the living.
Question 5. What processes did you (the investigator) use to successfully examine the entire contents of each file?
To analyze the floppy disk image I took two approaches. The first approach was simple and the goal was to get results as quickly as possible. The second approach was more complex, and the goal was to get more precise answers.
The first thing that I did after obtaining the disk image was to determine what kind of file system was present on it. The Unix command "file" gave me the hint:
$ file image
image: x86 boot sector, system MSDOS5.0, FAT (12 bit)
So the "file" command told me that I was dealing with a MS-DOS file system with a File Allocation Table (FAT) that has 12 bits per entry.
Armed with this new information, I proceeded to open the disk image with a binary editor. Since I already knew the type of file system used on the disk, it was very easy to determine where to look for the different pieces of the puzzle. For example, for an MS-DOS file system that uses FAT12, the disk layout is as follows:
Figure 4. Layout of a MS-DOS FAT12 Disk (see References for source)
Since disk sectors have a fixed size of 512 bytes, it is very easy to move with our binary editor to the different sections of the disk. All we have to do is multiply 512 by the number of the sector we are interested in analyzing. For example, the root directory is located at offset 512*19 = 9728 = 0x2600 (from the beginning of the disk image). The data section of the disk starts at offset 512*33 = 16896 = 0x4200.
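The offset arithmetic above can be sketched in a few lines of Python. The geometry constants are the usual ones for a 1.44 MB FAT12 floppy (one reserved sector, two 9-sector FATs, 224 root directory entries, one sector per cluster); they are assumed here rather than read from the boot sector of the image, and the function names are purely illustrative.

```python
SECTOR_SIZE = 512

# Assumed standard 1.44 MB floppy geometry (not read from the image itself).
RESERVED_SECTORS = 1    # the boot sector
NUM_FATS = 2
SECTORS_PER_FAT = 9
ROOT_ENTRIES = 224
DIR_ENTRY_SIZE = 32


def fat12_layout():
    """Return the first sector of each on-disk region."""
    fat1 = RESERVED_SECTORS
    fat2 = fat1 + SECTORS_PER_FAT
    root_dir = fat1 + NUM_FATS * SECTORS_PER_FAT
    root_sectors = ROOT_ENTRIES * DIR_ENTRY_SIZE // SECTOR_SIZE
    data = root_dir + root_sectors
    return {"fat1": fat1, "fat2": fat2, "root_dir": root_dir, "data": data}


def sector_offset(sector):
    """Byte offset of a sector from the beginning of the image."""
    return sector * SECTOR_SIZE


def cluster_offset(cluster):
    """Byte offset of a data cluster; cluster numbering starts at 2."""
    return sector_offset(fat12_layout()["data"] + cluster - 2)


layout = fat12_layout()
print(hex(sector_offset(layout["root_dir"])))  # root directory
print(hex(sector_offset(layout["data"])))      # data section
```

Under these assumptions the root directory starts at sector 19 and the data section at sector 33, which agrees with the offsets 0x2600 and 0x4200 used in the text.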
For a quick analysis all that is needed is to determine how many files are stored in the disk, and then, hoping that the files are stored in consecutive clusters, go to the data section of the disk and retrieve each file by just looking at where they might start and end. Let's take a look.
The root directory of the disk looks like this:
00002600: e564 006f 0063 0000 00ff ff0f 00bc ffff .d.o.c..........
00002610: ffff ffff ffff ffff ffff 0000 ffff ffff ................
00002620: e54a 0069 006d 006d 0079 000f 00bc 2000 .J.i.m.m.y.... .
00002630: 4a00 7500 6e00 6700 6c00 0000 6500 2e00 J.u.n.g.l...e...
00002640: e549 4d4d 594a 7e31 444f 4320 0068 3846 .IMMYJ~1DOC .h8F
00002650: 2b2d 2b2d 0000 4f75 8f2c 0200 0050 0000 +-+-..Ou.,...P..
00002660: 4267 0063 0020 0020 0020 000f 00f4 2000 Bg.c. . . .... .
00002670: 2000 2000 2000 2000 2000 0000 2000 2000 . . . . ... . .
00002680: 0163 006f 0076 0065 0072 000f 00f4 2000 .c.o.v.e.r.... .
00002690: 7000 6100 6700 6500 2e00 0000 6a00 7000 p.a.g.e.....j.p.
000026a0: 434f 5645 5250 7e31 4a50 4720 006d 4d46 COVERP~1JPG .mMF
000026b0: 2b2d 2b2d 0000 da43 2b2d a401 e13c 0000 +-+-...C+-...<..
000026c0: 4269 0074 0073 002e 0065 000f 009e 7800 Bi.t.s...e....x.
000026d0: 6500 2000 2000 2000 2000 0000 2000 2000 e. . . . ... . .
000026e0: 0153 0063 0068 0065 0064 000f 009e 7500 .S.c.h.e.d....u.
000026f0: 6c00 6500 6400 2000 5600 0000 6900 7300 l.e.d. .V...i.s.
00002700: 5343 4845 4455 7e31 4558 4520 0053 5346 SCHEDU~1EXE .SSF
00002710: 2b2d 2b2d 0000 9042 b82c 4900 e803 0000 +-+-...B.,I.....
Here we can see that there are three files. The first one, "Jimmy Jungle.doc", has been erased (notice the 0xe5 byte at offsets 0x2600, 0x2620 and 0x2640). The second file is called "cover page.jpgc " (notice that it has several spaces after the file extension, but the name gives us an idea that we might be dealing with a JPEG image). And finally, we see that there is a third file called "Scheduled Visits.exe". Judging by the file name alone it appears to be an executable, but we can't be sure unless we look at the data.
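To make the reading of the dump more concrete, here is a minimal Python sketch that decodes the three short-name (8.3) directory entries shown above. The field offsets follow the standard FAT directory entry layout (attribute byte at offset 11, starting cluster at offset 26, file size at offset 28); the helper name parse_dir_entry is illustrative and is not part of any tool used in this report.

```python
import struct

DELETED_MARK = 0xE5    # first byte of an erased entry
ATTR_LONG_NAME = 0x0F  # attribute value used by long-file-name entries


def parse_dir_entry(raw):
    """Decode one 32-byte FAT directory entry (short-name form)."""
    if len(raw) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    deleted = raw[0] == DELETED_MARK
    attr = raw[11]
    # The first character of an erased name is lost; show it as '?'.
    first = "?" if deleted else chr(raw[0])
    name = (first + raw[1:8].decode("ascii", "replace")).rstrip()
    ext = raw[8:11].decode("ascii", "replace").rstrip()
    start_cluster, size = struct.unpack_from("<HI", raw, 26)
    return {
        "deleted": deleted,
        "is_long_name": attr == ATTR_LONG_NAME,
        "name": name,
        "ext": ext,
        "start_cluster": start_cluster,
        "size": size,
    }


# The three short-name entries, copied from the dump at 0x2640, 0x26a0, 0x2700.
ENTRIES = {
    0x2640: bytes.fromhex("e549 4d4d 594a 7e31 444f 4320 0068 3846"
                          "2b2d 2b2d 0000 4f75 8f2c 0200 0050 0000"),
    0x26A0: bytes.fromhex("434f 5645 5250 7e31 4a50 4720 006d 4d46"
                          "2b2d 2b2d 0000 da43 2b2d a401 e13c 0000"),
    0x2700: bytes.fromhex("5343 4845 4455 7e31 4558 4520 0053 5346"
                          "2b2d 2b2d 0000 9042 b82c 4900 e803 0000"),
}

for offset, raw in ENTRIES.items():
    print(hex(offset), parse_dir_entry(raw))
```

Decoded this way, the erased entry reports starting cluster 2 and a size of 20480 bytes, the JPEG entry reports starting cluster 420 and 15585 bytes, and the ".exe" entry reports starting cluster 73 and 1000 bytes. With one 512-byte sector per cluster and a data section starting at 0x4200, cluster 420 would be far from where the JPEG data is actually found, which is consistent with the directory entries having been tampered with.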
To recover these files without spending much time on it, we just need to go to the data section and try to find out where they start and end. We do this by looking at the first few bytes of each file. For example, for the first file we have:
00004200: d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000 ................
00004210: 0000 0000 0000 0000 3e00 0300 feff 0900 ........>.......
The ".doc" extension of the file name suggests that this is a Microsoft Word document. If we compare these bytes with the first few bytes of a normal Word document, we see that they are the same. So, we can safely assume that we have found the beginning of a Word document at offset 0x4200.
Now let's look for the end: if we scroll down in our binary editor (I am using Emacs' hexl mode, by the way) we notice the string "JFIF" around offset 0x9200.
00009200: ffd8 ffe0 0010 4a46 4946 0001 0101 0060 ......JFIF.....`
00009210: 0060 0000 ffdb 0043 0008 0606 0706 0508 .`.....C........
This led me to think that the second file, the JPEG image "cover page.jpg", starts there (this is confirmed by the fact that the word 0xffd8 at offset 0x9200 is what the JPEG file format definition calls SOI, or Start Of Image). I then saved to disk all bytes between offsets 0x4200 and 0x91ff and named the resulting file "Jimmy Jungle.doc". Finally, I used AbiWord to open the file and was able to find the information I mentioned in Question 1.
The end of the first file is the beginning of the second, so I then needed to find the end of the second file. For this, I scrolled down with my binary editor until I found the string "PK" at offset 0xd000.
0000d000: 504b 0304 1400 0100 0800 985a b72c c755 PK.........Z.,.U
0000d010: 608d ea08 0000 0042 0000 1400 0000 5363 `......B......Sc
0000d020: 6865 6475 6c65 6420 5669 7369 7473 2e78 heduled Visits.x
0000d030: 6c73 94c8 312a e349 0bdb a810 c270 9dfc ls..1*.I.....p..
This looks familiar; it is the signature of a file compressed by PKZIP. So, using the same procedure I used with the first file, I saved to disk all bytes from offset 0x9200 to offset 0xcfff. The name I gave to this file was "cover page.jpg". I opened the file in The GIMP and got the image that can be seen in Figure 2.
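The manual search for file beginnings can be approximated programmatically. The following is a rough Python sketch, not the procedure actually used in this report: it checks the start of every cluster in the data section for the three signatures seen in the dumps. The defaults for data_start and cluster_size are assumptions taken from the layout discussed earlier, and the labels are arbitrary.

```python
# Signatures as they appear in the dumps at 0x4200, 0x9200 and 0xd000.
SIGNATURES = [
    (bytes.fromhex("d0cf11e0a1b11ae1"), "ole2"),  # Word/Excel container
    (b"\xff\xd8\xff", "jpeg"),                    # JPEG Start Of Image
    (b"PK\x03\x04", "zip"),                       # ZIP local file header
]


def identify(data):
    """Return a label if data begins with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def scan_clusters(image, data_start=0x4200, cluster_size=512):
    """Yield (offset, label) for each cluster starting with a signature."""
    for offset in range(data_start, len(image), cluster_size):
        label = identify(image[offset:offset + 8])
        if label is not None:
            yield offset, label


def carve(image, start, end):
    """Return the bytes in [start, end), as done by hand in the text."""
    return image[start:end]
```

On an image read with open("image", "rb").read(), list(scan_clusters(image)) would be expected to report candidates at 0x4200, 0x9200 and 0xd000 if the files are laid out as described. Like the manual method, this only works when the files are stored contiguously, and signatures can also match by chance inside file data.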
For the third and final file, I just scrolled down until I found the string "PK" again. I looked for this string because it also begins the record that marks the end of a ZIP file. I found the string around offset 0xd950. So I saved to disk all data between offset 0xd000 and 0xd950 and named the new file "schedule.zip". Note that I didn't use the ".exe" extension because, after looking at the data, I know it is a ZIP file and not an executable.
0000d940: 0000 2000 b681 0000 0000 5363 6865 6475 .. .......Schedu
0000d950: 6c65 6420 5669 7369 7473 2e78 6c73 504b led Visits.xlsPK
0000d960: 0506 0000 0000 0100 0100 4200 0000 1c09 ..........B.....
0000d970: 0000 0000 0000 0000 0000 0000 0000 0000 ................
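The "PK" string at 0xd95e begins what the ZIP format calls the end-of-central-directory record, and its fields can be used to compute where the archive ends instead of estimating it by eye. The Python sketch below decodes the 22 bytes visible in the dump; the function names are illustrative, and the length calculation assumes that nothing was prepended to the archive.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22  # fixed part of the record, without the optional comment


def parse_eocd(record):
    """Decode a ZIP end-of-central-directory record."""
    (sig, _disk, _cd_disk, _entries_here, entries,
     cd_size, cd_offset, comment_len) = struct.unpack_from("<4sHHHHIIH", record, 0)
    if sig != EOCD_SIGNATURE:
        raise ValueError("not an end-of-central-directory record")
    return {
        "entries": entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
    }


def archive_length(eocd):
    """Total archive length implied by the record (no prepended data assumed)."""
    return eocd["cd_offset"] + eocd["cd_size"] + EOCD_SIZE + eocd["comment_len"]


# Bytes copied from the dump, starting at offset 0xd95e.
record = bytes.fromhex("504b 0506 0000 0000 0100 0100 4200 0000 1c09 0000 0000")
eocd = parse_eocd(record)
print(eocd, archive_length(eocd), hex(0xD000 + archive_length(eocd)))
```

If the archive does start at 0xd000, these fields suggest a length of 2420 bytes, which would place the end of the ZIP file at offset 0xd974, a few bytes past the end of the record shown in the dump.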
Now, this file is zipped (compressed), so I needed to unzip it. For this I used the Unix unzip command, but as soon as I ran the command I was prompted for a password. I did not know the password at first, but when I was looking for the end of the second file ("cover page.jpg") I had noticed the string "pw=goodtimes" around offset 0xcf20:
0000cf10: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000cf20: 7077 3d67 6f6f 6474 696d 6573 0000 0000 pw=goodtimes....
0000cf30: 0000 0000 0000 0000 0000 0000 0000 0000 ................
I could only guess that the password was "goodtimes", so that's what I tried:
$ unzip schedule.zip
Archive:  schedule.zip
[schedule.zip] Scheduled Visits.xls password:
  inflating: Scheduled Visits.xls
Bingo! It worked, so I then opened the file "Scheduled Visits.xls" with Gnumeric and got the information I presented in Question 3.
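The password prompt could have been anticipated from the ZIP local file header at offset 0xd000, because bit 0 of the general-purpose flag field marks an entry as encrypted. The Python sketch below decodes the 30 fixed header bytes taken from the dump; the function name is illustrative.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
FLAG_ENCRYPTED = 0x0001  # bit 0 of the general-purpose flags
METHOD_DEFLATE = 8


def parse_local_header(raw):
    """Decode the 30-byte fixed part of a ZIP local file header."""
    (sig, version, flags, method, _mtime, _mdate, crc32,
     comp_size, size, name_len, extra_len) = struct.unpack_from("<4sHHHHHIIIHH", raw, 0)
    if sig != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    return {
        "version_needed": version,
        "encrypted": bool(flags & FLAG_ENCRYPTED),
        "method": method,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": size,
        "name_len": name_len,
        "extra_len": extra_len,
    }


# First 30 bytes of the dump at offset 0xd000.
header = bytes.fromhex("504b 0304 1400 0100 0800 985a b72c c755"
                       "608d ea08 0000 0042 0000 1400 0000")
info = parse_local_header(header)
print(info)
```

Decoded this way, the header reports an encrypted entry stored with the deflate method, a 20-character file name ("Scheduled Visits.xls"), 2282 bytes of compressed data and an uncompressed size of 16896 bytes.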
So, as you can see, a quick analysis gave me all the answers the Honeynet Police Department was looking for. This analysis took less than 30 minutes.
It is important to note that this quick analysis was possible because the three files were stored on disk in consecutive clusters, and because there was no fragmentation. If any of the files had been fragmented, it would have been very hard to recover them the way I did.
Finally, it is worth mentioning that when files are recovered with the brute-force method I employed, they end up a little bigger than their original sizes. The reason is that the last cluster of a file contains some unused bytes, unless the file size is a multiple of 512 (because the size of a cluster is 512 bytes). So, with the brute-force method we are also saving these unused bytes to disk. However, in most situations this is not a problem, since applications read the file and just ignore the extra bytes because they are not part of the file format. If applications did some kind of integrity check (like a checksum of the whole file, including the unused bytes), then we would be in trouble.
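The size of this unused tail can be worked out directly from the file size. The short Python sketch below does so for the JPEG, using the 15585-byte size recorded in the root directory and the starting offset 0x9200 found above, and then checks whether offset 0xcf20 (where "pw=goodtimes" appears) falls inside the unused part of the last cluster. The function names are illustrative.

```python
CLUSTER_SIZE = 512


def slack_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Unused bytes in the last cluster allocated to a file."""
    return -file_size % cluster_size


def in_slack(file_start, file_size, offset, cluster_size=CLUSTER_SIZE):
    """True if offset lies after the file's data but inside its last cluster."""
    logical_end = file_start + file_size
    allocated_end = logical_end + slack_bytes(file_size, cluster_size)
    return logical_end <= offset < allocated_end


jpeg_start, jpeg_size = 0x9200, 15585
print(slack_bytes(jpeg_size))                   # unused bytes after the JPEG data
print(hex(jpeg_start + jpeg_size))              # first byte past the file
print(in_slack(jpeg_start, jpeg_size, 0xCF20))  # is the password in the slack?
```

With these numbers the JPEG data ends at 0xcee1 and its last cluster ends at 0xd000, so the string at 0xcf20 falls in the unused part of the cluster, provided the size in the root directory is trusted.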
As I mentioned in Question 4, the disk image was manipulated so no useful information could be obtained by mounting the disk and looking at its contents:
$ sudo mount -o loop image.orig ~/mnt
Password:
$ ls -l ~/mnt
total 17
-rwxr-xr-x 1 root root 15585 2002-09-11 08:30 cover page.jpgc
-rwxr-xr-x 1 root root  1000 2002-05-24 08:20 schedu~1.exe
The above commands do the following: 1) mount the disk image on ~/mnt as a VFAT file system, using the Linux loop driver, and 2) list the contents of the mounted disk. If we look at the contents of the first file ("cover page.jpgc ") all we will see are zeros. Also, running the "file" command on the second file, schedu~1.exe, tells us that it is a ZIP file, but as soon as we try to unzip it we find out that the file appears to be corrupted.
My idea of a "complete analysis" was to bring the disk image to a state where its contents could be seen without binary editors or any tricks. For this, based on knowledge of the disk layout, on the information obtained during the quick analysis, and on knowledge of the structure of the root directory and of the FAT, I was able to write a C program that, when run on the original disk image, fixed it and made it usable.
Let's run the program on the image, re-mount the image, and see how everything looks now:
$ gcc -o fix-image fix-image.c
$ ./fix-image image
Comparing FATs... OK
$ sudo mount -o loop image ~/mnt
$ ls -l ~/mnt
total 38
-rwxr-xr-x 1 root root 15585 2002-09-11 08:30 cover page.jpg
-rwxr-xr-x 1 root root 20480 2002-04-15 14:42 Jimmy Jungle.doc
-rwxr-xr-x 1 root root  2560 2002-05-24 08:20 Scheduled Visits.exe
Now we can see what really was in the floppy disk image. These files can be viewed with programs that can handle their respective file types (JPEGs, MS Word and Excel documents.)
The C program is very simple: it just reconstructs the File Allocation Table (taking advantage of the fact that data is stored on disk in consecutive clusters) and the root directory.
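The C program itself is not reproduced here. As a rough illustration of the idea only, the Python sketch below rebuilds FAT12 cluster chains for files stored in consecutive clusters and packs the 12-bit entries into bytes. It is not the program described above: the function names are illustrative, the starting clusters (2, 42 and 73) are derived from the offsets found during the quick analysis, and the sizes are those shown in the listing of the fixed image. Files are assumed to be non-empty.

```python
END_OF_CHAIN = 0xFFF


def build_fat(files, cluster_size=512):
    """Build a list of FAT12 entries for contiguously stored files.

    files is a list of (start_cluster, size_in_bytes) pairs.
    """
    sized = [(start, -(-size // cluster_size)) for start, size in files]
    fat = [0] * max(start + count for start, count in sized)
    fat[0], fat[1] = 0xFF0, 0xFFF  # media descriptor entry and reserved entry
    for start, count in sized:
        for cluster in range(start, start + count - 1):
            fat[cluster] = cluster + 1           # each cluster points to the next
        fat[start + count - 1] = END_OF_CHAIN    # last cluster closes the chain
    return fat


def pack_fat12(entries):
    """Pack 12-bit FAT entries, two entries per three bytes."""
    if len(entries) % 2:
        entries = entries + [0]
    out = bytearray()
    for a, b in zip(entries[0::2], entries[1::2]):
        out += bytes([a & 0xFF, (a >> 8) | ((b & 0x0F) << 4), b >> 4])
    return bytes(out)


fat = build_fat([(2, 20480), (42, 15585), (73, 2560)])
print(len(fat), pack_fat12(fat)[:6].hex())
```

A complete fix would also have to write the packed table to both FAT copies on the disk and repair the root directory entries (name, starting cluster and size), which this sketch does not attempt.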
Bonus Question. What Microsoft program was used to create the Cover Page file. What is your proof (Proof is the key to getting this question right, not just making a guess).
I am not too sure about the application that was used to create the JPEG file "cover page.jpg". The JPEG file does not have a comment, which would help to identify the application that created the file. My guess is that it was Microsoft PowerPoint. PowerPoint is normally used to create presentations (.ppt files) but it can export one or several slides to JPEG images. My proof (although very weak) would be that the definition of the quantization tables in the JPEG file matches the quantization tables in a document that I generated with PowerPoint and saved as a JPEG image.
The three files recovered from the floppy disk image seized by the Honeynet Police Department from Joe Jacobs' house contain information that might lead to the indictment of Mr. Jacobs and of his possible supplier of marijuana.
The methods used to conceal the information in the floppy disk were not sophisticated. The most hi-tech method employed was the encryption of one file. Fortunately, the password to decrypt the file was left in the same floppy, and in clear text. If the owner of the floppy disk had used public key encryption, like that provided by the GNU Privacy Guard, and a good passphrase had been used, and the passphrase had not been written down anywhere, it would have been practically impossible to recover the encrypted file.
This section contains links to all the files referenced in this paper as well as a short summary of the purpose of the file.
I do not have any MS-DOS books or reference material anymore, so to be able to do the forensic work on the MS-DOS floppy disk image I relied on material that I was able to find on-line. In particular, my main source of information was a web site for a computer science class taught at BYU:
I found this page by searching on Google for "FAT" or something like that. The web site contains a link to a Microsoft White Paper on FATs. The paper was very useful for working with long file names. I tried to find the original white paper on Microsoft's web site but could not find anything.
Figure 4 was taken from the page at BYU mentioned above. I hope it is not copyrighted.
/usr/src/linux/include/linux/msdos_fs.h was also very useful. If you don't have any MS-DOS references at hand, and need to find out details about the layout of MS-DOS disks (or any file system for that matter), looking at header files from the Linux kernel is very useful.
The Filesystems HOWTO is another on-line reference that I used. I probably used it very little since the information there is also covered by the BYU web site I mentioned above.
I used http://www.wotsit.org/ to find the definition of the JPEG file format.