This document contains my solution for the October 2002 Scan Of The Month Challenge.
-- Bob Mathews <bobmathews at>
  1. A Word document (Jimmy_Jungle.doc) was recovered from the seized floppy disk. According to this document, Joe Jacobs's marijuana supplier is:
    Jimmy Jungle
    626 Jungle Ave Apt 2
    Jungle, NY 11111
    For convenience, the Word file has been converted to text format: Jimmy_Jungle.txt
  2. A JPEG image file (cover_page.jpg) was also recovered from the disk. The image appears to be the cover page from a newsletter for people involved in the marijuana trade. In the unused disk space immediately following the image, an interesting piece of information was discovered. It proved to be the password that unlocks an encrypted zip file found on the disk. Finding this password was a lucky chance, but it might have been possible to open the zip file without it. See below for more discussion on the feasibility of guessing the password.
  3. The encrypted zip file contained an Excel spreadsheet (Scheduled_Visits.xls) giving a schedule of visits to several high schools, evidently for the purpose of selling marijuana. Aside from Smith Hill High School, the other schools listed are: Again, this file is available in text format: Scheduled_Visits.txt
  4. Some effort has been made to hide the files on the floppy disk. Here are the specific steps that were taken:
    The Word document was deleted. Since its contents were not overwritten, they could be recovered from the disk.
    The name of the JPEG image file was changed to "cover page.jpgc           ". The trailing spaces may have been intended to confuse someone investigating the contents of the disk. They are not normally visible, but they cannot be omitted when referring to the file. The file extension was changed from .jpg to .jpgc, possibly in an attempt to prevent the operating system from locating an appropriate application with which to open the file.

    Furthermore, the directory entry for this file was damaged: its data pointer was set to an unused portion of the disk. It seems unlikely that this was the result of accidental damage to the disk, since nothing else was changed. It is more likely to be a deliberate attempt to prevent the contents of the file from being found. Those contents were still present on the disk, though, so it was possible to recover them.
    The name of the zip file was changed to "Scheduled Visits.exe      ". Once again, there are trailing spaces after the name. The extension was changed from .zip to .exe, even though this is not a valid .exe file.

    This file's directory entry was also damaged, but in a different way. The length field was changed, making the file appear to be shorter than it actually is, thereby preventing the entire content of the file from being read. Since the actual data on the disk was not damaged, it was possible to recover this file.

    As mentioned earlier, the contents of this file were protected with the zip encryption feature. However, the file password was carelessly left on the disk where it could be found. This allowed the encrypted file to be opened easily.

  5. The investigation proceeded in four phases. In the first phase, no special tools were used to analyze the disk contents. This phase was unsuccessful, due to steps taken to obscure the files on the disk. Next, an ASCII dump of the disk image was examined, in search of data that could not be found in the first phase. This approach resulted in some progress, as pieces of missing data were located. The third phase used the dosfsck program and other tools to recover the missing data. Finally, a careful analysis of the filesystem structure validated the results of phases two and three. The steps followed are described in more detail below. The work was done on a computer running the Linux operating system.

Phase 1 - Naive Analysis

After downloading the file, the first step is to verify the MD5 checksum given on the web page, to ensure that we have an undamaged copy. I don't like to compare checksums by eye, because that is tedious and error prone. Instead, I cut and pasted the checksum from the webpage and used md5sum's --check option. The command is shown below. It's very important that there are exactly two spaces between the checksum and the filename!

$ md5sum --check
<paste>b676147f63923e1f428131d59b1d6a72</paste> OK

Good, that checked out. Next, I extracted the contents of the compressed zip archive file.

$ unzip
  inflating: image
$ ls -l image
-rw-r--r--    1 bob      users     1474560 Sep 18 09:50 image

The size of this file is right for an image of a floppy disk. I used the Unix "file" utility to check what was in the file.

$ file image
image: x86 boot sector, system MSDOS5.0, FAT (12 bit)

Evidently, this is an MSDOS FAT filesystem. (FAT stands for File Allocation Table, in case anyone's interested. Twelve bits is the usual FAT entry size used for small floppies. A hard disk would use 16 or 32 bits.) Since "file" isn't foolproof, I also took a look at the file contents myself. I didn't see any reason to think that there was a mistake.

$ hexdump -C image |less
00000000  eb 3c 90 4d 53 44 4f 53  35 2e 30 00 02 01 01 00  |.<.MSDOS5.0.....|
00000010  02 e0 00 40 0b f0 09 00  12 00 02 00 00 00 00 00  |...@............|
00000020  00 00 00 00 00 00 29 cf  cd b1 c4 4e 4f 20 4e 41  |......)....NO NA|
00000030  4d 45 20 20 20 20 46 41  54 31 32 20 20 20 33 c9  |ME    FAT12   3.|
00000040  8e d1 bc f0 7b 8e d9 b8  00 20 8e c0 fc bd 00 7c  |....{.... .....||
00000050  38 4e 24 7d 24 8b c1 99  e8 3c 01 72 1c 83 eb 3a  |8N$}$....<.r...:|

(The rest of the first page is gibberish, so I've omitted it here.)

Next, I decided to take a quick look at this filesystem to see what was there. One way to do that is to copy the image onto a floppy disk, using the command dd if=image of=/dev/fd0 bs=512. Instead, I used the "loopback" feature to mount the image file as if it were a real block I/O device like a disk. Neither of these methods shows deleted files or other hidden information; looking for those is left until later.

As root, I did the following. The "ro" option instructs mount that the filesystem should be read-only, to prevent accidentally changing something during the investigation.

# mount -o ro,loop image /mnt
# ls -la /mnt
drwxr-xr-x    2 root     root         7168 Dec 31  1969 ./
drwxr-xr-x   21 root     root         4096 Oct 12 15:30 ../
-rwxr-xr-x    1 root     root        15585 Sep 11 08:30 cover\ page.jpgc\ \ \ \ \ \ \ \ \ \ \ *
-rwxr-xr-x    1 root     root         1000 May 24 08:20 schedu~1.exe*

My "ls" command is actually an alias that uses the -b option, which causes special characters to be escaped with a backslash. That's lucky, because otherwise I might not have noticed the trailing spaces in the first filename.

The second filename contains "~1", suggesting that this is actually the short version of a long filename. The file's long name entry may have been damaged somehow, or the file may have been processed by a piece of software that doesn't understand long filenames.

I grabbed a copy of the files from the disk, then unmounted it.

# mkdir files
# cp /mnt/* files
# umount /mnt

Once again, I used the "file" utility to see what these files are.

$ file files/*
files/cover page.jpgc           : PC formatted floppy with no filesystem
files/schedu~1.exe:               Zip archive data, at least v2.0 to extract

The description "PC formatted floppy" didn't make sense to me, so I took a look at the contents of the file.

$ hexdump -C "files/cover page.jpgc           " |less
00000000  f6 f6 f6 f6 f6 f6 f6 f6  f6 f6 f6 f6 f6 f6 f6 f6  |................|
*
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003ce0  00                                                |.|

The * means that several lines have been omitted because they are just the same as the previous one. There is nothing useful in this file. Someone may have wiped the contents out, or the disk may have been damaged somehow.

The second file is named with the .exe extension, but the "file" utility claims it's a Zip file, not a .exe program file.

$ hexdump -C files/schedu~1.exe |less
00000000  50 4b 03 04 14 00 01 00  08 00 98 5a b7 2c c7 55  |PK.........Z.,.U|
00000010  60 8d ea 08 00 00 00 42  00 00 14 00 00 00 53 63  |`......B......Sc|
00000020  68 65 64 75 6c 65 64 20  56 69 73 69 74 73 2e 78  |heduled Visits.x|
00000030  6c 73 94 c8 31 2a e3 49  0b db a8 10 c2 70 9d fc  |ls..1*.I.....p..|
00000040  10 03 31 a2 8e 48 e8 3c  4b 81 75 c9 8b 86 51 af  |..1..H.<K.u...Q.|

Sure enough, "file" is right. A real .exe file would start with the letters MZ. This file starts with PK, which incidentally are the initials of Phil Katz, the author of the PKZIP utility. It looks like the first file in the Zip archive is an Excel spreadsheet called "Scheduled Visits". I tried listing the contents of the archive to see what else is there.

$ unzip -v files/schedu~1.exe
Archive:  schedu~1.exe
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.

Bad luck, the file is damaged. I tried repairing it.

$ mkdir fix
$ cp files/schedu~1.exe fix
$ cd fix
$ zip -F schedu~1.exe
zip: reading Scheduled Visits.xls
zip warning: schedu~1.exe would be truncated.
Retry with option -qF to truncate, with -FF to attempt full recovery
$ zip -FF schedu~1.exe
zip: reading Scheduled Visits.xls
 compressed size 2282, actual size 950 for Scheduled Visits.xls
zip warning: schedu~1.exe has been truncated.

Unfortunately, it looks like I don't have the whole file. I attempted to get what I could out of it, though.

$ unzip -v schedu~1.exe
Archive:  schedu~1.exe
 Length   Method    Size  Ratio   Date   Time   CRC-32    Name
--------  ------  ------- -----   ----   ----   ------    ----
   16896  Defl:N      938  94%  05-23-02 11:20  8d6055c7  Scheduled Visits.xls
--------          -------  ---                            -------
   16896              938  94%                            1 file
$ unzip schedu~1.exe
Archive:  schedu~1.exe
[schedu~1.exe] Scheduled Visits.xls password:

The zip file is encrypted with a password. I took a few guesses, but all I got was this:

password incorrect--reenter:
   skipping: Scheduled Visits.xls    incorrect password

It is sometimes possible to guess the password of an encrypted file; see below for more information. However, this process is likely to take a long time. Instead, I decided to examine the disk image more closely, as described in the next section. This marks the end of the first phase of investigation. It did not meet with success.
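The numbers reported by zip -FF can be cross-checked against the hexdump of schedu~1.exe shown earlier. The following is a minimal Python sketch, not a tool used in the investigation: it decodes the 30-byte zip local file header, with the field layout taken from the PKZIP Application Note. The constant HEADER_HEX is simply the first two hexdump lines retyped, and the function name is made up for this illustration.

```python
import struct

# First 32 bytes of schedu~1.exe, retyped from the hexdump output above.
HEADER_HEX = (
    "50 4b 03 04 14 00 01 00 08 00 98 5a b7 2c c7 55 "
    "60 8d ea 08 00 00 00 42 00 00 14 00 00 00 53 63 "
)


def parse_local_header(raw):
    """Decode the fixed 30-byte part of a zip local file header."""
    (sig, version, flags, method, dos_time, dos_date,
     crc32, comp_size, uncomp_size, name_len, extra_len) = struct.unpack(
        "<4sHHHHHIIIHH", raw[:30])
    if sig != b"PK\x03\x04":
        raise ValueError("not a zip local file header")
    return {
        "version_needed": version,
        "encrypted": bool(flags & 1),          # bit 0 of the flag word
        "method": method,                      # 8 means deflate
        "modified": (1980 + (dos_date >> 9), (dos_date >> 5) & 0x0F,
                     dos_date & 0x1F, dos_time >> 11, (dos_time >> 5) & 0x3F),
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        # File data begins after the fixed header, the name and the extra field.
        "data_offset": 30 + name_len + extra_len,
    }


info = parse_local_header(bytes.fromhex(HEADER_HEX))
for key, value in info.items():
    print(key, value)
# How much compressed data fits in the 1000 bytes the directory entry admits to.
present = 1000 - info["data_offset"]
print("compressed bytes present:", present)
```

The decoded CRC (8d6055c7), compressed size (2282) and uncompressed size (16896) agree with the zip and unzip output above, and with the data starting at offset 50, a 1000-byte file can hold only 950 of the 2282 compressed bytes.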

Phase 2 - Dump disk image

Next, I examined the disk image directly, hoping to find files that were inaccessible before. I wanted to concentrate on looking for pieces of readable text in the image file. To that end, I used hexdump with a custom format that displays only printable ASCII characters, not hexadecimal notation. Here is the command I used:

$ hexdump -e '"%06_ax  " 64/1 "%_p" "\n"' image |less

Much of the output is gibberish, so I'll just skip over it in this presentation. Here's something that looks meaningful.

002600  .d.o.c...........................J.i.m.m.y.... .J.u.n.g.l...e...
002640  .IMMYJ~1DOC .h8F+-+-..Ou.,...P..Bg.c. . . .... . . . . . ... . .
002680  .c.o.v.e.r.... .p.a.g.e.....j.p.COVERP~1JPG .mMF+-+-...C+-...<..
0026c0  Bi.t.s...e....x.e. . . . ... . ..S.c.h.e.d....u.l.e.d. .V...i.s.
002700  SCHEDU~1EXE .SSF+-+-...B.,I.....................................
002740  ................................................................

This is the root directory, which contains the list of files on the disk. There are two files we already know about, "cover page.jpgc" and "schedu~1.exe". Note that there does appear to be a long name entry present for the latter. There's also a reference to a third file, "Jimmy Jungle.doc". The first letter of its short name has been replaced with a non-printable character, indicating that the file has been deleted. There may have been other deleted files on the disk as well, but if there were, the directory entries have been overwritten.
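To make the directory layout more concrete, here is a minimal Python sketch of decoding a single 32-byte FAT directory entry. It assumes the usual layout (8-byte name, 3-byte extension, attribute byte at offset 11, first cluster at offset 26, size at offset 28, and 0xE5 as the deleted marker). The sample entry is synthetic: its size is the 20480 bytes later reported for the recovered document, while the starting cluster of 2 is an assumption made only for this example.

```python
import struct

ATTR_LONG_NAME = 0x0F   # attribute value that marks a long-name entry
DELETED_MARK = 0xE5     # first name byte of a deleted entry


def parse_dir_entry(entry):
    """Decode one 32-byte FAT directory entry into a small dict."""
    if len(entry) != 32:
        raise ValueError("directory entries are 32 bytes long")
    deleted = entry[0] == DELETED_MARK
    if entry[11] == ATTR_LONG_NAME:
        # Long-name piece: byte 13 holds the checksum of the short name.
        sequence = None if deleted else entry[0] & 0x1F
        return {"kind": "long", "deleted": deleted,
                "sequence": sequence, "checksum": entry[13]}
    # Short-name entry: the first character is lost when a file is deleted.
    first = "?" if deleted else chr(entry[0])
    stem = (first + entry[1:8].decode("ascii", "replace")).rstrip()
    ext = entry[8:11].decode("ascii", "replace").rstrip()
    first_cluster, size = struct.unpack_from("<HI", entry, 26)
    return {"kind": "short", "deleted": deleted,
            "name": stem + "." + ext if ext else stem,
            "first_cluster": first_cluster, "size": size}


# Synthetic entry resembling the deleted document (cluster 2 is assumed).
sample = (b"\xe5IMMYJ~1" + b"DOC" + bytes([0x20]) + bytes(14)
          + struct.pack("<HI", 2, 20480))
decoded = parse_dir_entry(sample)
print(decoded)
```

Because deletion overwrites only the first name byte, everything else in the entry, including the starting cluster and the length, is still available to a recovery tool.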

I returned to this directory listing later, to look more closely and make sure I had interpreted it correctly. Continuing with the image dump, I found this:

004c00  Jimmy Jungle.626 Jungle Ave Apt 2.Jungle, NY 11111..Jimmy:..Dude
004c40  , your pot must be the best . it made the cover of High Times Ma
004c80  gazine! Thanks for sending me the Cover Page. What do you put in
004cc0   your soil when you plant the marijuana seeds? At least I know y
004d00  our growing it and not some guy in Columbia.. .These kids, they
004d40  tell me marijuana isn.t addictive, but they don.t stop buying fr
004d80  om me. Man, I.m sure glad you told me about targeting the high s
004dc0  chool students. You must have some experience. It.s like a guara
004e00  nteed paycheck. Their parents give them money for lunch and they
004e40   spend it on my stuff. I.m an entrepreneur. Am I only one you se
004e80  ll to? Maybe I can become distributor of the year!..I emailed yo
004ec0  u the schedule that I am using. I think it helps me cover myself
004f00   and not be predictive.  Tell me what you think. To open it, use
004f40   the same password that you sent me before with that file. Talk
004f80  to you later...Thanks,..Joe ....................................

This looks very interesting, not to mention incriminating. After this is more gibberish mixed up with fragments of text, such as this example:

006880  ............,.......8.......D.......P.......X.......`.......h...
0068c0  ................Jimmy Jungle..o..........imm........0000. Ju....
006900  .....STC.........STC........Normal.u........0000tl.u........9.TC
006940  ........Microsoft Word [email protected]@[email protected]_....
006980  ................................................................

This appears to be part of the deleted file "Jimmy Jungle.doc" which was mentioned in the directory. Moving along again.

009200  ......JFIF.....`.`.....C................................... $.'
009240  ",#..(7),01444.'9=82<.342...C...........2!.!22222222222222222222
009280  222222222222222222222222222222..........."......................
0092c0  ......................................}........!1A..Qa."q.2....#

JFIF stands for "JPEG File Interchange Format." This looks like the beginning of a JPEG file, possibly the actual data from "cover page.jpgc". After this, there are many more nonsense characters (probably compressed JPEG data), and then something truly interesting.

00cec0  ...(...(...(...(...(...(...(....................................
00cf40  ................................................................

The letters "pw" suggested "password." Could this be the password for the zip file I had found? I tried it out right away.

$ unzip schedu~1.exe
Archive:  schedu~1.exe
[schedu~1.exe] Scheduled Visits.xls password:
  inflating: Scheduled Visits.xls
  error:  invalid compressed data to inflate
$ ls -l "Scheduled Visits.xls"
-rwxr-xr-x    1 bob      users           0 May 23 11:20 Scheduled\ Visits.xls*

Well, it didn't say that the password was wrong, but evidently I didn't have enough of the file to recover anything. Too bad. Disappointed, I returned to the image file.

00d000  PK.........Z.,.U`......B......Scheduled Visits.xls..1*.I.....p..
00d040  ..1..H.<K.u...Q..*6.$..~uF..NVO....`6T....#....R......#-4..HT.b.
00d080  ^.?.Rr..f.J ....x.5kUM....a_...SA#.;.Qk.........I....;.2.VS....t
00d900  ...N(.}.H.-......#.vQ..!.!.qPK...........Z.,.U`......B..........
00d940  .. .......Scheduled Visits.xlsPK..........B.....................
00d980  ................................................................

This looked familiar -- it's the zip file I had been working on. However, notice that the data here is significantly longer than the 1000 bytes found before. This might be the complete contents of the file. There's nothing of interest after this in the image file, so this phase is now over. I now had some good leads to follow up on. In the next phase, I recovered the files that were found.
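The custom hexdump format used in this phase is easy to imitate where hexdump is not available. The Python sketch below is only an approximation of that command: it prints a six-digit hexadecimal offset followed by 64 characters per line, with non-printable bytes shown as dots. The helper find_all, which lists the offsets of a marker such as "JFIF" or the zip signature, is an addition for illustration and was not part of the original procedure.

```python
def ascii_dump(data, width=64):
    """Yield 'offset  text' lines, showing non-printable bytes as dots."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield "%06x  %s" % (offset, text)


def find_all(data, marker):
    """Return every offset at which the byte string marker occurs."""
    hits = []
    pos = data.find(marker)
    while pos != -1:
        hits.append(pos)
        pos = data.find(marker, pos + 1)
    return hits


# Small demonstration on made-up bytes; a real run would read the image file.
demo = b"\x00" * 64 + b"PK\x03\x04" + b"Scheduled Visits.xls" + b"\xff" * 40
for line in ascii_dump(demo):
    print(line)
print(find_all(demo, b"PK\x03\x04"))
```

Searching for known signatures in this way gives the byte offsets that are needed later, when data is cut out of the image directly.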

Phase 3 - Recovery

I used the Linux-based dosfsck program to try to recover the files. Another possibility would be to copy the image to a floppy (using the command given above) and use Windows-based recovery tools like ScanDisk. Here are the steps I followed, as root.

# cp image image.fix
# losetup /dev/loop0 image.fix
# dosfsck -u /jimmyj~1.doc -f -r /dev/loop0
dosfsck 2.8, 28 Feb 2001, FAT32, LFN
Undeleting JIMMYJ~1.DOC
Wrong checksum for long file name "Scheduled Visits.exe      ".
  (Short name SCHEDU~1.EXE may have changed without updating the long name)
1: Delete LFN
2: Leave it as it is.
3: Fix checksum (attaches to short name SCHEDU~1.EXE)
? 3
/cover page.jpgc
  Contains a free cluster (420). Assuming EOF.
/cover page.jpgc
  File size is 15585 bytes, cluster chain length is 0 bytes.
  Truncating file to 0 bytes.
/Scheduled Visits.exe
  File size is 1000 bytes, cluster chain length is > 1024 bytes.
  Truncating file to 1000 bytes.
Reclaimed 31 unused clusters (15872 bytes) in 1 chain.
Perform changes ? (y/n) y
/dev/loop0: 4 files, 73/2847 clusters
# losetup -d /dev/loop0
# mount -o ro,loop image.fix /mnt
# ls -la /mnt
total 48
drwxr-xr-x    2 root     root         7168 Dec 31  1969 ./
drwxr-xr-x   21 root     root         4096 Oct 12 15:30 ../
-rwxr-xr-x    1 root     root         1000 May 24 08:20 Scheduled\ Visits.exe\ \ \ \ \ \ *
-rwxr-xr-x    1 root     root            0 Sep 11 08:30 cover\ page.jpgc\ \ \ \ \ \ \ \ \ \ \ *
-rwxr-xr-x    1 root     root        15872 Dec 31  1979 fsck0000.rec*
-rwxr-xr-x    1 root     root        20480 Apr 15  2002 jimmyj~1.doc*
# mkdir fix2
# cp /mnt/* fix2
# umount /mnt
# file fix2/*
fix2/Scheduled Visits.exe      : Zip archive data, at least v2.0 to extract
fix2/cover page.jpgc           : empty
fix2/fsck0000.rec:               JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
fix2/jimmyj~1.doc:               Microsoft Office document data

The long file name for "schedu~1.exe" was recovered (note the trailing spaces), but not the complete contents of the file. An unattached chain of disk blocks was found, which appears to contain a JPEG image. That could be the original contents of "cover page.jpgc". Finally, the deleted file "jimmyj~1.doc" was recovered (there doesn't seem to be any way to make dosfsck restore the long name, "Jimmy Jungle.doc").
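The "wrong checksum" complaint from dosfsck refers to a one-byte checksum that every long-name entry carries, computed over the 11 bytes of the short name it belongs to. A minimal Python sketch of that calculation follows, assuming the standard rotate-right-and-add algorithm from the FAT long filename scheme. The second short name in the example is purely illustrative and is not taken from the disk.

```python
def lfn_checksum(short_name):
    """Checksum of an 11-byte short name, as stored in long-name entries."""
    if len(short_name) != 11:
        raise ValueError("short name must be 8 name bytes plus 3 extension bytes")
    total = 0
    for byte in short_name:
        # Rotate the running sum right by one bit, then add the next byte.
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total


current = lfn_checksum(b"SCHEDU~1EXE")
other = lfn_checksum(b"SCHEDU~1ZIP")   # hypothetical name, for comparison only
print("checksum for SCHEDU~1.EXE: %02x" % current)
print("checksum for SCHEDU~1.ZIP: %02x" % other)
```

If the short entry is edited without recomputing the byte stored in the long-name entries, the two values no longer agree, which is the condition dosfsck reported.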

In order to check that the files were recovered correctly, I scanned through them with hexdump, but did not notice anything out of the ordinary. Next, I opened "fsck0000.rec" with an image viewer, and it looked fine. I renamed it to "cover_page.jpg", correcting the extension and removing the spaces, since spaces in filenames sometimes cause trouble.

I used Star Office to open "jimmyj~1.doc". It contained the same text I found before, but nicely formatted. I saved a copy as "Jimmy_Jungle.txt" and renamed the Word file to "Jimmy_Jungle.doc".

Next, I made another attempt at recovering the zip file. This time, I simply copied a slice right out of the image file using this command:

$ dd if=image bs=512 skip=104 count=5

In order to verify that the new file matches the partial one I found before, I used the following command to get the first 1000 bytes from the new file and compare them with the old file.

$ head -1000c |diff - files/schedu~1.exe

No output means no differences. Good. Next I unzipped the contents, using the password discovered earlier.

$ unzip -v
 Length   Method    Size  Ratio   Date   Time   CRC-32    Name
--------  ------  ------- -----   ----   ----   ------    ----
   16896  Defl:N     2270  87%  05-23-02 11:20  8d6055c7  Scheduled Visits.xls
--------          -------  ---                            -------
   16896             2270  87%                            1 file
$ unzip
[] Scheduled Visits.xls password:
  inflating: Scheduled Visits.xls
$ ls -l "Scheduled Visits.xls"
-rw-rw-rw-    1 bob      users       16896 May 23 11:20 Scheduled\ Visits.xls
$ file "Scheduled Visits.xls"
Scheduled Visits.xls: Microsoft Office document data
$ mv "Scheduled Visits.xls" Scheduled_Visits.xls

It worked! Zip files include an internal CRC32 checksum to ensure file integrity. Since there was no error message about a CRC32 mismatch, it is extremely unlikely that this file was recovered incorrectly. I opened "Scheduled_Visits.xls" with Star Office and saved a copy in text format.

All of the files on the disk had now been recovered. However, to be more certain that everything was done correctly, I next went back to look more closely at the disk image.

Phase 4 - Filesystem Analysis

In this phase, I wanted to verify that my interpretation of the disk image was correct. Instead of scratching my head over a lot of hexadecimal numbers, I wrote a perl script to do the decoding. Since the MSDOS FAT filesystem is fairly simple, this was not a difficult task. For information on the filesystem structure, I turned to the Linux kernel source code. Most of the information came from the msdos_fs.h header file. The full output of the script is here: readimg.txt.

The filesystem consists of one or more reserved sectors (including the boot sector), followed by one or more copies of the File Allocation Table, then the root directory, and finally the data area. Since each of these has a variable size, the script starts out by reading the boot sector, which contains enough information to calculate the sizes. The root directory was calculated to start at offset 0x2600, which confirms my conclusion from phase two.
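The layout calculation can be reproduced from the boot sector bytes in the first hexdump of the image. This is a minimal Python sketch rather than the Perl script described above. It assumes the standard BIOS parameter block offsets (bytes per sector at offset 11, and so on), and BOOT_HEX is the first two hexdump lines retyped.

```python
import struct

# First 32 bytes of the disk image, retyped from the Phase 1 hexdump.
BOOT_HEX = (
    "eb 3c 90 4d 53 44 4f 53 35 2e 30 00 02 01 01 00 "
    "02 e0 00 40 0b f0 09 00 12 00 02 00 00 00 00 00 "
)


def fat_layout(boot):
    """Compute the byte offsets of the FAT, root directory and data area."""
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, fat_count,
     root_entries, total_sectors, media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot, 11)
    fat_offset = reserved_sectors * bytes_per_sector
    root_offset = fat_offset + fat_count * sectors_per_fat * bytes_per_sector
    root_bytes = root_entries * 32          # each directory entry is 32 bytes
    return {
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        "fat_offset": fat_offset,
        "root_offset": root_offset,
        "root_bytes": root_bytes,
        "data_offset": root_offset + root_bytes,
        "image_size": total_sectors * bytes_per_sector,
    }


layout = fat_layout(bytes.fromhex(BOOT_HEX))
for key, value in layout.items():
    print("%-12s %7d  0x%x" % (key, value, value))
```

With these values the root directory lands at 0x2600 and occupies 7168 bytes, the size that ls showed for the mounted root directory, and the computed image size equals the 1474560 bytes of the image file.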

The root directory contains several entries for each file: a number of long name entries, and a short name entry which contains the rest of the information about the file. For some reason, the long name entries are stored in reverse order. The filename entries all appear to be as expected.

All of the cluster lists that were found consist of consecutive numbers. This means that the filesystem has not become fragmented through heavy use. Everything is neat and orderly, which makes recovery of files easier. The starting address and length of the chain for the "Scheduled Visits.exe" file agree with the values used to recover that file. The lists for "Jimmy Jungle.doc" and "cover page.jpgc" end with question marks. In the first case, that is because the chain of clusters was dismantled when the file was deleted. In the second case, the list starts at a cluster that is marked "unused", so there's no chain to follow. It is interesting that the number given for the first cluster, 420, is exactly ten times the starting number of the unconnected chain that was found. This seems to point to a deliberate modification to the starting address.
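The relationship between cluster numbers and image offsets, and the way a FAT12 chain is followed, can be sketched in a few lines of Python. The constants are assumptions derived from the boot sector values (data area at 0x4200, 512-byte clusters, first data cluster numbered 2), and the tiny FAT used in the demonstration is invented, not copied from the disk.

```python
DATA_OFFSET = 0x4200    # assumed start of the data area in the image
CLUSTER_SIZE = 512      # assumed: one 512-byte sector per cluster


def cluster_offset(cluster):
    """Byte offset in the image of a data cluster (numbering starts at 2)."""
    return DATA_OFFSET + (cluster - 2) * CLUSTER_SIZE


def offset_cluster(offset):
    """Inverse of cluster_offset for cluster-aligned offsets."""
    return (offset - DATA_OFFSET) // CLUSTER_SIZE + 2


def fat12_entry(fat, n):
    """Read the n-th 12-bit entry; two entries share three bytes."""
    i = n * 3 // 2
    pair = fat[i] | (fat[i + 1] << 8)
    return pair >> 4 if n & 1 else pair & 0xFFF


def follow_chain(fat, first):
    """List the clusters of a chain; an unused first cluster gives []."""
    chain = []
    cluster = first
    while 2 <= cluster < 0xFF0 and cluster not in chain:
        nxt = fat12_entry(fat, cluster)
        if nxt == 0:            # free cluster: there is no chain to follow
            break
        chain.append(cluster)
        cluster = nxt
    return chain


# Invented FAT: entries 0xFF0, 0xFFF, 3, 4, 0xFFF (end of chain), 0 (free).
toy_fat = bytes.fromhex("f0 ff ff 03 40 00 ff 0f 00")
print(follow_chain(toy_fat, 2), follow_chain(toy_fat, 5))
print(hex(cluster_offset(42)), offset_cluster(0xD000),
      cluster_offset(offset_cluster(0xD000)) // 512)
```

Under these assumptions, cluster 42 corresponds to offset 0x9200, where the JFIF header was seen, and the zip data at 0xd000 corresponds to sector 104, the skip value given to dd.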

The file lengths are as expected. The file "cover page.jpgc" has a slightly shorter length than the file that was recovered by dosfsck. This is because space is allocated to files one complete block at a time. If part of the last block is not needed by the file, the unused space left over is called "slack." The dosfsck program did not know about the slack space left by this file, since it was only looking at an unattached chain of blocks, not the directory entry it was once attached to. After truncating the file at the proper length (head -15585c cover_page.jpg >cover_page2.jpg), it no longer contains the password that was discovered. Thus, the password was in the slack space and not in the file itself. It might have been left there by another file which was deleted before "cover page.jpgc" was copied onto the floppy, or it might have been present in a buffer in the computer's memory when "cover page.jpgc" was written.
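The slack arithmetic is simple enough to check directly. This short Python sketch uses the 15585-byte length from the directory entry, the 512-byte cluster size, and the offset 0x9200 at which the JPEG data was seen in the dump. The function name is made up for the example.

```python
CLUSTER_SIZE = 512


def slack_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Unused bytes between the end of a file and the end of its last cluster."""
    clusters = -(-file_size // cluster_size)      # ceiling division
    return clusters * cluster_size - file_size


jpeg_start = 0x9200                 # offset of the JFIF header in the image
jpeg_size = 15585                   # length recorded in the directory entry
slack = slack_bytes(jpeg_size)
slack_start = jpeg_start + jpeg_size
slack_end = slack_start + slack
print("slack: %d bytes, from 0x%x up to 0x%x" % (slack, slack_start, slack_end))
```

The file occupies 31 clusters (15872 bytes), which matches the chain reclaimed by dosfsck, so the last 287 bytes before 0xd000 are slack.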

During this analysis, nothing was found to cast doubt on the previous conclusions. This brings the investigation to a successful completion.

Cracking the Password

It is somewhat unsatisfying that the success of this investigation appears to depend on the chance of finding the zip file password stored on the captured floppy disk. What if the password had not been found? There are a number of programs available which try to guess the password for an encrypted file. Some of them are commercial or shareware software, while others are free. However, since the guessing process is quite simple, I chose to write my own program instead. I found information on zip encryption in the PKZIP Application Note which is available from the PKWARE Web Site.

There is an extremely large number of possible passwords, so it is important to decide which ones are the most likely, and focus attention on those. If the two suspects, Joe Jacobs and Jimmy Jungle, were in a habit of sampling their wares, it might be expected that they would have a hard time remembering a complicated password. Some simple passwords that come to mind are single words, words with numbers added, and combinations of two words. (Indeed, the password they chose, "goodtimes", is composed of two English words, but there's no way to know that a priori.) I used the word list found in /usr/share/dict/words on my machine, which contains about 40,000 words. More specialized word lists have been created for this purpose. In this case, including drug terms in the list would seem to be a good idea.

The next problem is how to reject incorrect passwords. Encrypted zip files store a byte (or sometimes two) with a known value in the encryption header, as a quick way of detecting mistyped passwords. However, about one in 256 incorrect passwords can be expected to decrypt this known byte to the proper value just by chance. In a dictionary of 40,000 words, over 150 such false positives would be expected, and I'm going to test far more passwords than that. Therefore, it is necessary to have several more known bytes to distinguish the right password from the wrong ones. I created and compressed several small Excel spreadsheet files, and found that the first few bytes always came out the same. (More sophisticated techniques are possible, such as attempting to decompress the decrypted file with each guessed password. However, that would increase the complexity of the program greatly.)
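For readers who want to see the quick check in code, here is a minimal Python sketch of the traditional PKZIP stream cipher as described in the PKZIP Application Note. It is not the C program discussed below, and it tests only the single check byte, not the additional known plaintext bytes. The class and function names are invented here. The check value is assumed to be the high byte of the CRC-32 (0x8d for this file), and the 12 encrypted header bytes are retyped from offset 0x32 of the hexdump of schedu~1.exe.

```python
def _make_crc_table():
    """Build the standard CRC-32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


class ZipKeys:
    """The three 32-bit keys of traditional PKZIP encryption."""

    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self.update(byte)

    def update(self, byte):
        self.k0 = CRC_TABLE[(self.k0 ^ byte) & 0xFF] ^ (self.k0 >> 8)
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = CRC_TABLE[(self.k2 ^ (self.k1 >> 24)) & 0xFF] ^ (self.k2 >> 8)

    def stream_byte(self):
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def decrypt(self, data):
        out = bytearray()
        for c in data:
            p = c ^ self.stream_byte()
            self.update(p)          # keys are always updated with plaintext
            out.append(p)
        return bytes(out)

    def encrypt(self, data):
        out = bytearray()
        for p in data:
            out.append(p ^ self.stream_byte())
            self.update(p)
        return bytes(out)


def passes_check_byte(password, enc_header, check_byte):
    """True if the 12th decrypted header byte equals the expected value."""
    return ZipKeys(password).decrypt(enc_header[:12])[11] == check_byte


# Encrypted header bytes retyped from the hexdump; result not verified here.
enc_header = bytes.fromhex("94 c8 31 2a e3 49 0b db a8 10 c2 70")
for guess in (b"goodtimes", b"password"):
    print(guess, passes_check_byte(guess, enc_header, 0x8D))
```

A guess that fails this test is certainly wrong, while a guess that passes still has to be confirmed by other means, which is why the extra known bytes mentioned above are needed.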

I wrote a small C program to guess passwords for the recovered zip file. It is not heavily optimized, but is able to test around 300,000 passwords per second on my machine (a 350MHz Pentium-II). At that rate, it was able to try over one billion one-word, word-plus-number, and two-word passwords in under two hours. It located the correct password, and stumbled on only one incorrect password: "implyinspected". That mistake can easily be eliminated, because it does not decrypt the zip file successfully.
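The search space described here can be sketched as a simple generator. This Python fragment is only an illustration of the three password shapes, not the C program itself, and the range of appended numbers (0 to 99) is an assumption, since the text does not say which numbers were tried.

```python
from itertools import product


def candidates(words, max_number=99):
    """Yield one-word, word-plus-number and two-word password guesses."""
    for word in words:
        yield word
    for word in words:
        for number in range(max_number + 1):
            yield "%s%d" % (word, number)
    for first, second in product(words, repeat=2):
        yield first + second


def candidate_count(word_count, max_number=99):
    """Number of guesses produced by candidates() for a list of that size."""
    return word_count + word_count * (max_number + 1) + word_count ** 2


guesses = list(candidates(["good", "times"], max_number=9))
print(len(guesses), "goodtimes" in guesses)

total = candidate_count(40000)
hours = total / 300000 / 3600       # at roughly 300,000 guesses per second
print(total, round(hours, 2))
```

With a 40,000-word list the two-word combinations dominate the total, which comes to about 1.6 billion guesses, or roughly an hour and a half at the stated rate.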

If the suspects had chosen a stronger password, this password guessing attack would be much more difficult or even impossible. Running times quickly reach into months or years as more possible passwords are considered. Faster computers could be used, or networks of computers working in parallel. However, the computing resources available to a local police department are limited, and they have many other cases that cannot be neglected. Against a sufficiently strong password, these techniques may not succeed at all.

Other attacks against zip file contents are possible, such as the more advanced known plaintext attack used by the pkcrack program. However, it requires more known plaintext bytes than I have available in this case.