I hoped to be writing to you about how I found a great chi-square technique to identify trojaned PDFs (we've certainly seen our share - 8.1, 8.1.1, and now 8.3/9.0...). Sadly, it's not so. I couldn't even get as far as rejecting my null hypothesis since component bytes, as random variables, are - no surprise - not normally distributed and therefore chi-square isn't really applicable. Should've seen that one coming a mile away. Stand by, we'll keep trying to find some sort of classification technique to identify "interesting" PDFs for manual inspection with a low false-positive rate irrespective of exploit.
In the meantime, I threw together a few neat Perl scripts I figured I'd share here that may be of broader general interest. I'm also going to include a few unrelated gems that have proven helpful for me over the years. I know hating on Perl scripts is a common pastime for some with nothing better to do, so I'll issue this disclaimer: these scripts have been accurate enough, fast, elegant, and readable for my purposes. As with anything Perl, there are 1,000 ways to accomplish anything. If you've found or know of a better way, good for you. There are certainly conceivable conditions in which these scripts may not function as designed - input is a terrifying thing sometimes. This is just food for thought, not an authoritative entry. That said, let's get to it.
Decompress streams in a PDF file (from STDIN)
Yes, I know pdftk will accomplish this just peachy, but I needed to decompress only - pdftk will normalize data. Possibly a limited application, but hey, maybe someone can use it :-)
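    #!/usr/bin/perl -w
    #
    # Blindly decompresses all PDF streams
    #
    use strict;
    use Compress::Zlib;

    binmode STDIN;
    binmode STDOUT;

    # Slurp the entire file from STDIN
    my $buf = '';
    my $block;
    while ((read STDIN, $block, 4096) != 0) {
        $buf .= $block;
    }

    # Replace each stream body with its inflated form
    $buf =~ s/stream\r\n(.*?)\r\nendstream/"stream\r\n".inflate($1)."\r\nendstream"/gems;
    print $buf;
    exit;

    sub inflate {
        my $input = shift;
        my $raw   = $input;    # inflate() consumes $input, so keep a copy
        my $x = inflateInit() or die "Cannot create an inflation stream\n";
        my ($output, $status) = $x->inflate($input);
        # Pass through anything that does not inflate cleanly (e.g. image streams)
        return $raw unless defined $output && $status == Z_STREAM_END;
        return $output;
    }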
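One limitation worth knowing: the PDF format also permits a bare line feed after the stream keyword, and a CRLF-only pattern skips those streams entirely. As a rough sketch (not a substitute for a real parser), the substitution could be wrapped in a sub that accepts either line ending and leaves a stream as-is when it does not inflate cleanly. The sub names inflate_or_keep and inflate_streams are made up for illustration.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Return the inflated data, or the original bytes if inflation fails.
sub inflate_or_keep {
    my ($data) = @_;
    my $copy = $data;                  # inflate() consumes its argument
    my $z = inflateInit() or return $data;
    my ($out, $status) = $z->inflate($copy);
    return $data unless defined $out && $status == Z_STREAM_END;
    return $out;
}

# Inflate every stream body in a PDF held in memory.
# Accepts CRLF or bare LF around the stream data.
sub inflate_streams {
    my ($pdf) = @_;
    $pdf =~ s/(stream\r?\n)(.*?)(\r?\nendstream)/$1 . inflate_or_keep($2) . $3/gse;
    return $pdf;
}
```

With the file slurped into $buf as in the script, `print inflate_streams($buf);` would take the place of the substitution and print.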
Create histogram of a file's constituent bytes (from STDIN)
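    #!/usr/bin/perl
    #
    # Output a histogram of byte frequencies
    my %histogram;
    my $byte;
    binmode STDIN;
    # Read one byte at a time and count by byte value (0-255)
    while (read STDIN, $byte, 1) {
        $histogram{unpack "C", $byte} += 1;
    }
    # Emit "value,count" pairs in numeric order
    foreach (sort { $a <=> $b } keys %histogram) {
        print "$_,$histogram{$_}\n";
    }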
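Reading a single byte per call gets slow on large files. Here is a sketch of the same count done in 4096-byte blocks, assuming an array indexed by byte value is an acceptable stand-in for the hash. The sub name byte_histogram is hypothetical, and the in-memory filehandle is only there to make the example self-contained.

```perl
use strict;
use warnings;

# Count byte frequencies from a filehandle, reading in blocks.
# Returns a reference to a 256-element array indexed by byte value.
sub byte_histogram {
    my ($fh) = @_;
    binmode $fh;
    my @count = (0) x 256;
    my $block;
    while (read $fh, $block, 4096) {
        $count[$_]++ for unpack "C*", $block;
    }
    return \@count;
}

# Example on an in-memory filehandle holding the bytes "abca"
open my $fh, '<', \"abca" or die "cannot open in-memory handle: $!";
my $hist = byte_histogram($fh);
# Print only the byte values that actually occur
print "$_,$hist->[$_]\n" for grep { $hist->[$_] } 0 .. 255;
# prints:
# 97,2
# 98,1
# 99,1
```

To use it on standard input, as the script does, pass `\*STDIN` as the filehandle.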
Convert 6-byte integer into MAC address string
    $ echo 256136729009152 | perl -e 'print unpack "H12", pack "Q", <>'
    001cbf7af4e8
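Note: 'Q' needs a Perl built with 64-bit integer support, and it packs in native byte order, so the output shown is what a little-endian machine prints; YMMV.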
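If the byte-order dependence is a concern, one option is to spell it out. The sketch below assumes Perl 5.10 or later (for the `<` modifier) with 64-bit integer support, and assumes the integer holds the MAC in little-endian order, as in the example above. The sub names int_to_mac and mac_to_int are made up for illustration.

```perl
use strict;
use warnings;

# Convert a 6-byte integer (little-endian layout) to a colon-separated MAC.
sub int_to_mac {
    my ($n) = @_;
    # "Q<" forces little-endian; "(H2)6" takes the first six bytes as hex pairs
    return join ":", unpack "(H2)6", pack "Q<", $n;
}

# And back again: pad the six bytes to eight and read them as a quad.
sub mac_to_int {
    my ($mac) = @_;
    return unpack "Q<", pack("(H2)6", split /:/, $mac) . "\0\0";
}

print int_to_mac(256136729009152), "\n";       # prints 00:1c:bf:7a:f4:e8
print mac_to_int("00:1c:bf:7a:f4:e8"), "\n";   # prints 256136729009152
```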
Convert 4-byte integer into dotted-decimal string
    $ echo 3232253438 | perl -e 'print unpack "C4", pack "N", <>'
    19216869254
Okay, not quite as readable as a 6-byte hex value without a separator, per above. Put some dots in there.
    $ echo 3232253438 | perl -e 'print join ".", unpack "C4", pack "N", <>'
    192.168.69.254
Backwards!
    $ echo "192.168.69.254" | perl -e 'print unpack "N", pack "C4", split /\./, <>'
    3232253438
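For use inside a larger script, the two one-liners fold naturally into a pair of subs. This is only a sketch: the names int_to_ip and ip_to_int are hypothetical, and the octet check is a basic sanity test, not full address validation.

```perl
use strict;
use warnings;

# 32-bit integer -> dotted-decimal string
sub int_to_ip {
    my ($n) = @_;
    return join ".", unpack "C4", pack "N", $n;
}

# Dotted-decimal string -> 32-bit integer
sub ip_to_int {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    die "not a dotted quad: $ip\n"
        unless @octets == 4 && !grep { !/^\d+$/ || $_ > 255 } @octets;
    return unpack "N", pack "C4", @octets;
}

print int_to_ip(3232253438), "\n";         # prints 192.168.69.254
print ip_to_int("192.168.69.254"), "\n";   # prints 3232253438
```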
Rip a space-separated well-formed URL out of a line of arbitrary text
    $ echo 'a b c asdfaf3243$#[ ] http://asdf.com/somestuff fdaajjjf' \
      | perl -pe 's/.*\W([a-z]+:\/\/[^\s\t]+)\W.*/$1/g'
    http://asdf.com/somestuff
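The substitution keeps a single URL per line. Where a line might carry several, a global match in list context is one way to collect them all. A sketch, with the hypothetical sub name extract_urls and the same loose idea of "well-formed" (a scheme, ://, then anything up to the next whitespace):

```perl
use strict;
use warnings;

# Return every scheme://... token found in a line of text.
sub extract_urls {
    my ($line) = @_;
    # In list context, /g returns the capture from each match
    my @urls = $line =~ m{\b([a-z]+://\S+)}gi;
    return @urls;
}

my @found = extract_urls('a b c asdfaf3243$#[ ] http://asdf.com/somestuff fdaajjjf ftp://host/file');
print "$_\n" for @found;
# prints:
# http://asdf.com/somestuff
# ftp://host/file
```

Since the match runs to the next whitespace, trailing punctuation glued to a URL comes along for the ride.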
Finally, as a postscript, I'd be remiss if I took full credit for all of the above. Special thanks to Eric, Zach, and Jason for various Perl insights that helped me build these.
Michael is a senior member of an incident response team for a large defense contractor. He has lectured for various audiences from IEEE to DC3, and teaches an introductory class on cryptography. His current work consists of security intelligence analysis and development of new tools and techniques for incident response. Michael holds a BS in computer engineering and has earned GCIA (#592) and GCFA (#711) gold certifications alongside various others.