I saw John McCash's article on GMail forensics ... I was hooked and created pdgmail.
I've been messing around with the volatile toolkit for memory forensics and thought I'd try my hand at GMail memory forensics. As John says, GMail data isn't supposed to end up on disk anyway, so maybe it's in the browser's memory?
Boy is it!
I used the pd dump tool from www.trapkit.de and tested it against my meager GMail account on Windows XP and 2000 with IE 6, IE 7 and Firefox 3. In all cases I was able to retrieve contact data, last login times and IP addresses, basic email headers and email bodies. Every browser retained this data even after it was 'logged out' of GMail, including messages that were never opened and contacts that were never used. Simply loading the GMail UI puts all of this data into the browser's memory.
How do you use it?
The first step is to gather the browser's memory. Here's a sample pd session, where 6352 is the PID of a running IE instance:
E:\Program Files\tools>pd -p 6352 > 6352.dump
pd, version 1.1 tk 2006, http://www.trapkit.de
Dump finished.

E:\Program Files\tools>dir
 Directory of E:\Program Files\tools

09/27/2008  06:57 PM       117,908,254 6352.dump
Whoa, big file! But this is forensics; large data sets don't scare us. To use pdgmail, run the memory dump through strings -el (which extracts 16-bit little-endian strings) to create a strings file. Then either cat that file through pdgmail, or run pdgmail with the -f flag and your strings filename. For example:
strings -el 6352.dump | pdgmail | less
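If GNU strings isn't handy (on a Windows analysis box, say), the following minimal Python 3 sketch does roughly what strings -el does. It looks for runs of printable ASCII characters where each character is followed by a zero byte, which is how ASCII text appears when stored as UTF-16LE. The function names and the 4-character minimum (GNU strings' default) are choices made for this illustration. The sketch is not part of pdgmail, and it ignores non-ASCII characters.

```python
import re
import sys
from typing import Iterator

# GNU strings' default minimum run length is 4 characters.
MIN_LEN = 4

# A run of printable ASCII characters, each followed by a zero byte: how
# plain ASCII text looks when stored as UTF-16LE. This approximates what
# "strings -el" reports; non-ASCII characters are not matched at all.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def utf16le_strings(data: bytes) -> Iterator[str]:
    """Yield each UTF-16LE, ASCII-range string found in a raw memory dump."""
    for match in UTF16LE_RUN.finditer(data):
        yield match.group().decode("utf-16-le")


def main(path: str) -> None:
    # Reads the whole dump at once. That is fine for a ~118 MB pd dump, but a
    # chunked reader would be kinder for full physical-memory images.
    with open(path, "rb") as fh:
        data = fh.read()
    for s in utf16le_strings(data):
        print(s)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as utf16_strings.py, it would slot into the same pipeline: python3 utf16_strings.py 6352.dump | pdgmail | less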
You'll get the best mileage with Python 2.4.4 or 2.5 on Linux. I haven't tested it on earlier versions or on Windows.
pdgmail looks for:
- contacts
- last access records
- GMail account names
- message headers
- message bodies
Contacts show up as:
contact: name: "jeff bryner" email: "myemailaddress@gmail.com"
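Because memory dumps repeat the same strings many times, you will probably want to deduplicate contacts before reporting them. The following Python 3 sketch post-processes pdgmail's output, not raw memory. It pulls name/email pairs out of contact lines in the format shown above and writes them as CSV. The regex, function names and CSV layout are assumptions made for this example. The closing quote is optional because strings recovered from memory are sometimes cut short.

```python
import csv
import re
import sys
from typing import Iterable, List, Tuple

# Matches contact lines in the output format shown above (an assumption
# based on that single sample). The closing quote after the email is
# optional because strings recovered from memory are sometimes truncated.
CONTACT = re.compile(r'contact: name: "(?P<name>[^"]*)" email: "(?P<email>[^"\s]+)"?')


def contacts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (name, email) pairs in the order they were first seen."""
    seen = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        m = CONTACT.search(line)
        if m:
            seen.setdefault((m.group("name"), m.group("email")), None)
    return list(seen)


def write_csv(pairs: Iterable[Tuple[str, str]], out=sys.stdout) -> None:
    """Write the (name, email) pairs as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["name", "email"])
    writer.writerows(pairs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 contacts_csv.py pdgmail.out > contacts.csv
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        write_csv(contacts(fh))
```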
Last access records show the two most recent logins and appear as:
last access: "14 hours ago" from IP "10.15.26.8", most recent access Tue Oct 14 10:57:53 2008 from IP "12.9.4.238"
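To put these records on a timeline, you need to split the line into its parts. The following Python 3 sketch handles the line format shown above. The group names (relative, absolute and their IPs) are labels invented for this example, not GMail's own terms. The absolute timestamp carries no time zone, so it stays naive. Lines that don't match, or that contain a malformed IP or date, return None instead of raising.

```python
import re
from datetime import datetime
from ipaddress import ip_address
from typing import Optional

# Matches the last-access line format shown above. Group names are
# descriptive labels chosen for this sketch, not GMail's own terms.
LAST_ACCESS = re.compile(
    r'last access: "(?P<relative>[^"]*)" from IP "(?P<relative_ip>[^"]*)", '
    r'most recent access (?P<absolute>.+?) from IP "(?P<absolute_ip>[^"]*)"'
)


def parse_last_access(line: str) -> Optional[dict]:
    """Split a last-access line into its parts, or return None if it doesn't fit."""
    m = LAST_ACCESS.search(line)
    if not m:
        return None
    record = m.groupdict()
    try:
        # ip_address rejects garbage, which memory artifacts often contain.
        record["relative_ip"] = ip_address(record["relative_ip"])
        record["absolute_ip"] = ip_address(record["absolute_ip"])
        # No time zone appears in the string, so the datetime stays naive.
        record["absolute"] = datetime.strptime(record["absolute"], "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return record
```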
Email messages are the messiest, mostly because memory artifacts don't always conform to the API's format, so picking them out is a best-guess effort.
Using the most familiar email of all as the example, headers show up as:
message header: ["ms","113b0d734737dec4","",4,"Gmail Team ","Gmail Team","mail-noreply@google.com",1184082900000,"Did you know that GMail was voted #2 in PC World's Top 100 products of 2005, ...",["^all","^i"]
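The header line is almost a JSON array, so a JSON parser can decode it. The sample above is even missing its final bracket, a typical truncation. The following Python 3 sketch retries the parse with closing brackets appended and then names the fields. The field positions are guesses inferred from this single sample, not a documented GMail format: [1] as a message id, [4] as the sender name, [6] as the sender email, [7] as milliseconds since the Unix epoch, [8] as the snippet and [9] as labels. Read that way, 1184082900000 falls in July 2007.

```python
import json
from datetime import datetime, timezone
from typing import Optional

PREFIX = "message header: "


def parse_header(line: str) -> Optional[dict]:
    """Decode a message-header line into named fields, or return None.

    Field positions are assumptions inferred from one sample line:
    [1] id, [4] sender name, [6] sender email, [7] epoch milliseconds,
    [8] snippet, [9] labels.
    """
    if not line.startswith(PREFIX):
        return None
    raw = line[len(PREFIX):].strip()
    fields = None
    # Artifacts are often truncated, so retry with closing brackets appended.
    for suffix in ("", "]", "]]"):
        try:
            fields = json.loads(raw + suffix)
            break
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
    if not isinstance(fields, list) or len(fields) < 10 or fields[0] != "ms":
        return None
    try:
        sent = datetime.fromtimestamp(fields[7] / 1000, tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        sent = None  # timestamp field missing or mangled
    return {
        "id": fields[1],
        "sender_name": str(fields[4]).strip(),
        "sender_email": fields[6],
        "sent_utc": sent,
        "snippet": fields[8],
        "labels": fields[9],
    }
```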
Message bodies are parsed to turn the Unicode-encoded text into proper HTML:
Did you know that GMail was voted #2 in PC World's Top 100 products of 2005, right after Firefox? Why wouldn't you want to switch? Well, because it can be a pain to switch to a new email address. We know.
etc...
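One common way to do this kind of decoding is sketched below in Python 3. It assumes, without confirmation from the text above, that bodies appear in the strings output as JavaScript string literals with markup characters written as \uXXXX escapes (for example \u003c for "<"). The function name is hypothetical, and this is not necessarily how pdgmail handles Unicode. Surrogate pairs (two consecutive escapes for one character outside the Basic Multilingual Plane) are left uncombined here.

```python
import re

# Assumption: bodies appear as JavaScript string literals in which markup
# characters are escaped as \uXXXX (e.g. \u003c for "<"). Surrogate pairs
# are not recombined by this sketch.
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_body(text: str) -> str:
    """Replace every \\uXXXX escape in text with the character it encodes."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Applied to a fragment like \u003cb\u003eWe know.\u003c/b\u003e, this yields the HTML <b>We know.</b>.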
Nothing fancy: just some glorified regexes and Unicode handling, with the results dumped to stdout. It parses what it can; otherwise it just spits out the familiar-looking line. Feel free to send me patches, tweak it, rewrite it, etc. Hope it helps someone!
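The "parse if possible, otherwise echo the line" loop could be structured as in the following Python 3 sketch. Input comes either from stdin or from a file named with -f, as described earlier. PARSERS, MARKERS, scan and main are hypothetical names, and the single parser entry is illustrative, not pdgmail's actual rule set. In a real script, main() would be called from an if __name__ == "__main__": guard.

```python
import argparse
import re
import sys

# Hypothetical parser table: (compiled regex, formatter) pairs tried in order.
# The single entry is illustrative, not pdgmail's actual rule set.
PARSERS = [
    (re.compile(r'"ms","(?P<id>[0-9a-f]+)"'),
     lambda m: "message header id: " + m.group("id")),
]

# Hypothetical substrings that make an unparsed line worth echoing.
MARKERS = ("gmail", "@")


def scan(lines, parsers=PARSERS, markers=MARKERS):
    """Yield a formatted record per parsed line, else the raw line if it looks relevant."""
    for line in lines:
        line = line.rstrip("\r\n")
        for pattern, fmt in parsers:
            m = pattern.search(line)
            if m:
                yield fmt(m)
                break
        else:  # no parser matched: fall back to echoing familiar-looking lines
            lowered = line.lower()
            if any(marker in lowered for marker in markers):
                yield line


def main(argv=None):
    ap = argparse.ArgumentParser(description="Scan strings -el output for GMail artifacts.")
    ap.add_argument("-f", dest="filename", help="strings file to read (default: stdin)")
    args = ap.parse_args(argv)
    if args.filename:
        src = open(args.filename, encoding="utf-8", errors="replace")
    else:
        src = sys.stdin
    try:
        for record in scan(src):
            print(record)
    finally:
        if src is not sys.stdin:
            src.close()
    return 0
```

Keeping the rules in a table means adding a new artifact type is just one more (regex, formatter) pair. Lines that match nothing are still surfaced when they look relevant, which suits best-guess parsing of memory artifacts.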
Jeff Bryner, GCFA Gold #137, also holds the CISSP and GCIH certifications, occasionally teaches for SANS and performs forensics, intrusion analysis, and security architecture work on a daily basis.