homepage
Open menu
Go one level top
  • Train and Certify
    • Overview
    • Get Started in Cyber
    • Courses
    • GIAC Certifications
    • Training Roadmap
    • OnDemand
    • Live Training
    • Summits
    • Cyber Ranges
    • College Degrees & Certificates
    • Scholarship Academies
    • NICE Framework
    • Specials
  • Manage Your Team
    • Overview
    • Group Purchasing
    • Why Work with SANS
    • Build Your Team
    • Hire Cyber Talent
    • Team Development
    • Private Training
    • Security Awareness Training
    • Leadership Training
    • Industries
  • Resources
    • Overview
    • Internet Storm Center
    • White Papers
    • Webcasts
    • Tools
    • Newsletters
    • Blog
    • Podcasts
    • Posters & Cheat Sheets
    • Summit Presentations
    • Security Policy Project
  • Focus Areas
    • Cyber Defense
    • Cloud Security
    • Digital Forensics & Incident Response
    • Industrial Control Systems
    • Cyber Security Leadership
    • Offensive Operations
  • Get Involved
    • Overview
    • Join the Community
    • Work Study
    • Teach for SANS
    • CISO Network
    • Partnerships
    • Sponsorship Opportunities
  • About
    • About SANS
    • Our Founder
    • Instructors
    • Mission
    • Diversity
    • Awards
    • Contact
    • Frequently Asked Questions
    • Customer Reviews
    • Press
  • SANS Sites
    • GIAC Security Certifications
    • Internet Storm Center
    • SANS Technology Institute
    • Security Awareness Training
  • Search
  • Log In
  • Join
    • Account Dashboard
    • Log Out
  1. Home >
  2. Blog >
  3. Application Metadata of Nested Documents
John McCash

Application Metadata of Nested Documents

April 13, 2009

I was drawn to consider someting by a question on a certification practical exam I recently took. The problem had been presented as "find the specified text in the supplied disk image". However the text actually turned out to be viewable in a jpeg file which was nested inside a Word document. Once I'd found the text, the question was essentially answered, but then I started thinking about extraction options and the origins of that JPEG file.

I recalled a tool I'd recently discovered thanks to traffic on the GCFA mailing list, hachoir-subfile. The original email context was about using this tool to extract executable objects from PPS files, but it turns out that it works equally well to extract .jpg files. I had always assumed that when image files were incorporated into MS Office documents, they were somehow re-encoded, however this turns out not to be the case.

In fact, once a JPEG file has been extracted from its encapsulating Office document, you can then derive any metadata from it which was present in the original picture file, at least for the Office 2003 format (I tested it with Word 2003), and presumably earlier. It also works on PDFs. I verified this by creating a PDF with Adobe Acrobat, and then extracting the embedded JPEG using hachoir-subfile, finding all EXIF data intact. My testing with the new docx format of Office 2007 shows that much of the EXIF information in a JPEG file is apparently removed when the image is incorporated into the document. But in any case, you don't need hachoir to extract that format. Component files are extracted normally from docx files when you unzip then into a heirarchy of folders.

I like using exiftool (a precompiled windows binary is available for download) to extract metadata from JPEG files, but the same data, though somewhat less of it, can be extracted using another member of the hachoir family, hachoir-metadata.

The basic Hachoir application is actually a python library, so using it will require that Python be installed on your forensic workstation. I use cygwin (a UNIX application porting overlay for Windows) for various tasks already, so the Python package which is available for that works fine for me. However, various other Python ports are available from www.python.org if your needs differ from mine.

To get it working I simply downloaded the various hachoir packages;

  • hachoir-core-1.2.1.tar.gz
  • hachoir-parser-1.2.1.tar.gz
  • hachoir-regex-1.0.3.tar.gz
  • hachoir-subfile-0.5.3.tar.gz
  • hachoir-wx-0.3.tar.gz
  • hachoir-metadata-1.2.1.tar.gz
  • hachoir-urwid-1.1.tar.gz

I extracted each of these archives ("tar xvzf filename", or else use a Windows archive program such as 7zip which knows about tar and gzip), and then ran it's included setup.py python installation script (the command is just "python setup.py install"). Some of these packages are dependent on others, so you need to do them in the correct order. If you try to install one that needs another which hasn't yet been installed, it will complain, but you'll just have to go back and install the prerequisite before retrying.

Once all of the packages have been installed, extracting any included subfiles from a word document or PDF becomes a simple matter of typing "hachoir-subfile document_file_name target_folder_name". This will drop all extracted objects into files named file-####.ext (numbered, and with appropriate extensions) in the specified folder.

Now that the the subfiles are extracted, you can run exiftool or hachoir-metadata against them, and examine their metadata to your heart's content.

As always, please feel free to leave commentary if you liked this article or want to call me on the carpet for some inaccuracy.

Share:
TwitterLinkedInFacebook
Copy url Url was copied to clipboard
Subscribe to SANS Newsletters
Receive curated news, vulnerabilities, & security awareness tips
United States
Canada
United Kingdom
Spain
Belgium
Denmark
Norway
Netherlands
Australia
India
Japan
Singapore
Afghanistan
Aland Islands
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antarctica
Antigua and Barbuda
Argentina
Armenia
Aruba
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belize
Benin
Bermuda
Bhutan
Bolivia
Bonaire, Sint Eustatius, and Saba
Bosnia And Herzegovina
Botswana
Bouvet Island
Brazil
British Indian Ocean Territory
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Cape Verde
Cayman Islands
Central African Republic
Chad
Chile
China
Christmas Island
Cocos (Keeling) Islands
Colombia
Comoros
Cook Islands
Costa Rica
Croatia (Local Name: Hrvatska)
Curacao
Cyprus
Czech Republic
Democratic Republic of the Congo
Djibouti
Dominica
Dominican Republic
East Timor
East Timor
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Ethiopia
Falkland Islands (Malvinas)
Faroe Islands
Fiji
Finland
France
French Guiana
French Polynesia
French Southern Territories
Gabon
Gambia
Georgia
Germany
Ghana
Gibraltar
Greece
Greenland
Grenada
Guadeloupe
Guam
Guatemala
Guernsey
Guinea
Guinea-Bissau
Guyana
Haiti
Heard And McDonald Islands
Honduras
Hong Kong
Hungary
Iceland
Indonesia
Iraq
Ireland
Isle of Man
Israel
Italy
Jamaica
Jersey
Jordan
Kazakhstan
Kenya
Kingdom of Saudi Arabia
Kiribati
Korea, Republic Of
Kosovo
Kuwait
Kyrgyzstan
Lao People's Democratic Republic
Latvia
Lebanon
Lesotho
Liberia
Liechtenstein
Lithuania
Luxembourg
Macau
Macedonia
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Marshall Islands
Martinique
Mauritania
Mauritius
Mayotte
Mexico
Micronesia, Federated States Of
Moldova, Republic Of
Monaco
Mongolia
Montenegro
Montserrat
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands Antilles
New Caledonia
New Zealand
Nicaragua
Niger
Nigeria
Niue
Norfolk Island
Northern Mariana Islands
Oman
Pakistan
Palau
Palestine
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Pitcairn
Poland
Portugal
Puerto Rico
Qatar
Reunion
Romania
Russian Federation
Rwanda
Saint Bartholemy
Saint Kitts And Nevis
Saint Lucia
Saint Martin
Saint Vincent And The Grenadines
Samoa
San Marino
Sao Tome And Principe
Senegal
Serbia
Seychelles
Sierra Leone
Sint Maarten
Slovakia (Slovak Republic)
Slovenia
Solomon Islands
South Africa
South Georgia and the South Sandwich Islands
South Sudan
Sri Lanka
St. Helena
St. Pierre And Miquelon
Suriname
Svalbard And Jan Mayen Islands
Swaziland
Sweden
Switzerland
Taiwan
Tajikistan
Tanzania
Thailand
Togo
Tokelau
Tonga
Trinidad And Tobago
Tunisia
Turkey
Turkmenistan
Turks And Caicos Islands
Tuvalu
Uganda
Ukraine
United Arab Emirates
United States Minor Outlying Islands
Uruguay
Uzbekistan
Vanuatu
Vatican City
Venezuela
Vietnam
Virgin Islands (British)
Virgin Islands (U.S.)
Wallis And Futuna Islands
Western Sahara
Yemen
Yugoslavia
Zambia
Zimbabwe

Tags:
  • Digital Forensics and Incident Response

Related Content

Blog
Untitled_design-43.png
Digital Forensics and Incident Response, Cybersecurity and IT Essentials, Industrial Control Systems Security, Purple Team, Open-Source Intelligence (OSINT), Penetration Testing and Ethical Hacking, Cyber Defense, Cloud Security, Security Management, Legal, and Audit
December 8, 2021
Good News: SANS Virtual Summits Will Remain FREE for the Community in 2022
They’re virtual. They’re global. They’re free.
Emily Blades
read more
Blog
Digital Forensics and Incident Response
June 4, 2010
WMIC for incident response
Earlier this week, I posted about using psexec during incident response. I mentioned at the end of that post that I've been using WMIC in place of psexec and that I'd have more on that later. This post, is a follow up to the psexec post. WMIC Prompted by the excellent work of Ed Skoudis and his...
370x370_Mike-Pilkington.jpg
Mike Pilkington
read more
Blog
Digital Forensics and Incident Response
February 1, 2010
It's the little things (Part One)
For forensic analysts working in Windows environments, .lnk shortcut files and the thumbprint caches are valuable sources for details about missing data. Individuals wanting to hide their activities may flush their browser cache, Temp files, use, and even wipe the drive free space. However, they...
SANS_DFIR-370x370.png
SANS DFIR
read more
  • Register to Learn
  • Courses
  • Certifications
  • Degree Programs
  • Cyber Ranges
  • Job Tools
  • Security Policy Project
  • Posters & Cheat Sheets
  • White Papers
  • Focus Areas
  • Cyber Defense
  • Cloud Security
  • Cyber Security Leadership
  • Digital Forensics
  • Industrial Control Systems
  • Offensive Operations
Subscribe to SANS Newsletters
Receive curated news, vulnerabilities, & security awareness tips
United States
Canada
United Kingdom
Spain
Belgium
Denmark
Norway
Netherlands
Australia
India
Japan
Singapore
Afghanistan
Aland Islands
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antarctica
Antigua and Barbuda
Argentina
Armenia
Aruba
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belize
Benin
Bermuda
Bhutan
Bolivia
Bonaire, Sint Eustatius, and Saba
Bosnia And Herzegovina
Botswana
Bouvet Island
Brazil
British Indian Ocean Territory
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Cape Verde
Cayman Islands
Central African Republic
Chad
Chile
China
Christmas Island
Cocos (Keeling) Islands
Colombia
Comoros
Cook Islands
Costa Rica
Croatia (Local Name: Hrvatska)
Curacao
Cyprus
Czech Republic
Democratic Republic of the Congo
Djibouti
Dominica
Dominican Republic
East Timor
East Timor
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Ethiopia
Falkland Islands (Malvinas)
Faroe Islands
Fiji
Finland
France
French Guiana
French Polynesia
French Southern Territories
Gabon
Gambia
Georgia
Germany
Ghana
Gibraltar
Greece
Greenland
Grenada
Guadeloupe
Guam
Guatemala
Guernsey
Guinea
Guinea-Bissau
Guyana
Haiti
Heard And McDonald Islands
Honduras
Hong Kong
Hungary
Iceland
Indonesia
Iraq
Ireland
Isle of Man
Israel
Italy
Jamaica
Jersey
Jordan
Kazakhstan
Kenya
Kingdom of Saudi Arabia
Kiribati
Korea, Republic Of
Kosovo
Kuwait
Kyrgyzstan
Lao People's Democratic Republic
Latvia
Lebanon
Lesotho
Liberia
Liechtenstein
Lithuania
Luxembourg
Macau
Macedonia
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Marshall Islands
Martinique
Mauritania
Mauritius
Mayotte
Mexico
Micronesia, Federated States Of
Moldova, Republic Of
Monaco
Mongolia
Montenegro
Montserrat
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands Antilles
New Caledonia
New Zealand
Nicaragua
Niger
Nigeria
Niue
Norfolk Island
Northern Mariana Islands
Oman
Pakistan
Palau
Palestine
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Pitcairn
Poland
Portugal
Puerto Rico
Qatar
Reunion
Romania
Russian Federation
Rwanda
Saint Bartholemy
Saint Kitts And Nevis
Saint Lucia
Saint Martin
Saint Vincent And The Grenadines
Samoa
San Marino
Sao Tome And Principe
Senegal
Serbia
Seychelles
Sierra Leone
Sint Maarten
Slovakia (Slovak Republic)
Slovenia
Solomon Islands
South Africa
South Georgia and the South Sandwich Islands
South Sudan
Sri Lanka
St. Helena
St. Pierre And Miquelon
Suriname
Svalbard And Jan Mayen Islands
Swaziland
Sweden
Switzerland
Taiwan
Tajikistan
Tanzania
Thailand
Togo
Tokelau
Tonga
Trinidad And Tobago
Tunisia
Turkey
Turkmenistan
Turks And Caicos Islands
Tuvalu
Uganda
Ukraine
United Arab Emirates
United States Minor Outlying Islands
Uruguay
Uzbekistan
Vanuatu
Vatican City
Venezuela
Vietnam
Virgin Islands (British)
Virgin Islands (U.S.)
Wallis And Futuna Islands
Western Sahara
Yemen
Yugoslavia
Zambia
Zimbabwe
  • © 2022 SANS™ Institute
  • Privacy Policy
  • Contact
  • Careers
  • Twitter
  • Facebook
  • Youtube
  • LinkedIn