PDF Metadata Extraction with Python

Published: 05 Feb, 2019

Created by:

Christopher A. Plaisance

This paper explores techniques for programmatically extracting metadata from PDF files using Python. It begins by detailing the internal structure of PDF documents, focusing on the system of indirect references and objects within the PDF binary, the document information dictionary, and the XMP metadata contained in the file's metadata streams. Next, the paper examines the most common means of accessing PDF metadata with Python: the high-level PyPDF and PyPDF2 libraries. This examination reveals deficiencies in the methods these modules use, which make them inappropriate for digital forensics investigations. The paper then describes an alternative low-level technique, carving the PDF binary directly with Python using the re module from the standard library, and finds that it extracts the pertinent metadata accurately and completely enough for digital forensics use cases. These low-level techniques are built into a stand-alone open source Linux utility, pdf-metadata, which is discussed in the paper's final section.
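To make the idea of carving a PDF binary with the re module concrete, the following is a minimal sketch and not the paper's own code or the implementation inside pdf-metadata. The pattern names and helper functions (INFO_FIELD, XMP_PACKET, carve_info_fields, carve_xmp_packets, carve_file) are hypothetical. The sketch assumes the metadata is stored uncompressed and unencrypted, that information dictionary values are literal strings in parentheses without nested unescaped parentheses, and that XMP is wrapped in the usual xpacket markers. Hex strings, indirect references, and object streams are ignored.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: "/Key (literal string)" pairs for the standard
# document information dictionary keys. Escaped characters such as \( and \)
# are kept inside the value; nested unescaped parentheses are not handled.
INFO_FIELD = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer|CreationDate|ModDate)"
    rb"\s*\(((?:\\.|[^\\()])*)\)",
    re.DOTALL,
)

# Hypothetical pattern: an XMP packet from its opening xpacket processing
# instruction to the closing one. Only uncompressed packets are visible.
XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def carve_info_fields(data: bytes) -> List[Tuple[str, bytes]]:
    """Return every (key, raw value) pair found anywhere in the bytes."""
    return [(m.group(1).decode("ascii"), m.group(2)) for m in INFO_FIELD.finditer(data)]


def carve_xmp_packets(data: bytes) -> List[bytes]:
    """Return every uncompressed XMP packet found in the bytes."""
    return XMP_PACKET.findall(data)


def carve_file(path: str) -> Tuple[List[Tuple[str, bytes]], List[bytes]]:
    """Read a file in binary mode and carve both metadata types from it."""
    with open(path, "rb") as handle:
        data = handle.read()
    return carve_info_fields(data), carve_xmp_packets(data)
```

Because the patterns scan the whole file rather than following the cross-reference table, a file that has been saved incrementally can yield several values for the same key. Values are returned as raw bytes, escapes included, so that nothing is altered before an examiner sees it.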
