PDF File Signature — Magic Bytes
Understand the PDF file signature (%PDF-) and how to identify PDF files by examining their hex header bytes. Includes version detection and structure overview.
Hex: 25 50 44 46 2D
ASCII: %PDF-
Detailed Explanation
Every valid PDF file begins with a header that starts with the ASCII string %PDF- followed by a version number. In hexadecimal, the signature bytes are 25 50 44 46 2D, mapping directly to the five characters of this header prefix.
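Here is a minimal Python sketch of that check. It assumes the file, or at least its first five bytes, is available as a `bytes` object. The names `PDF_MAGIC`, `has_pdf_signature`, and `file_has_pdf_signature` are illustrative and do not come from any library.

```python
# The five signature bytes, converted from hex text to raw bytes (equal to b"%PDF-").
PDF_MAGIC = bytes.fromhex("25 50 44 46 2D")


def has_pdf_signature(data: bytes) -> bool:
    """Return True if `data` begins with the %PDF- prefix at offset 0."""
    return data.startswith(PDF_MAGIC)


def file_has_pdf_signature(path: str) -> bool:
    """Read only the first five bytes of a file and test them."""
    with open(path, "rb") as f:
        return has_pdf_signature(f.read(5))


if __name__ == "__main__":
    print(PDF_MAGIC)                           # b'%PDF-'
    print(has_pdf_signature(b"%PDF-1.7\n"))    # True
    print(has_pdf_signature(b"MZ\x90\x00"))    # False
```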
Byte-by-byte breakdown:
| Offset | Hex | ASCII | Purpose |
|---|---|---|---|
| 0 | 25 | % | Comment indicator in PostScript/PDF syntax |
| 1 | 50 | P | |
| 2 | 44 | D | |
| 3 | 46 | F | |
| 4 | 2D | - | Separator before version number |
| 5-7 | varies | 1.x or 2.0 | PDF version (e.g., 31 2E 37 for "1.7") |
PDF version in the header:
Immediately after %PDF-, you will see the version number as ASCII text. Common versions include:
- %PDF-1.4 → 25 50 44 46 2D 31 2E 34 (Acrobat 5 era)
- %PDF-1.5 → introduced object streams and cross-reference streams
- %PDF-1.7 → the most widely used modern version (ISO 32000-1)
- %PDF-2.0 → latest specification (ISO 32000-2)
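The sketch below shows version detection from the header line. It assumes the header starts at offset 0 and that the version is written as one digit, a dot, and one digit, which covers every version from 1.0 through 2.0. `pdf_header_version` is a hypothetical helper name. Treat the result as the header's claim only. From PDF 1.4 on, the document catalog can declare a later version through a `/Version` entry.

```python
import re
from typing import Optional, Tuple

# "%PDF-" at offset 0, then a one-digit major version, a dot, and a one-digit minor version.
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header line, or None if no valid header is at offset 0."""
    m = _HEADER_RE.match(data)
    if m is None:
        return None
    # int() accepts ASCII digits given as bytes.
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    print(pdf_header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # (1, 7)
    print(pdf_header_version(b"%PDF-2.0\r\n"))                 # (2, 0)
```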
The second line — binary comment:
Immediately after the header line, PDFs that contain binary data (as most do) should include a comment line with at least four bytes whose values are 128 or greater (e.g., 25 E2 E3 CF D3). This tells file transfer programs that the file contains binary data and should not be treated as plain text. The 25 at the start makes the line a PDF comment (lines beginning with % are comments).
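The binary-comment rule can be checked in a few lines. This sketch assumes the header sits at offset 0 and that lines end in CR, LF, or CR LF, the end-of-line markers PDF allows. `has_binary_comment` is a hypothetical name. A missing binary comment is a weak signal at most, because files that contain only text are not required to include one.

```python
import re


def has_binary_comment(data: bytes) -> bool:
    """Return True if the second line is a comment with at least four bytes >= 128."""
    # Split off the first two lines; CR LF is tried first so it counts as one line break.
    lines = re.split(rb"\r\n|\r|\n", data[:1024], maxsplit=2)
    if len(lines) < 2:
        return False
    second = lines[1]
    # Iterating over bytes yields ints, so each byte can be compared to 127 directly.
    high_bytes = sum(1 for b in second if b > 127)
    return second.startswith(b"%") and high_bytes >= 4


if __name__ == "__main__":
    print(has_binary_comment(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj"))  # True
    print(has_binary_comment(b"%PDF-1.4\n1 0 obj\n"))                 # False
```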
PDF internal structure:
After the header, a PDF file is made up of the following parts:
- Body: contains objects (text, fonts, images, page descriptions)
- Cross-reference table: byte-offset index of every object (from PDF 1.5 on, this may be stored as a cross-reference stream instead)
- Trailer: points to the root object and to the cross-reference table
- EOF marker: the last line of the file contains %%EOF (25 25 45 4F 46)
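The trailer and EOF marker can be located by reading backward from the end of the file. In the file trailer, the keyword `startxref` is followed by the byte offset of the last cross-reference section and then by `%%EOF`. The sketch below uses a hypothetical helper, `last_startxref_offset`, and an assumed 1024-byte tail window. It returns the offset from the last `startxref` in that window, because incrementally updated files append a new trailer and a new `%%EOF` each time they are saved.

```python
import re
from typing import Optional

# "startxref", whitespace, then the decimal byte offset of the last cross-reference section.
_STARTXREF_RE = re.compile(rb"startxref\s+(\d+)")


def last_startxref_offset(data: bytes, tail_size: int = 1024) -> Optional[int]:
    """Return the offset after the last 'startxref' in the tail, or None if the tail looks malformed."""
    tail = data[-tail_size:]  # the whole file if it is shorter than tail_size
    if b"%%EOF" not in tail:
        return None
    offsets = _STARTXREF_RE.findall(tail)
    if not offsets:
        return None
    # The last trailer in the file describes the current state of the document.
    return int(offsets[-1])


if __name__ == "__main__":
    sample = b"%PDF-1.7\n...\ntrailer\n<< /Size 1 /Root 1 0 R >>\nstartxref\n27\n%%EOF\n"
    print(last_startxref_offset(sample))  # 27
```

A real parser would then seek to that offset and check that a cross-reference table (`xref`) or cross-reference stream begins there. This sketch stops at finding the number.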
Practical file identification:
To identify a PDF in a hex editor, check bytes 0-4 for 25 50 44 46 2D. Some PDFs have a small amount of garbage or whitespace before the header (this is technically non-conforming but tolerated by most readers), so scanning the first 1024 bytes for this pattern is a more robust approach.
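Here is a sketch of the more tolerant scan. It assumes the leading bytes are already in memory. `find_pdf_header` and its default `window=1024` are illustrative choices that mirror the 1024-byte figure above.

```python
from typing import Optional


def find_pdf_header(data: bytes, window: int = 1024) -> Optional[int]:
    """Return the offset of the first b"%PDF-" fully inside the first `window` bytes, or None."""
    pos = data[:window].find(b"%PDF-")
    return pos if pos >= 0 else None


if __name__ == "__main__":
    print(find_pdf_header(b"%PDF-1.7\n"))        # 0
    print(find_pdf_header(b"\r\n  %PDF-1.4\n"))  # 4
```

The two checks make different trade-offs. The offset-0 check rejects non-conforming files that most readers would still open. The windowed scan can also match a plain text file that merely mentions "%PDF-" near its start.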
Security implications:
PDF files can contain JavaScript, embedded files, and external links, so confirming that a file really is a PDF, regardless of its extension, is an important step in security scanning. Some malware disguises executables with a .pdf extension. A signature check exposes the mismatch, because a Windows executable starts with MZ (4D 5A) rather than %PDF-. A matching signature only establishes the format, not that the file is safe.
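The following sketch compares a file's extension with its signature. It recognizes only two signatures: %PDF- for PDF and MZ for Windows executables. `classify_by_signature` and `extension_mismatch` are hypothetical names, and a production scanner would use a much larger signature table.

```python
import os


def classify_by_signature(data: bytes) -> str:
    """Classify by leading bytes. Only two signatures are checked in this sketch."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"MZ"):  # 4D 5A, the DOS/Windows executable header
        return "windows-executable"
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """Return True if the name claims .pdf but the content does not start with %PDF-."""
    ext = os.path.splitext(filename)[1].lower()
    return ext == ".pdf" and classify_by_signature(data) != "pdf"


if __name__ == "__main__":
    print(extension_mismatch("invoice.pdf", b"MZ\x90\x00"))  # True
    print(extension_mismatch("report.PDF", b"%PDF-1.7\n"))   # False
```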
Use Case
PDF magic byte detection is used in email security gateways to identify PDF attachments, in document management systems for file validation, and in forensic analysis to recover PDF documents from disk images.