
This cheat sheet outlines tips and tools for reverse-engineering malicious documents, such as Microsoft Office (DOC, XLS, PPT) and Adobe Acrobat (PDF) files.
Structured Storage (OLE SS) defines a file system inside the binary Microsoft Office file.
Data is organized into “storage” (folder) and “stream” (file) elements.
Excel stores data inside the “Workbook” stream.
PowerPoint stores data inside the “PowerPoint Document” stream.
Word stores data inside various streams.
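As a rough illustration of the container described above, the following Python sketch checks whether a file begins with the OLE compound file signature and reads a few header fields. The function name `parse_ole_header` and the returned dictionary keys are illustrative choices, not part of any tool listed here; the field offsets follow the published compound file header layout.

```python
import struct
from typing import Optional

# Signature found at offset 0 of binary (pre-Open XML) Office files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def parse_ole_header(data: bytes) -> Optional[dict]:
    """Return basic header fields if data starts with the OLE signature, else None."""
    if len(data) < 34 or data[:8] != OLE_MAGIC:
        return None
    # Offsets 24..33: minor version, major version, byte order,
    # sector shift, mini sector shift (all little-endian uint16).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }
```

Passing the first bytes of a suspect file (read in binary mode) to this function gives a quick answer to whether it is an OLE container at all; walking the storages and streams themselves requires parsing the directory sectors, which this sketch does not attempt.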
OfficeMalScanner locates shellcode and VBA macros from MS Office (DOC, XLS, and PPT) files.
DisView disassembles bytes at a given offset of an MS Office file. (Part of OfficeMalScanner)
MalHost-Setup extracts shellcode from a given offset in an MS Office file and embeds it in an EXE file for further analysis. (Part of OfficeMalScanner)
OffVis shows raw contents and structure of an MS Office file, and identifies some common exploits.
Office Binary Translator converts DOC, PPT, and XLS files into Open XML files (includes BiffView tool).
OfficeCat scans MS Office files for embedded exploits that target several known vulnerabilities.
FileHex (not free) and FileInsight hex editors can parse and edit OLE structures.
| OfficeMalScanner file.doc scan brute | Locate shellcode, OLE data, PE files in file.doc |
| OfficeMalScanner file.doc info | Locate VB macro code in file.doc (no XML files) |
| OfficeMalScanner file.docx inflate | Decompress file.docx to locate VB code (XML files) |
| DisView file.doc 0x4500 | Disassemble shellcode at 0x4500 in file.doc |
| MalHost-Setup file.doc out.exe 0x4500 | Extract shellcode from file.doc’s offset 0x4500 and embed it in out.exe |
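To make the idea behind the scan and inflate commands more concrete, here is a minimal Python sketch of two simplified checks. It is not how OfficeMalScanner is implemented: `find_embedded_pe` only spots unencoded PE files (an "MZ" header whose e_lfanew field points at a "PE" signature), and `list_vba_parts` relies on the fact that Open XML documents are ZIP archives in which macro code usually lives in a member named vbaProject.bin. Both function names are hypothetical.

```python
import struct
import zipfile

def find_embedded_pe(data: bytes) -> list:
    """Return offsets where an MZ header points to a valid PE signature."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        # e_lfanew (offset of the PE header) is stored at 0x3C in the DOS header.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits

def list_vba_parts(path_or_file) -> list:
    """Return names of ZIP members that look like VBA project containers."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [
            name for name in zf.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

Embedded executables in real samples are often XOR-encoded or otherwise obfuscated, so an empty result from `find_embedded_pe` does not mean the file is clean.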
A PDF file consists of a header, objects, a cross-reference table (to locate objects), and a trailer.
“/OpenAction” and “/AA” (Additional Action) specify the script or action to run automatically.
“/Names”, “/AcroForm”, “/Action” can also specify and launch scripts or actions.
“/JavaScript” specifies JavaScript to run.
“/GoTo*” changes the view to a specified destination within the PDF or in another PDF file.
“/Launch” launches a program or opens a document.
“/URI” accesses a resource by its URL.
“/SubmitForm” and “/GoToR” can send data to a URL.
“/RichMedia” can be used to embed Flash in PDF.
“/ObjStm” can hide objects inside an Object Stream.
Be mindful of obfuscation with hex codes, such as “/JavaScript” vs. “/J#61vaScript”.
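The keywords and the hex-code obfuscation described above can be combined into a small triage sketch: decode any #xx escapes in each PDF name, then count the names of interest. This is a simplified illustration in Python, not the logic of any tool listed in this cheat sheet. The set `SUSPECT_NAMES` and the function name `count_suspect_names` are assumptions made for the example, and the “/GoTo*” wildcard is left out for brevity.

```python
import re
from collections import Counter

# Names taken from the notes above; extend as needed.
SUSPECT_NAMES = {
    b"/OpenAction", b"/AA", b"/Names", b"/AcroForm", b"/Action",
    b"/JavaScript", b"/Launch", b"/URI", b"/SubmitForm", b"/GoToR",
    b"/RichMedia", b"/ObjStm",
}

# A PDF name starts with "/" and runs until whitespace or a delimiter.
_NAME = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

def count_suspect_names(pdf_bytes: bytes) -> Counter:
    """Count suspect names after decoding #xx escapes (e.g. /J#61vaScript)."""
    counts = Counter()
    for token in _NAME.findall(pdf_bytes):
        name = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token)
        if name in SUSPECT_NAMES:
            counts[name.decode("ascii")] += 1
    return counts
```

The sketch only sees names in the raw file bytes. Names hidden inside compressed object streams (“/ObjStm”) will not be counted until those streams are decompressed.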
PDFiD identifies PDFs that contain strings associated with scripts and actions. (Part of PDF Tools)
PDF-parser identifies key elements of the PDF file without rendering it. (Part of PDF Tools)
Origami Walker examines the structure of PDF files.
Origami pdfscan identifies PDFs that contain strings associated with scripts and actions.
Origami extractjs and Jsunpack-n’s pdf.py extract JavaScript from PDF files.
Sumatra PDF and MuPDF are lightweight and free viewers that may be used in place of Adobe Acrobat.
Malzilla can extract and decompress zlib streams from PDFs, and can help deobfuscate JavaScript.
Jsunpack-n can extract and decode JavaScript from pcap network captures, and can decode PDF files.
CWSandbox, Wepawet, and Jsunpack can analyze some aspects of malicious PDF files.
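Several of the tools above decompress zlib (FlateDecode) streams so that embedded JavaScript becomes readable. The Python sketch below shows the basic idea under simplifying assumptions: it locates stream bodies with a regular expression instead of honoring each object's /Length and /Filter entries, and it silently skips anything that is not valid zlib data. The function name `inflate_streams` is hypothetical.

```python
import re
import zlib

# Roughly match the bytes between "stream" and "endstream" keywords.
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed contents of every zlib-compressed stream found."""
    results = []
    for match in _STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing or slightly truncated data.
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not zlib data (other filter, or uncompressed stream)
    return results
```

Streams that use other filters, cascaded filters, or encryption need a real PDF parser.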
| pdfid.py file.pdf | Locate script and action-related strings in file.pdf |
| pdf-parser.py file.pdf | Show file.pdf’s structure to identify suspect elements |
| pdfscan.rb file.pdf | Examine and display file.pdf’s structure |
| extractjs.rb file.pdf | Extract JavaScript embedded in file.pdf |
| pdf.py file.pdf | Extract JavaScript embedded in file.pdf |
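For a rough first look at the structure that the tools above report in detail, the following Python sketch pulls out the header version, counts "N G obj" markers, and reads the last startxref offset. It is a plain byte scan with a hypothetical function name (`outline_pdf`), not a parser, so treat its output as a hint only. Note that newer PDFs may use cross-reference streams and therefore have no trailer keyword.

```python
import re

def outline_pdf(data: bytes) -> dict:
    """Summarize header, object markers, trailer, and startxref in raw PDF bytes."""
    # The header normally appears near the start of the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # Object definitions look like "12 0 obj".
    object_count = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    # The last startxref entry points at the active cross-reference section.
    xrefs = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "object_count": object_count,
        "has_trailer": b"trailer" in data,
        "startxref": int(xrefs[-1]) if xrefs else None,
    }
```

More than one startxref/%%EOF pair usually indicates incremental updates, which is worth noting when a document has been modified after creation.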
ExeFilter can filter scripts from Office and PDF files.
ViCheck.ca automatically examines malicious Office and PDF files.
VirusTotal can scan files with multiple anti-virus tools to identify some malicious documents.
Adobe Portable Document Format (PDF) Reference
Physical and Logical Structure of PDF Files
Methods for Understanding and Analyzing Targeted Attacks with Office Documents (video)
Analyzing MSOffice Malware with OfficeMalScanner (follow-up presentation)
PDF Security Analysis and Malware Threats
Malicious Origami in PDF (follow-up presentation)
OffVis 1.0 Beta: Office Visualization Tool article
Reverse-Engineering Malware cheat sheet
Special thanks for contributions and feedback to Pedro Bueno, Frank Boldewin, and Didier Stevens. If you have suggestions for improving this cheat sheet, please let me know.
This cheat sheet is distributed according to the Creative Commons v3 "Attribution" License. File version 1.9.
About the Author: Lenny Zeltser leads the security consulting practice at Savvis, where he focuses on designing and operating security programs for cloud-based IT infrastructure. Lenny's other area of specialization is malicious software; he teaches how to analyze and combat malware at SANS Institute. Lenny explores security topics at conferences, in books and in articles. He also volunteers as an incident handler at the Internet Storm Center. You can follow Lenny on Twitter to stay in touch.
Copyright © 1995-2010 Lenny Zeltser. All rights reserved.
The information on this site does not necessarily represent positions or opinions of my employer.