This cheat sheet outlines tips and tools for reverse-engineering malicious documents, such as Microsoft Office (DOC, XLS, PPT) and Adobe Acrobat (PDF) files.
Structured Storage (OLE SS) defines a file system inside the binary Microsoft Office file.
Each item in this file system is either a “storage” (folder) or a “stream” (file).
Excel stores data inside the “Workbook” stream.
PowerPoint stores data inside the “PowerPoint Document” stream.
Word stores data inside various streams.
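As a rough illustration of the container described above, the sketch below checks for the OLE SS signature and reads a few header fields using only the Python standard library. The function name `parse_ole_header` is made up for this example, and the field offsets are taken from the published compound file header layout; it does not walk the storages and streams, which the tools listed below do far more thoroughly.

```python
import struct

# First 8 bytes of every OLE structured storage (binary Office) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def parse_ole_header(data: bytes) -> dict:
    """Return a few header fields if data starts with the OLE SS signature."""
    if len(data) < 76 or data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE structured storage file")
    # Offsets 24-33: minor version, major version, byte order,
    # sector shift, mini sector shift (all little-endian 16-bit values).
    _minor, major, _order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 24)
    # Offsets 44-51: number of FAT sectors, first directory sector.
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", data, 44)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }
```

Calling `parse_ole_header(open("file.doc", "rb").read())` on a binary Office file should report the sector size the file uses; a `ValueError` suggests the file is not in the binary (OLE SS) format.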
OfficeMalScanner locates shellcode and VBA macros from MS Office (DOC, XLS, and PPT) files.
MalHost-Setup extracts shellcode from a given offset in an MS Office file and embeds it in an EXE file for further analysis. (Part of OfficeMalScanner)
Offvis shows raw contents and structure of an MS Office file, and identifies some common exploits.
Hachoir-urwid can navigate through the structure of binary Office files and view stream contents.
pyOLEScanner.py can examine and decode some aspects of malicious binary Office files.
|OfficeMalScanner file.doc scan brute||Locate shellcode, OLE data, PE files in file.doc|
|OfficeMalScanner file.doc info||Locate VB macro code in file.doc (no XML files)|
|OfficeMalScanner file.docx inflate||Decompress file.docx to locate VB code (XML files)|
|MalHost-Setup file.doc out.exe 0x4500||Extract shellcode from file.doc’s offset 0x4500 and create it as out.exe|
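The “inflate” step works because XML-based Office files (such as .docx) are ZIP archives. As a minimal sketch, not a replacement for OfficeMalScanner, the Python standard library can list the archive members that usually hold macro code. The function name `find_vba_parts` is hypothetical, and the sketch assumes the macros live in a member named vbaProject.bin, which is the usual name in macro-enabled files.

```python
import zipfile

def find_vba_parts(source):
    """List ZIP members of an XML-based Office file that look like VBA storage.

    `source` may be a file path or a binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        return [name for name in zf.namelist()
                if name.lower().endswith("vbaproject.bin")]
```

Any member it reports (for example word/vbaProject.bin) is itself a binary OLE SS file, so it can be extracted and handed to the binary Office tools above.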
A PDF file consists of a header, objects, a cross-reference table (to locate objects), and a trailer.
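To make those four parts concrete, here is a minimal sketch that looks for each of them in the raw bytes of a file. The function name `pdf_layout` and its return keys are invented for this example. It relies on simple pattern matching rather than real parsing, so treat its output as a first look only; malformed or deliberately obfuscated files can fool it.

```python
import re

def pdf_layout(data: bytes) -> dict:
    """Report the header version, object count, trailer and xref offset."""
    # Search the first 1024 bytes, since some readers tolerate leading junk.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # The last "startxref" entry points at the active cross-reference table.
    startxref = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode() if header else None,
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        "has_trailer": b"trailer" in data,
        "xref_offset": int(startxref[-1]) if startxref else None,
    }
```

A file with several `startxref` entries has been updated incrementally, which is worth noting because earlier revisions of objects may still be present in the file.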
“/OpenAction” and “/AA” (Additional Action) specify the script or action to run automatically.
“/Names”, “/AcroForm”, “/Action” can also specify and launch scripts or actions.
“/GoTo*” changes the view to a specified destination within the PDF or in another PDF file.
“/Launch” launches a program or opens a document.
“/URI” accesses a resource by its URL.
“/SubmitForm” and “/GoToR” can send data to a URL.
“/RichMedia” can be used to embed Flash in PDF.
“/ObjStm” can hide objects inside an Object Stream.
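The keywords above can be counted with a few lines of Python, in the spirit of a triage scan. This is only a sketch: `KEYWORDS`, `decode_name` and `count_keywords` are names made up here, and the list simply mirrors the keywords in this cheat sheet. It decodes #xx hex escapes in names (a common obfuscation trick) but does not look inside compressed streams, so keywords hidden in an “/ObjStm” will be missed.

```python
import re
from collections import Counter

# Keywords taken from the list above; "/GoTo*" is represented by two variants.
KEYWORDS = ["/OpenAction", "/AA", "/Names", "/AcroForm", "/Action", "/GoTo",
            "/GoToR", "/Launch", "/URI", "/SubmitForm", "/RichMedia", "/ObjStm"]

def decode_name(raw: bytes) -> str:
    """Undo #xx hex escapes in a PDF name, e.g. /Open#41ction -> /OpenAction."""
    decoded = re.sub(rb"#([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")

def count_keywords(data: bytes) -> Counter:
    """Count occurrences of the keywords of interest among all PDF names."""
    # A name starts with "/" and runs until whitespace or a delimiter.
    names = (decode_name(n) for n in re.findall(rb"/[^\s/<>\[\]()%{}]+", data))
    wanted = set(KEYWORDS)
    return Counter(n for n in names if n in wanted)
```

A nonzero count for “/OpenAction” or “/AA” together with a script-related keyword is a reasonable cue to inspect the referenced objects more closely.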
PDFiD identifies PDFs that contain strings associated with scripts and actions.
PDF Stream Dumper combines many PDF analysis tools under a single graphical user interface.
PDF X-RAY Lite creates an HTML report containing decoded PDF file structure and contents.
SWF mastah extracts SWF objects from PDF files.
Pyew includes commands for examining and decoding structure and content of PDF files.
|pdfid.py file.pdf||Locate script and action-related strings in file.pdf|
|pdf-parser.py file.pdf||Show file.pdf’s structure to identify suspect elements|
|pdf-parser.py --object id file.pdf||Display contents of object id in file.pdf. Add “--filter --raw” to decode the object’s stream.|
|swf_mastah.py -f file.pdf||Extract SWF objects from file.pdf into the out directory.|
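Decoding an object's stream, as the “--filter --raw” options do, often comes down to inflating zlib-compressed data (the FlateDecode filter). The sketch below shows that one step using the Python standard library. The function name `inflate_streams` is hypothetical, and the approach is deliberately simplistic: it ignores the “/Filter” entry, tries zlib on every stream, and skips anything that fails, so streams using other or chained filters are not decoded.

```python
import re
import zlib

# Capture everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Return the inflated bodies of streams that zlib can decompress."""
    decoded = []
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            body = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data, or needs a different filter
        if body:
            decoded.append(body)
    return decoded
```

Searching the returned byte strings for JavaScript or further keywords is a quick way to decide which object ids deserve a closer look with pdf-parser.py.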
ExeFilter can filter scripts from Office and PDF files.
This cheat sheet is distributed according to the Creative Commons v3 "Attribution" License. File version 2.
Authored by Lenny Zeltser.
Copyright © 1995-2013 Lenny Zeltser. All rights reserved.