
Analyzing Malicious Documents Cheat Sheet

Analyzing malicious documents involves examining files for anomalies, locating embedded code like macros or JavaScript, extracting and deobfuscating suspicious content, and emulating shellcode. Key tools include olevba for Office macros, pdfid for risky PDF keywords, and xlmdeobfuscator for Excel 4.0 macros.


This cheat sheet outlines tips and tools for analyzing malicious documents, such as Microsoft Office, RTF, and PDF files. To print it, use the one-page PDF version; you can also edit the Word version to customize it for your own needs.

General Approach to Document Analysis

  1. Examine the document for anomalies, such as risky tags, scripts, and embedded artifacts.
  2. Locate embedded code, such as shellcode, macros, JavaScript, or other suspicious objects.
  3. Extract suspicious code or objects from the file.
  4. If relevant, deobfuscate and examine macros, JavaScript, or other embedded code.
  5. If relevant, emulate, disassemble and/or debug shellcode that you extracted from the document.
  6. Understand the next steps in the infection chain.
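
As a rough illustration of step 1, the sketch below identifies a sample by its leading bytes and suggests a first command from the tables in this cheat sheet. The function names and the `FIRST_COMMAND` mapping are hypothetical, chosen only for this example. Matching the short `{\rt` prefix (rather than `{\rtf`) is an assumption made to tolerate malformed RTF headers.

```python
# Leading byte signatures for the file types this cheat sheet covers.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc/.xls/.ppt
ZIP_MAGIC = b"PK\x03\x04"                          # OOXML files are ZIP archives
RTF_PREFIX = b"{\\rt"                              # assumption: tolerate malformed "{\rt..." headers
PDF_MARKER = b"%PDF-"

# Hypothetical mapping from detected type to a starting command from the tables.
FIRST_COMMAND = {
    "ole2": "oledump.py {path} -i",
    "ooxml": "zipdump.py {path}",
    "rtf": "rtfdump.py {path}",
    "pdf": "pdfid.py {path} -n",
}


def identify(data: bytes) -> str:
    """Classify a document by signature; only the first 1024 bytes are inspected."""
    head = data[:1024]
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "ooxml"
    if head.lstrip().startswith(RTF_PREFIX):
        return "rtf"
    # The PDF marker is searched for anywhere in the inspected bytes,
    # because it does not always sit at offset 0.
    if PDF_MARKER in head:
        return "pdf"
    return "unknown"


def first_command(path: str, data: bytes) -> str:
    """Return a suggested first command for the sample, or a fallback note."""
    template = FIRST_COMMAND.get(identify(data))
    if template is None:
        return "file type not recognized; inspect manually"
    return template.format(path=path)
```

For example, `first_command("file.pdf", open("file.pdf", "rb").read())` would suggest starting with `pdfid.py`. A ZIP signature alone does not prove the file is OOXML, so treat the result as a hint rather than a verdict.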

Microsoft Office Format Notes

Useful Microsoft Office File Analysis Commands

| Command | Description |
| --- | --- |
| zipdump.py file.pptx | Examine contents of OOXML file file.pptx. |
| zipdump.py file.pptx -s 3 -d | Extract file with index 3 from file.pptx to STDOUT. |
| olevba file.xlsm | Locate and extract macros from file.xlsm. |
| oledump.py file.xls -i | List all OLE2 streams present in file.xls. |
| oledump.py file.xls -s 3 -v | Extract VBA source code from stream 3 in file.xls. |
| xmldump.py pretty | Format XML file supplied via STDIN for easier analysis. |
| oledump.py file.xls -p plugin_http_heuristics | Find obfuscated URLs in file.xls macros. |
| vmonkey file.doc | Emulate the execution of macros in file.doc to analyze them. |
| evilclippy -uu file.ppt | Remove the password prompt from macros in file.ppt. |
| msoffcrypto-tool infile.docm outfile.docm -p | Decrypt infile.docm using specified password to create outfile.docm. |
| pcodedmp file.doc | Disassemble VBA-stomped p-code macro from file.doc. |
| pcode2code file.doc | Decompile VBA-stomped p-code macro from file.doc. |
| rtfobj.py file.rtf | Extract objects embedded into RTF file.rtf. |
| rtfdump.py file.rtf | List groups and structure of RTF file file.rtf. |
| rtfdump.py file.rtf -O | Examine objects in RTF file file.rtf. |
| rtfdump.py file.rtf -s 5 -H -d | Extract hex contents from group in RTF file file.rtf. |
| xlmdeobfuscator --file file.xlsm | Deobfuscate XLM (Excel 4) macros in file.xlsm. |
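
To show what examining an OOXML file involves, here is a minimal sketch that lists the members of the ZIP container with Python's standard `zipfile` module and flags any `vbaProject.bin`, the OLE2 file that holds VBA macros. It is not how zipdump.py is implemented. The helper names and the 1-based indexing are assumptions made for this example.

```python
import io
import zipfile


def list_ooxml_members(data: bytes):
    """Return (index, name, size, is_macro_container) for each ZIP member."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for index, info in enumerate(archive.infolist(), start=1):
            # vbaProject.bin is an OLE2 file that stores the VBA project.
            is_macro = info.filename.lower().endswith("vbaproject.bin")
            rows.append((index, info.filename, info.file_size, is_macro))
    return rows


def read_member(data: bytes, index: int) -> bytes:
    """Return the decompressed bytes of the member at a 1-based index."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(archive.infolist()[index - 1])
```

A member flagged as a macro container can be written to disk and examined further with the OLE2 commands in the table, such as oledump.py.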

Risky PDF Keywords

Useful PDF File Analysis Commands

| Command | Description |
| --- | --- |
| pdfid.py file.pdf -n | Display risky keywords present in file file.pdf. |
| pdf-parser.py file.pdf -a | Show stats about keywords. Add “-O” to include object streams. |
| pdf-parser.py file.pdf -o id | Display contents of object id. Add “-d” to dump object’s stream. |
| pdf-parser.py file.pdf -r id | Display objects that reference object id. |
| qpdf --password=pass --decrypt infile.pdf outfile.pdf | Decrypt infile.pdf using password pass to create outfile.pdf. |
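
The idea behind keyword triage can be sketched in a few lines: scan the raw bytes for PDF name tokens, undo `#xx` hex escapes that attackers use to disguise names, and count the ones that matter. This is a simplified illustration, not pdfid.py's implementation. The `RISKY_KEYWORDS` list is an illustrative selection, and the sketch does not look inside compressed streams.

```python
import re

# Illustrative selection of names that often deserve a closer look.
RISKY_KEYWORDS = [
    "/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
    "/Launch", "/EmbeddedFile", "/XFA", "/RichMedia", "/JBIG2Decode",
]

# A name is "/" followed by characters that are neither whitespace nor delimiters.
_NAME = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes so that /J#61vaScript compares equal to /JavaScript."""
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_risky_keywords(data: bytes) -> dict:
    """Count occurrences of each risky name in the uncompressed parts of a PDF."""
    counts = {keyword: 0 for keyword in RISKY_KEYWORDS}
    for match in _NAME.finditer(data):
        name = normalize_name(match.group(0))
        if name in counts:
            counts[name] += 1
    return counts
```

Names hidden inside compressed object streams will be missed by this sketch, which is one reason to follow up with pdf-parser.py on the objects of interest.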

Shellcode and Other Analysis Commands

| Command | Description |
| --- | --- |
| xorsearch -W -d 3 file.bin | Locate shellcode patterns inside the binary file file.bin. |
| scdbgc /f file.bin | Emulate execution of shellcode in file.bin. Use “/off” to specify offset. |
| runsc32 -f file.bin -n | Execute shellcode in file.bin to observe behavior in an isolated lab. |
| base64dump.py file.txt | List Base64-encoded strings present in file file.txt. |
| numbers-to-string.py file | Convert numbers that represent characters in file to a string. |
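
The last two rows describe simple decoding steps that are easy to illustrate. The sketch below shows one way to turn a run of character codes into text and to locate Base64 candidates that decode cleanly. It approximates the idea only and does not reproduce either tool. The function names, the 16-character minimum, and the `minimum` threshold are assumptions for this example.

```python
import base64
import re

_NUMBER = re.compile(r"\d+")
# Assumption: only runs of at least 16 Base64 characters are worth reporting.
_BASE64 = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def numbers_to_string(line: str, minimum: int = 3) -> str:
    """Convert the decimal numbers on a line to characters.

    Lines with fewer than `minimum` numbers yield an empty string, and
    values outside the single-byte range are skipped.
    """
    values = [int(token) for token in _NUMBER.findall(line)]
    if len(values) < minimum:
        return ""
    return "".join(chr(value) for value in values if value < 256)


def find_base64(text: str):
    """Return (encoded, decoded) pairs for candidates that decode without error."""
    results = []
    for match in _BASE64.finditer(text):
        candidate = match.group(0)
        if len(candidate) % 4 != 0:
            continue  # not a whole number of 4-character Base64 groups
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except ValueError:
            continue
        results.append((candidate, decoded))
    return results
```

For instance, a macro line such as `x = Array(104, 116, 116, 112)` would be passed to `numbers_to_string`, and the text of an extracted script would be passed to `find_base64`. A clean decode does not mean the content is meaningful, so review each result by hand.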

Additional Document Analysis Tools

Post-Scriptum

Special thanks for feedback to Pedro Bueno and Didier Stevens. Creative Commons v3 “Attribution” License for this cheat sheet version 4.1.

About the Author

Lenny Zeltser is a cybersecurity executive with deep technical roots, product management experience, and a business mindset. As CISO at Axonius, he leads the security and IT program, focusing on trust and growth. He is also a Faculty Fellow at SANS Institute and the creator of REMnux, a popular Linux toolkit for malware analysis. Lenny shares his perspectives on security leadership and technology at zeltser.com.
