Analyzing Suspicious PDF Files With Peepdf

Attackers continue to use malicious PDF files as part of targeted attacks and mass-scale client-side exploitation. Peepdf, a new tool from Jose Miguel Esparza, is an excellent addition to the PDF analysis toolkit for examining and decoding suspicious PDFs.

For this introductory walk-through, I will take a quick look at a malicious PDF file that I obtained from Contagio Malware Dump. If you’d like to experiment with this file in an isolated laboratory environment, you’re welcome to download it from my server; the password to the zip file is the word “infected”.

Peepdf is written in Python. It doesn’t have a graphical user interface, but people used to command-line interfaces will feel comfortable with it. This is especially true of its interactive mode, which drops you into a shell where you can navigate the PDF file’s structure and explore its contents.

Examining a PDF File for Suspicious Characteristics

After installing Peepdf (instructions below), you can scan a PDF file with the “peepdf file.pdf” command to obtain summary information about it. When you’re in the tool’s interactive shell, the “info” command shows the same details.

When asked to show “info”, Peepdf points out suspicious elements that are often used in attacks. In this example, the tool highlighted AcroForm, OpenAction, Names, JS and JavaScript because these PDF elements are often misused. It also reported that object 13 seems to contain JavaScript, which is a common component of PDF exploits.
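
To give a rough sense of the kind of check behind those highlighted names, here is a minimal Python sketch that counts the same tokens in a file’s raw bytes. It is not Peepdf’s implementation: the function name count_suspicious_names is hypothetical, and the sketch ignores names obscured with hex escapes or stored inside compressed object streams, which a real parser has to handle.

```python
import re

# Element names that the article says Peepdf highlighted as suspicious.
SUSPICIOUS_NAMES = ("AcroForm", "OpenAction", "Names", "JS", "JavaScript")


def count_suspicious_names(pdf_bytes):
    """Return {name: count} for suspicious PDF name tokens in raw bytes.

    Hypothetical helper for illustration only; it does not parse the
    PDF structure and only looks at the uncompressed bytes it is given.
    """
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Match "/Name" only when the next byte is not alphanumeric,
        # so that a longer name sharing the same prefix is not counted.
        pattern = re.compile(rb"/" + name.encode("ascii") + rb"(?![A-Za-z0-9])")
        hits = len(pattern.findall(pdf_bytes))
        if hits:
            counts[name] = hits
    return counts
```

You could feed it the bytes of a sample read with open(path, "rb").read() and treat any non-empty result as a reason to look closer, not as a verdict.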

Examining Suspicious Object Contents in the PDF File

Use the “-i” parameter to enter Peepdf’s interactive mode (“peepdf -i file.pdf”). Once in the interactive shell, you can use the “object” command to examine the contents of a given object. For instance, we saw earlier that object 13 contains JavaScript. Typing “object 13” will show the object’s contents, including the embedded JavaScript. Peepdf will automatically decode the stream that holds the JavaScript, applying the appropriate filters.
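
To illustrate what decoding a stream involves, the following Python sketch inflates a stream that declares the common /FlateDecode filter. It is only a sketch under assumptions: the article doesn’t say which filters object 13 uses, the helper name decode_flate_object is hypothetical, and the code handles a single filter with no predictor or filter chain.

```python
import re
import zlib

# Capture everything between the "stream" keyword and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def decode_flate_object(obj_bytes):
    """Inflate the stream of a raw PDF object that uses /FlateDecode.

    Hypothetical helper for illustration; real PDFs may chain several
    filters, which this sketch does not attempt to handle.
    """
    match = STREAM_RE.search(obj_bytes)
    if match is None:
        raise ValueError("no stream found in object")
    # The filter is declared in the dictionary that precedes the stream.
    if b"/FlateDecode" not in obj_bytes[:match.start()]:
        raise ValueError("stream does not declare /FlateDecode")
    # decompressobj() stops at the end of the zlib data, so the line
    # break that precedes "endstream" is ignored rather than rejected.
    return zlib.decompressobj().decompress(match.group(1))
```

The appeal of a tool like Peepdf is that it picks and applies the right filters for you, so a helper like this is mainly useful for understanding what happens under the hood.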

In our example, the variable “large_hahacode” seems to include shellcode that the analyst would want to extract and examine to understand its capabilities.
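
If the variable holds its payload as a JavaScript unescape() string, which is common in PDF exploits but is an assumption here because the article doesn’t show the value, converting it to raw bytes could look roughly like this. The function name unescape_to_bytes is hypothetical, and the sketch silently skips any characters that are not %uXXXX or %XX escapes.

```python
import re
import struct

# Either a %uXXXX (16-bit) escape or a %XX (8-bit) escape.
ESCAPE_RE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_to_bytes(escaped):
    """Convert %uXXXX / %XX escapes into the bytes they represent.

    Hypothetical helper for illustration; literal characters between
    escapes are ignored.
    """
    out = bytearray()
    for match in ESCAPE_RE.finditer(escaped):
        if match.group(1):
            # %uXXXX is one 16-bit unit; pack it little-endian, which is
            # the byte order shellcode for x86 targets relies on.
            out += struct.pack("<H", int(match.group(1), 16))
        else:
            out.append(int(match.group(2), 16))
    return bytes(out)
```

The resulting bytes can be written to a file and examined further, for example with a shellcode emulator such as sctest.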

Peepdf includes additional commands to help you analyze the contents of encoded JavaScript variables and execute JavaScript commands. This might help with deobfuscation, though I haven’t yet experimented with Peepdf enough to explore these capabilities. Peepdf does provide a convenient way of calling the “sctest” command from its interactive shell; this can help you emulate the execution of shellcode.

To learn more about Peepdf’s capabilities for analyzing malicious PDF files, see the tool’s webpage for usage highlights and its documentation for a full listing of its commands.

Installing Peepdf on Linux

I use REMnux for malware analysis tasks that can be performed on Unix, and hope to include Peepdf in the next release of this distro. Installing Peepdf on REMnux was straightforward, and I doubt you’ll run into challenges on other Linux platforms. Once logged into REMnux, assuming that the host is connected to the Internet, type:

sudo apt-get install python-pyrex
svn checkout python-spidermonkey
cd python-spidermonkey
python build
sudo python install
sudo ldconfig
cd .. && rm -rf python-spidermonkey
mkdir peepdf && cd peepdf
unzip && rm
ln -s /usr/local/bin/sctest .

If you like analyzing malicious programs, take a look at the Reverse-Engineering Malware course I teach at SANS. If you’re just getting to know malware, you might also like my Combating Malware course.


Lenny Zeltser


About the Author

Lenny Zeltser is a seasoned business and technology leader with extensive information security experience. He designs creative anti-malware solutions as VP of Products at Minerva Labs. He also trains incident response and digital forensics professionals at SANS Institute. Lenny frequently speaks at industry events, writes articles and has co-authored books. He has earned the prestigious GIAC Security Expert designation, has an MBA from MIT Sloan and a Computer Science degree from the University of Pennsylvania.
