SANS Digital Forensics and Incident Response Blog | Detecting Shellcode Hidden in Malicious Files

A challenge both reverse engineers and automated sandboxes have in common is identifying whether a particular file is malicious or not. This is especially true if the malicious aspects are obfuscated and only triggered under very specific circumstances.

There are a number of techniques for identifying embedded shellcode, for example searching for known patterns (NOP sleds, GetEIP stubs, etc.). However, as attackers update their methods to evade our protections, it becomes more difficult to find the code without having the exact version of the vulnerable software being targeted and allowing the exploit to execute successfully.
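
To make the pattern-searching approach concrete, here is a minimal Python sketch. The function name `scan_for_patterns` and the choice of three signatures (a run of eight or more 0x90 NOPs, a CALL $+5 / POP GetEIP stub, and the FNSTENV GetEIP variant) are assumptions made for illustration; the list is far from exhaustive and will produce false positives on ordinary binary data.

```python
import re

# Byte patterns commonly associated with 32-bit x86 shellcode.
# Illustrative only: not exhaustive, and prone to false positives.
PATTERNS = {
    # Eight or more consecutive single-byte NOPs (0x90).
    "nop_sled": re.compile(rb"\x90{8,}"),
    # CALL $+5 followed by POP r32: pushes the next address, then pops it.
    "getpc_call_pop": re.compile(rb"\xe8\x00\x00\x00\x00[\x58-\x5f]"),
    # FNSTENV [ESP-0xC]: recovers EIP from the saved FPU environment.
    "getpc_fnstenv": re.compile(rb"\xd9\x74\x24\xf4"),
}


def scan_for_patterns(data: bytes) -> list[tuple[int, str]]:
    """Return sorted (offset, pattern_name) pairs for every match in data."""
    hits = []
    for name, regex in PATTERNS.items():
        for match in regex.finditer(data):
            hits.append((match.start(), name))
    return sorted(hits)
```

A scanner like this only finds shellcode that is stored in the clear. If the payload is encoded, compressed, or built at runtime, none of these byte sequences need appear in the file, which is the gap the rest of this post tries to address.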

In this post, I will discuss a new technique I have been experimenting with, which approaches this issue from a different perspective: forcing the execution of the exploit code, no matter what software you have installed. It is based on two core principles:

If you try and execute something that isn't code (e.g. a text string), the program will likely crash as the machine code interpretation of this data is unlikely to make much sense.
If you begin executing code from the start (i.e. wherever the instruction pointer would have been set during the exploitation phase), it will run to completion - no matter how obfuscated the instructions are.

So here's my theory: if we attempt to "execute" the contents of a malicious file (such as a PDF) byte by byte, catching the exception each time the program crashes and then increasing the instruction pointer by one, we will eventually come across any malicious code contained therein. That code will be triggered, run to completion, and provide indicators of its malicious nature through behavioural analysis.

The experiment

In order to test this concept, I wrote a program which does the following:

Maps the requested file into memory (i.e. makes a full copy of it in memory).
Sets the instruction pointer to the first byte, and allows it to run.
It will probably crash! (The instructions won't make sense!!)
Catches the error, and uses the error handler to increase the instruction pointer by one.
Tries again, and again, and again...
If the file contains shellcode, you should eventually hit it, and it will run - hurrah!
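
The steps above can be sketched as a simple search loop. This Python sketch models only the control flow (crash, advance by one byte, retry). The native execution step is hidden behind a hypothetical `try_execute` callback, and `Outcome`, `Executor` and `hunt` are names invented for this illustration, not the code used in the demo. It also assumes that "increase the instruction pointer by one" means moving the starting offset forward one byte per attempt.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    CRASHED = "crashed"      # the bytes at this offset faulted
    COMPLETED = "completed"  # execution returned without faulting


# Hypothetical callback: attempts native execution of `data` starting at
# `offset` (inside an isolated analysis machine) and reports the result.
Executor = Callable[[bytes, int], Outcome]


def hunt(data: bytes, try_execute: Executor) -> Optional[int]:
    """Return the first offset that executes without crashing, or None."""
    offset = 0
    while offset < len(data):
        if try_execute(data, offset) is Outcome.COMPLETED:
            return offset
        # The attempt crashed: move the starting point forward one byte.
        offset += 1
    return None
```

An offset that does not crash is only a candidate. A stray RET byte (0xC3), for instance, returns cleanly without being shellcode, so the verdict still has to come from behavioural analysis of what ran.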

Demonstrating the concept

Step 1 - Generating a malicious document and starting the Metasploit reverse handler

We begin by generating a malicious PDF document containing the Metasploit reverse_tcp payload, and starting the handler to await incoming connections. The attacker is now waiting for the victim to open the file with a vulnerable PDF reader, at which point the payload will connect back to the attacker's machine.

Step 2 - Dealing with the malicious PDF

Now, let us imagine we are conducting an analysis of this document (either manually or using an automated sandbox). The issue we are going to have in this case is that we are unlikely to have the vulnerable version of the software installed, so the exploit won't work and we will be none the wiser that it exists! This isn't to say that the intended victim doesn't have the vulnerable version installed.

Let us try running the PDF through our proof-of-concept shellcode hunter...

Step 3 - Bingo - shellcode has been located and triggered

The shellcode in the document has been triggered and has established a connection back to the Metasploit listener! If we were conducting a behavioural analysis, we would be able to identify the suspicious activity and take appropriate action.

Video demo

Check out the video demo if you'd like to see this in action live:

Code sample

If you're interested in testing the concept, or integrating it into your software (anyone fancy writing a Cuckoo module?), the code I used was pretty simple, and looked like this:

It could definitely be a lot more advanced than the proof of concept I wrote for this demo. For example, if the shellcode started with a JMP $-2 instruction, it would trap the hunter in an infinite loop. This could potentially be overcome by using multi-threading to continue the search after the first code block has been found.
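
One way to picture the multi-threading idea is a watchdog around each attempt. The Python sketch below is an assumption about how that could be structured, not part of the proof of concept. `hunt_with_watchdog` and the `attempt` callback are hypothetical names, a crash is modelled as a raised exception, and a hang (such as the two-byte x86 short jump `EB FE`, which jumps to itself) is modelled as a call that never returns.

```python
import threading
from typing import Callable


def _run(attempt: Callable[[bytes, int], None], data: bytes, offset: int) -> None:
    try:
        attempt(data, offset)
    except Exception:
        pass  # stands in for the crash handler: swallow the fault and move on


def hunt_with_watchdog(
    data: bytes,
    attempt: Callable[[bytes, int], None],
    timeout: float = 0.5,
) -> list[int]:
    """Try every offset; return the offsets whose attempt outlived the timeout."""
    still_running = []
    for offset in range(len(data)):
        worker = threading.Thread(
            target=_run, args=(attempt, data, offset), daemon=True
        )
        worker.start()
        worker.join(timeout)  # returns early if the attempt finishes or crashes
        if worker.is_alive():
            # Something is still executing here: record it and keep searching.
            still_running.append(offset)
    return still_running
```

The search no longer stalls on the first block of code that keeps running, at the cost of waiting up to `timeout` seconds at each such offset. Python threads cannot be forcibly stopped, so a real implementation would more likely isolate each attempt in its own process.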

You may have to play with your compiler settings to get this to work. I set Visual Studio to compile with the 'Debug' configuration and switched off some of the protections. If you need some help getting it working, send me a tweet.

I will be presenting this and a few other concepts during a session at SANS London in a couple of weeks. If you're attending the conference, it would be great to have you along!

Let me know what you think!

Follow me on Twitter: @CyberKramer