An incident responder or forensic investigator should be prepared to examine potentially malicious document files, which may be located on the compromised system or discovered in email, web, or other network streams. After all, embedding malicious code into documents, such as Excel spreadsheets or Adobe Acrobat PDF files, is quite effective at bypassing perimeter defenses. This note deals with one such scenario, focusing on how to extract Visual Basic (VB) macro code that may be embedded in malicious Microsoft Office files. I will discuss how to extract macros from both legacy binary Office files (.doc, .xls, .ppt) and modern XML-based Office formats that support macros (such as .docm, .xlsm, .pptm). As you'll see, OfficeMalScanner will be my tool of choice for getting the job done.
Malicious Use of Macro Code in Microsoft Office Document Files
Recent versions of Microsoft Office disable macros by default. However, with a bit of social engineering, an attacker can often trick the user into enabling macros. The attacker may then use embedded VB macro code to run arbitrary commands on the victim's system. For instance, Metasploit can generate its payload as a VB script, which can easily be embedded in an Office document; similarly, tools exist for converting arbitrary EXEs into VB script.
Legacy binary Microsoft Office formats don't differentiate between document files that may and may not include macros. In contrast, when documents are saved using the XML-based file formats introduced in Microsoft Office 2007, macros are supported only in files whose extensions end with "m":
.docm, .xlsm, .pptm, .dotm, .xltm, .xlam, .potm, .ppam, .ppsm, .sldm
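As a rough illustration of the naming rule above, here is a minimal Python sketch that flags files whose extension allows for macros. The function name and the two sets are my own for this example, not part of any tool. The extension is only a hint, since a file can be renamed, so treat the result as a first-pass filter.

```python
from pathlib import Path

# Macro-enabled XML-based extensions, as listed in this note.
MACRO_ENABLED_EXTENSIONS = {
    ".docm", ".xlsm", ".pptm", ".dotm", ".xltm",
    ".xlam", ".potm", ".ppam", ".ppsm", ".sldm",
}

# Legacy binary formats do not signal macro support through the extension,
# so any of them may carry macros.
LEGACY_BINARY_EXTENSIONS = {".doc", ".xls", ".ppt"}


def may_contain_macros(filename):
    """Return True if the extension alone allows for embedded macros."""
    suffix = Path(filename).suffix.lower()
    return suffix in MACRO_ENABLED_EXTENSIONS or suffix in LEGACY_BINARY_EXTENSIONS
```

For the sample files in this note, both malware.xls and malware.xlsm would be flagged, while a plain .xlsx file would not.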
For this note, I created a sample set of "malicious" Excel document files: malware.xls (legacy binary format) and malware.xlsm (current XML-based format). The files include VB code that mimics an attacker's attempt to run commands on the victim's system. The code in this example pings localhost and launches notepad.exe when the victim opens the document and allows macros. Here's what it will look like from the victim's perspective (the screenshot didn't capture the instance of Notepad, which would also be running):
I took this proof-of-concept code from the example published on the Invisible Denizen blog. You can download my sample files here; the password for the zip file is the word "infected."
Let's say you need to analyze potentially malicious Microsoft Office document files like these. The approach I'd like to demonstrate involves extracting VB macro code using OfficeMalScanner, a free tool by Frank Boldewin, which can deconstruct Microsoft Office files and extract embedded shellcode and VB code. I'll explore only the features of this tool related to VB.
Extracting VB Macro Code from Binary Microsoft Office Files
Let's start with the malware.xls example, since you'll be more likely to encounter such binary files than XML-based ones. Legacy binary formats of Microsoft Office files (.doc, .xls, .ppt) are considered riskier than the newer XML-based formats, because the programs parsing binary files tend to be more complex and, therefore, more prone to bugs.
Place the suspicious document file on the laboratory system running Microsoft Windows, where you placed OfficeMalScanner. Go to the command prompt. To scan the file (malware.xls) for the presence of VB macro code, type "OfficeMalScanner malware.xls info".
The tool will examine the file and, if it locates VB macros, extract the code into text files in the "MALWARE.XLS-Macros" folder. Since an Excel spreadsheet can have multiple sheets, you will see one file per sheet, as well as a file called "ThisWorkbook" for workbook-level macros. In my example, the relevant VB code is in the "ThisWorkbook" file:
Sub Run_Cmd(command, visibility, wait_on_execute)
    Dim WshShell As Variant
    Set WshShell = CreateObject("WScript.Shell")
    WshShell.Run "%COMSPEC% /c " & command, visibility, wait_on_execute
End Sub

Sub Run_Program(program, arguments, visibility, wait_on_execute)
    Dim WshShell As Variant
    Set WshShell = CreateObject("WScript.Shell")
    WshShell.Run program & " " & arguments & " ", visibility, wait_on_execute
End Sub

Sub Workbook_Open()
    Const VISIBLE = 1, INVISIBLE = 0
    Const WAIT = True, NOWAIT = False
    Run_Cmd "ping 127.0.0.1", VISIBLE, WAIT
    Run_Program "notepad.exe", "", VISIBLE, NOWAIT
End Sub
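Once the macro text is extracted, a quick keyword pass can help you decide where to look first. The Python sketch below is only an illustration: the indicator list is my own, hypothetical, and far from exhaustive, and a clean result does not mean the macro is benign. It looks for the kinds of constructs seen in the sample above, such as an auto-run entry point and the use of WScript.Shell.

```python
import re

# Hypothetical, non-exhaustive indicators chosen for this illustration.
SUSPICIOUS_PATTERNS = {
    "auto-exec entry point": r"\b(?:Workbook_Open|Auto_Open|AutoOpen|Document_Open)\b",
    "shell object": r"WScript\.Shell",
    "object creation": r"\bCreateObject\b",
    "command interpreter": r"%COMSPEC%|cmd\.exe",
}


def triage_macro(source):
    """Map each indicator label to the distinct strings that matched it."""
    hits = {}
    for label, pattern in SUSPICIOUS_PATTERNS.items():
        # VB is case-insensitive, so match without regard to case.
        found = sorted({m.group(0) for m in re.finditer(pattern, source, re.IGNORECASE)})
        if found:
            hits[label] = found
    return hits
```

To use it, read one of the extracted text files into a string and pass it to triage_macro. The encoding of those files is not something I've verified, so opening them with errors="replace" is a reasonable precaution.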
Extracting VB Macro Code from XML Microsoft Office Files
The "info" option of OfficeMalScanner only works with legacy binary Microsoft Office files. If you try to use it on "malware.xlsm", you'll get an error.
No problem. XML-formatted versions of Microsoft Office files, which typically have extensions such as .docx, .xlsx, and .pptx, are actually zip-compressed archives that contain several files. You can unpack the archive using tools such as unzip and ZipReader (e.g., "unzip malware.xlsm").
In this case, the extracted "vbaProject.bin" file contains the VB macro code in a binary format.
Alternatively, you can unpack the archive using the "inflate" option of OfficeMalScanner, which will also identify the extracted files that contain VB. To do this, you would type "OfficeMalScanner malware.xlsm inflate".
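If you prefer to script the manual unpacking step, Python's standard zipfile module can do the same job as unzip. The following is a minimal sketch, with function names of my own choosing. It assumes the macro storage is a member named "vbaProject.bin" somewhere in the archive (in an Excel file it normally sits under the xl/ folder) and would miss a part that has been given a different name.

```python
import zipfile


def find_vba_parts(path):
    """Return archive member names that look like VBA project storage."""
    with zipfile.ZipFile(path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")]


def extract_vba_parts(path, dest_dir):
    """Extract the VBA parts into dest_dir and return the extracted file paths."""
    parts = find_vba_parts(path)
    with zipfile.ZipFile(path) as archive:
        return [archive.extract(name, dest_dir) for name in parts]
```

For example, extract_vba_parts("malware.xlsm", "unpacked") would write the binary macro file beneath the "unpacked" folder and return its path. As with the rest of this note, do this on a laboratory system.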
Whether you obtained the binary VB file using manual unpacking or OfficeMalScanner's "inflate" feature, you can extract scripts from .bin files using the "info" feature: "OfficeMalScanner vbaProject.bin info".
You can now look at the text files in the VBAPROJECT.BIN-Macros folder to examine the extracted VB code. In this case, the code is identical to the code I showed earlier in the note.
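The two paths described in this note can be tied together by checking which container format a file really uses, regardless of its extension. The sketch below relies on two well-known file signatures: legacy binary Office files begin with the compound file header D0 CF 11 E0 A1 B1 1A E1, and zip archives begin with "PK" followed by bytes 03 04. The function names are my own, and the command strings simply repeat the ones used earlier. The sketch only suggests commands and does not run them.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file (legacy binary) header
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a zip archive


def detect_container(path):
    """Classify a file as 'ole2', 'zip', or 'unknown' from its first bytes."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def suggested_commands(path):
    """Return the OfficeMalScanner command lines from this note that apply."""
    kind = detect_container(path)
    if kind == "ole2":
        return ["OfficeMalScanner {} info".format(path)]
    if kind == "zip":
        # The location of vbaProject.bin after unpacking is not assumed here;
        # adjust the second command to wherever the file ends up.
        return ["OfficeMalScanner {} inflate".format(path),
                "OfficeMalScanner vbaProject.bin info"]
    return []
```

Note that an extracted vbaProject.bin is itself a compound file, which is why the "info" option can process it directly.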
There you have it: an approach to extracting VB macro code from malicious Microsoft Office files, whether they use legacy binary or recent XML-based formats. Do you have another approach, or any thoughts on this one? Tell us in the comments below.
If you found this note useful, you may like the Analyzing Malicious Documents cheat sheet, which outlines these and other tips for examining Microsoft Office and PDF files.
Lenny Zeltser focuses on safeguarding customers' IT operations at NCR Corporation. He also teaches how to analyze malware at SANS Institute. Lenny is active on Twitter and writes a security blog.