I spoke with Jake Williams, an incident responder extraordinaire, who teaches SANS' FOR610: Reverse-Engineering Malware course. In the third and final part of the interview, Jake discussed his perspective on the various approaches to reverse-engineering malware, including behavioral, dynamic and static analysis as well as memory forensics. (For additional insights from Jake, see part 1 and part 2 of the interview.)
Can you talk a bit about how you typically spend your time when reversing a sample: to what extent do you find yourself in a disassembler or a debugger versus doing behavioral analysis?
Sure. That's an excellent question. If I may, can I throw in one other discipline of malware analysis?
Absolutely. What is it?
Memory forensics. I'm a huge believer in memory forensics and its use in malware RE. I know some of the reverse engineers out there are going to say, "That's forensics, not RE." But that's really not the case at all. If I get a chance to actually respond to the incident and find the machine powered on with the user logged in, then I'm immediately thinking "pay dirt." That gives me a chance to view the malware in its native environment, under the right conditions for infection. If I'm lucky enough to catch the machine in a state where it hasn't been rebooted since the initial infection, then there's a wealth of information available to me that otherwise might be gone forever (particularly in the case of a drive-by attack when private browsing is in use). Of course, I can replicate the malware's expected environment in the lab, but why expend the effort if I can get a memory image from the original compromised machine?
How do you normally incorporate memory forensics into your malware-reversing process?
Well, first I check for rootkits using some of the same techniques we teach in the FOR610 course. I trust the results much more than live response since we're asking questions about the OS rather than asking the OS directly. It's the difference between in band and out of band. In other words, the rootkit has no chance to lie to us. I also sometimes use the memory capture to dump unpacked malware from the memory image. If I have a sample that is a real pain to unpack with a debugger, then dumping it from memory is often the answer.
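To make this concrete, here is a minimal sketch of triaging a memory image along these lines. Jake doesn't name a specific tool here, so this example assumes the freely available Volatility 2.x framework; the image path, OS profile, and PID are placeholders.

```python
# Sketch only: drive Volatility 2.x from Python to triage a memory image.
# The plugin names (psxview, ssdt, malfind, procdump) are standard Volatility plugins;
# the image, profile, and PID values are placeholders.
import subprocess

IMAGE = "infected.raw"     # placeholder memory image from the compromised machine
PROFILE = "Win7SP1x86"     # placeholder OS profile

def vol(plugin, *extra):
    """Run a Volatility 2.x plugin against the image and return its text output."""
    cmd = ["vol.py", "-f", IMAGE, "--profile=" + PROFILE, plugin] + list(extra)
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Out-of-band rootkit checks: ask questions about the OS structures in the image
# rather than asking the (possibly subverted) OS directly.
print(vol("psxview"))   # cross-view process listing to spot hidden processes
print(vol("ssdt"))      # look for hooked system service table entries
print(vol("malfind"))   # injected or unpacked code resident in process memory

# Dump the already-unpacked module from memory instead of unpacking it by hand.
print(vol("procdump", "-p", "1234", "-D", "dumped/"))   # 1234 = placeholder PID
```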
Speaking of a debugger, let's get back to my original question: how do you distribute your time between behavioral, dynamic and static analysis?
Well, a lot of great sandbox products out there will automate much of the behavioral analysis. While I think that behavioral analysis is important, I try to dedicate my time to tasks that haven't been automated. Once I have a piece of malware unpacked, I usually put it in IDA Pro first. If I have to rely on the debugger instead of IDA, then I'm forced to step through the code linearly. Unfortunately, this makes it easy to miss trigger-based behaviors.
What do you mean by "trigger-based"?
Basically, I'm looking for something that the malware only does under specific circumstances (in other words, a trigger). I usually find those trigger-based behaviors much more easily by backing up from an interesting string or API call in IDA Pro than by using a top-down approach in the debugger. I had a sample some time ago that was injecting code into outlook.exe to steal the user's certificates and send email using the user's email server. If I ran this on my VM and didn't have Outlook running, I'd miss it in behavioral analysis and in the debugger. This was the most interesting capability the malware sample had. I'd really hate to miss that in my report. I know I can use a debugger to take a bottom-up approach to malware RE, but it just feels more natural in IDA Pro.
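Here is a small IDAPython sketch of what "backing up" from an artifact can look like in practice. It is an illustration, not Jake's script; the string and API name are placeholders inspired by his Outlook example.

```python
# Sketch only: back up from an interesting string or imported API to the functions
# that reference it, so the trigger-based code path can be examined in IDA Pro.
import idautils, idc

def callers_of(ea):
    """Return the names of functions containing cross-references to ea."""
    names = {idc.get_func_name(x.frm) for x in idautils.XrefsTo(ea)}
    return sorted(n for n in names if n)

# Back up from an interesting string, e.g. the name of the process being injected.
for s in idautils.Strings():
    if "outlook.exe" in str(s).lower():
        print(hex(s.ea), str(s), "referenced from", callers_of(s.ea))

# Back up from an interesting API, e.g. a common injection primitive.
api = idc.get_name_ea_simple("CreateRemoteThread")
if api != idc.BADADDR:
    print("CreateRemoteThread referenced from", callers_of(api))
```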
One case where I find myself going back to the debugger is string deobfuscation. If I can avoid reverse engineering the algorithm and instead use the malware's native deobfuscation code, I'll do it every time. Newer versions of IDA Pro offer the Appcall functionality, but that's usually much more difficult than using the debugger to execute the code. Even when I find myself in a debugger to work my way through a sample, I always have IDA Pro open in another window.
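For readers who haven't seen Appcall, the idea is to call the sample's own decoder from an IDAPython prompt while a debugging session is active. The routine name and prototype below are placeholders; this is a rough sketch of the pattern, not a recipe for a specific sample.

```python
# Sketch only: use IDA's Appcall to run the malware's own string decoder inside the
# debugged process instead of re-implementing its algorithm. Requires an active
# debugger session; "decode_str" and its prototype are placeholders.
import idaapi

Appcall = idaapi.Appcall

# Assume the decoder has been located and named "decode_str" in the IDB, and that it
# fills a caller-supplied output buffer.
decode = Appcall.proto(
    "decode_str",
    "int __cdecl decode_str(const char *enc, char *out, int outlen);")

out = Appcall.byref("\x00" * 128)            # writable buffer passed by reference
decode("placeholder_encoded_bytes", out, 128)
print(out.value)                             # the deobfuscated string
```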
Why is that?
The disassembly engine in IDA is awesome, but the real magic behind it is the FLIRT functionality. A LOT of library code gets linked into the average binary, and I don't want to reverse engineer any of it. The malware author may have intended to use the code, but they didn't write it. IDA does an awesome job of identifying and labeling it, even if the label is something as simple as "unknown_libname21". Debuggers fall flat on the same task. Really, the key to using IDA with the debugger is to identify which code to step into and which code to step over. I use IDA to ensure that I never step into library code, something that is all too easy to do when using a debugger alone.
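One way to put this into practice is to check IDA's library flag before deciding whether to step into a call. The helper below is my sketch, not something from the interview; it relies on the FUNC_LIB flag that IDA sets on FLIRT-recognized functions.

```python
# Sketch only: identify FLIRT-recognized library functions in the IDB so you know
# which calls to step over in the debugger.
import idc, idautils

def is_library_code(ea):
    """True if ea falls inside a function IDA has flagged as library code."""
    flags = idc.get_func_attr(ea, idc.FUNCATTR_FLAGS)
    return flags != idc.BADADDR and bool(flags & idc.FUNC_LIB)

# Example: list the library functions IDA has identified in the current database.
for func_ea in idautils.Functions():
    if is_library_code(func_ea):
        print(hex(func_ea), idc.get_func_name(func_ea))
```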
Jake, thanks for taking the time to share your insights on the world of malware analysis and reverse-engineering! Folks, parts 1 and 2 of this interview are also available for your reading pleasure.
Jake Williams and Lenny Zeltser will be co-teaching the FOR610: Reverse-Engineering Malware course online, live, March 28-April 29, 2013. Get a choice of a MacBook Air, a Toshiba Portege Ultrabook, or an $850 discount when you register for this class before March 13, 2013.