Saturday, July 30, 2011

Integrating WinDbg and IDA for Improved Code Flow Analysis

IDA is hands down the best tool for static analysis. Its debugger on the other hand, when compared to the power of WinDbg is certainly lacking, IMHO. As such, I find myself wasting too much time switching between windows and manually highlighting and commenting instructions in IDA as I trace through them in WinDbg. Fortunately, the power of IDApython can be unleashed to reduce this tedium.

I was reading an older TippingPoint MindshaRE article from Cody Pierce entitled “Hit Tracing in WinDbg[1]” and was inspired by his ideas to implement my own IDApython script to better integrate WinDbg with IDA. While, I may be recreating many of his efforts, my primary intent was to get better at scripting in IDApython while improving upon my static/dynamic analysis workflow.

The purpose of my script is to parse WinDbg log files for module base addresses, instruction addresses, references to register values, and pointer dereferences. Then, for every instruction you hit in your debug session, the corresponding instructions will be colored and commented accordingly and in a module base address-agnostic fashion.

To get started, your WinDbg log file will need to display the following:

• Disassembly of current instruction
• Effective address for current instruction
• Register state
• Listing of loaded modules (optional, but highly recommended)

The first three items should be enabled by default but can be re-enabled with the ‘.prompt_allow’ command. Your output should look like the following:

eax=003762f8 ebx=006cc278 ecx=00000004 edx=00000000 esi=006cc420 edi=00000001
eip=00491469 esp=0020e348 ebp=0020e388 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
calc!CCalcEngine::DoOperation+0x6be:
00491469 e8074efeff      call    calc!divrat (00476275)

The reason it is ideal to include module base addresses in your log file is so that effective addresses can be calculated as offsets then added to the base address of the module in IDA. This helps because the module base address in WinDbg is often different than the base address in IDA. Also, this avoids having to modify the dumpfile to match IDA or rebase the addresses in IDA – both, a pain and unnecessary. My script parses the output of ‘lmp’ in WinDbg which outputs the base addresses of loaded modules minus symbol information.

To continue riding Cody’s coattails, I’ll be analyzing the DoOperation function (0x00495001-0x00495051) in calc.exe as he did. I determined the beginning and ending addresses of the function with the following commands:

0:000> X calc!*DoOperation*
00495001 calc!CCalcEngine::DoOperation =
0:000> BP 00495001
0:000> g

Entering in 1000 / 4 in calc

Breakpoint 0 hit
eax=00439220 ebx=0069c420 ecx=0069c278 edx=00000000 esi=0069c278 edi=00000000
eip=00495001 esp=0032e95c ebp=0032ea4c iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
calc!CCalcEngine::DoOperation:
00495001 6a1c push 1Ch

Step through function until return instruction

0:000> PT
eax=00000000 ebx=0069c420 ecx=00495051 edx=00445310 esi=0069c278 edi=00000000
eip=00495051 esp=0032e95c ebp=0032ea4c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
calc!CCalcEngine::DoOperation+0xfa:
00495051 c20c00 ret 0Ch
0:000> g
0:004> .logopen "calc.log"

The following dump is the result of the operation 1000/4 in the function without stepping into calls (using the PA command):


Obviously, manually annotating all of this information in IDA is the work of an unpaid intern. I, on the other hand, value my time and wrote the following script to parse out the log file and import it into IDA:


The script has the following features:

• Prompts user for WinDbg log file to load
• Prompts user for the instruction highlight color (real men use pink) ;P
• Parses the output of the ‘lmp’ command and calculates instruction offsets from the module base address
• Creates comments for the values of referenced registers
• Creates comments for the value of pointer dereferences

Here is what the graph of DoOperation looks like before 1000 / 4 is performed:


Here is what the graph of DoOperation looks like after 1000 / 4 is performed:


You can see the highlighted instructions and the values of any referenced registers and pointers.


Hopefully, by now you can see the power of IDApython in improving your analysis workflow. If you use this script, I welcome your feedback!

References:

1. Cody Pierce, “MindshaRE: Hit Tracing in WinDbg,” July 17, 2008, http://dvlabs.tippingpoint.com/blog/2008/07/17/mindshare-hit-tracing-in-windbg