Saturday, November 17, 2012

Get-MethodAddress - A Tool For Comparing .NET MSIL and ASM Method Implementations

Download: Get-MethodAddress

Lately, as part of my research, I've found myself wanting to learn more about how MSIL (Microsoft Intermediate Language) opcodes in .NET assemblies get translated to assembly language instructions. Unfortunately, there was no easy way that I was aware of to get the unmanaged address of a .NET method. After digging into the wealth of MSIL opcodes at my disposal, I learned that the Ldftn opcode would suit my needs. This find resulted in the Get-MethodAddress PowerShell cmdlet. Get-MethodAddress uses reflection to build an assembly on the fly and assemble a method using the opcodes of my choosing - specifically, Ldftn. The full source is available from the download link above.
The relevant lines in the code are the ones that specify the MSIL opcodes to be assembled:

$Generator.Emit([System.Reflection.Emit.OpCodes]::Ldftn, $MethodInfo)
$Generator.Emit([System.Reflection.Emit.OpCodes]::Conv_Ovf_U8)
$Generator.Emit([System.Reflection.Emit.OpCodes]::Ret)


Ldftn, as described by Microsoft, "pushes an unmanaged pointer (type native int) to the native code implementing a specific method onto the evaluation stack." I then convert the native int to an unsigned int64 using the Conv_Ovf_U8 opcode and return the value to the caller with Ret.
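To show where those three Emit calls might sit, here is a minimal sketch. It is not the original Get-MethodAddress source: it swaps in System.Reflection.Emit.DynamicMethod instead of building a full dynamic assembly, the function name Get-MethodAddressSketch is hypothetical, and it assumes PowerShell 5 or later for the ::new() syntax.

```powershell
# Hypothetical helper; a sketch under the assumptions above, not the original cmdlet.
function Get-MethodAddressSketch {
    param (
        [Parameter(Mandatory = $true)]
        [System.Reflection.MethodInfo]
        $MethodInfo
    )

    # A parameterless dynamic method whose return type is UInt64.
    $DynamicMethod = [System.Reflection.Emit.DynamicMethod]::new('GetAddress', [UInt64], [Type]::EmptyTypes)
    $Generator = $DynamicMethod.GetILGenerator()

    # Push the method's native entry point, widen it to an unsigned 64-bit value, return it.
    $Generator.Emit([System.Reflection.Emit.OpCodes]::Ldftn, $MethodInfo)
    $Generator.Emit([System.Reflection.Emit.OpCodes]::Conv_Ovf_U8)
    $Generator.Emit([System.Reflection.Emit.OpCodes]::Ret)

    # The dynamic method is static and takes no arguments.
    $Address = $DynamicMethod.Invoke($null, $null)

    # Format the address as a 16-digit hexadecimal string.
    '0x{0:X16}' -f $Address
}
```

It could then be called as Get-MethodAddressSketch ([IntPtr].GetMethod('ToPointer')). The address returned will differ between machines, runtimes and process launches.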

So how might one use this cmdlet? As an example, say I'm interested in the IL and ASM implementations of the [System.IntPtr].ToPointer method. To get the IL of this method, you could use your .NET disassembler of choice. I like PowerShell, so let's use that:

PS> ([IntPtr].GetMethod('ToPointer').GetMethodBody().GetILAsByteArray() | % {"0x$($_.ToString('X2'))"}) -join ','
0x02,0x7B,0x53,0x04,0x00,0x04,0x2A

The IL opcodes above translate into the following disassembly:

0x02                     ldarg.0
0x7B,0x53,0x04,0x00,0x04 ldfld void* System.IntPtr::m_value
0x2A                     ret
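The four bytes after the ldfld opcode (0x7B) are a little-endian metadata token: 0x53,0x04,0x00,0x04 reads as 0x04000453, where the high byte 0x04 selects the Field table. As a rough sketch, the token can be resolved back to a field through reflection. The variable names are illustrative, and the guard assumes the method body has the same ldarg.0 / ldfld / ret shape as the listing, which may not hold on other runtime versions.

```powershell
$Method = [IntPtr].GetMethod('ToPointer')
$IL = $Method.GetMethodBody().GetILAsByteArray()

# Only decode when the second byte is the ldfld opcode (0x7B), as in the listing.
if ($IL.Length -ge 6 -and $IL[1] -eq 0x7B) {
    # Read the 4-byte operand as a metadata token (assumes a little-endian host).
    $Token = [BitConverter]::ToInt32($IL, 2)

    # Resolve the token in the module that defines the method.
    $Field = $Method.Module.ResolveField($Token)

    '0x{0:X8} -> {1}::{2}' -f $Token, $Field.DeclaringType.FullName, $Field.Name
}
```

The token value and the field name depend on the installed build of the core library, so they may not match the values shown in this post.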

The IL above simply loads a reference to the IntPtr instance (the this pointer), reads the value held in its m_value field, and returns the result. I suspect that the JITed representation would be equally straightforward. Let's confirm that:

PS> Get-MethodAddress ([IntPtr].GetMethod('ToPointer'))
0x000007FF35544CC0

Viewing the assembly instructions in WinDbg yielded the following:

mscorlib_ni+0xd04cc0:
000007ff`35544cc0 488b01    mov     rax,qword ptr [rcx]
000007ff`35544cc3 c3        ret
000007ff`35544cc4 cc        int     3


The assembly above does exactly what I expected. When ToPointer gets executed, rcx holds the this pointer, i.e. the address of the IntPtr instance. Dereferencing rcx reads the m_value field, which the listing shows sits at offset 0, into rax. The ret that follows implies that this value is the return value of the ToPointer method.

It's worth noting the module name in the WinDbg output - mscorlib_ni. NI stands for "native image", which means that the version of mscorlib loaded into the PowerShell process was the one whose IL had already been compiled to native code ahead of time rather than JIT-compiled at run time.
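The owning module can also be checked without a debugger by comparing an address against the modules loaded in the current process. The sketch below is only an approximation of what WinDbg reports; Find-ModuleForAddress is a hypothetical helper name, and it returns nothing when the address lies outside every loaded module (for example, in dynamically generated code).

```powershell
# Hypothetical helper: return the loaded module whose address range contains $Address.
function Find-ModuleForAddress {
    param (
        [Parameter(Mandatory = $true)]
        [UInt64]
        $Address
    )

    foreach ($Module in (Get-Process -Id $PID).Modules) {
        $Base = [UInt64]$Module.BaseAddress.ToInt64()
        $End = $Base + [UInt64]$Module.ModuleMemorySize

        # The address belongs to this module if it falls in [Base, End).
        if ($Address -ge $Base -and $Address -lt $End) {
            return $Module
        }
    }
}
```

For instance, (Find-ModuleForAddress 0x000007FF35544CC0).ModuleName could be run in the same session that produced that address. In a different process the address will most likely not map to the same module.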

Lastly, bear in mind that the combination of opcodes I used in the cmdlet is unverifiable, which basically means that this technique cannot be used in more restricted .NET implementations (e.g. Silverlight, Windows Runtime). For a reference on IL opcodes and IL verification, read ECMA-335 CLI Partition III - CIL.