Monday, March 25, 2013

Parsing Binary File Formats with PowerShell

I'm giving a presentation on "Parsing Binary File Formats with PowerShell" for MiSec on Tuesday, March 26. For those who will not be attending, the slides and code are available for download.



In the presentation, I cover the following:

1) Why you would want to parse binary data
2) Why PowerShell is a powerful tool to accomplish this task
3) A brief overview of data types and how they differ across languages: C/C++, C#, PowerShell
4) Conversion from C data types to PowerShell/.NET data types
5) Applying all of these concepts by parsing the DOS header of a PE file
6) DOS header overview
7) The three strategies for parsing binary data in PowerShell:
   a) Pure PowerShell-based approach using only PowerShell cmdlets (no .NET)
   b) C# compilation using the Add-Type cmdlet
   c) Reflection
8) Reading in binary data in PowerShell
9) Building a DOS header parser using each of the three strategies
10) Brief overview of reflection and .NET application layout
11) Applications of a DOS header parser
12) Bonus: Intro to the Rich signature
13) Bonus: Extending the DOS header parser to decode and parse the Rich signature
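
To give a feel for the type mapping in items 3 through 5 before you open the slides, here is a minimal sketch of a DOS header parser. It is written in Python rather than PowerShell and is not the code from the downloads; the names parse_dos_header and DosHeader are my own for illustration. The layout follows the standard IMAGE_DOS_HEADER structure: 30 WORD values (unsigned 16-bit) followed by one LONG (signed 32-bit), all little-endian, 64 bytes in total.

```python
import struct
from collections import namedtuple

# IMAGE_DOS_HEADER: 30 WORDs (unsigned 16-bit) then one LONG (signed 32-bit).
# "<" selects little-endian with no padding, so the format is exactly 64 bytes.
DOS_HEADER_FORMAT = "<30Hl"
DOS_HEADER_SIZE = struct.calcsize(DOS_HEADER_FORMAT)

DosHeader = namedtuple("DosHeader", [
    "e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc",
    "e_maxalloc", "e_ss", "e_sp", "e_csum", "e_ip", "e_cs", "e_lfarlc",
    "e_ovno", "e_res", "e_oemid", "e_oeminfo", "e_res2", "e_lfanew",
])


def parse_dos_header(data):
    """Parse the first 64 bytes of `data` as a DOS header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("need at least %d bytes" % DOS_HEADER_SIZE)
    v = struct.unpack_from(DOS_HEADER_FORMAT, data, 0)
    header = DosHeader(
        *v[:14],            # e_magic .. e_ovno
        tuple(v[14:18]),    # e_res[4]
        v[18],              # e_oemid
        v[19],              # e_oeminfo
        tuple(v[20:30]),    # e_res2[10]
        v[30],              # e_lfanew
    )
    if header.e_magic != 0x5A4D:  # the bytes "MZ" read as a little-endian WORD
        raise ValueError("missing MZ signature")
    return header


# Synthetic header for demonstration only: just e_magic and e_lfanew are set.
sample = struct.pack(DOS_HEADER_FORMAT, 0x5A4D, *([0] * 29), 0xE8)
header = parse_dos_header(sample)
print(hex(header.e_magic), hex(header.e_lfanew))
```

To parse a real file you would read its first 64 bytes in binary mode and pass them to parse_dos_header. The same WORD-to-16-bit and LONG-to-32-bit mapping is what the PowerShell versions have to express with .NET types.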

I also provide the following code:

1) Get-DosHeader_Pure_PowerShell.ps1 - A pure PowerShell-based implementation of the DOS header parser
2) Get-DosHeader_CSharp.ps1 - A DOS header parser using Add-Type to compile C# code
3) Get-DosHeader_Reflection.ps1 - A DOS header parser implemented using reflection
4) Get-DosHeader_Reflection_Bonus.ps1 - Same as #3 but extended to include a Rich signature decoder/parser
5) Get-DosHeader.format.ps1xml - A formatting file used to display a proper hexadecimal representation of the parsed DOS header

While the example I use throughout the presentation is a simple one, you would be surprised what information can be gleaned by analyzing known-good DOS headers in PE files. For example, after scanning 6,695 DOS headers, I found that the following fields were always 0: e_crlc, e_cparhdr, e_minalloc, e_ss, e_csum, e_ip, e_cs, e_ovno, e_oemid, e_oeminfo, e_res2. This simple heuristic alone could be used as a signature to detect a malformed DOS header/PE file. TinyPE is the perfect example of a file this check would flag. A simple DOS header parser can also be used to scan for all PE files on disk. What you'll discover is that PE files hide behind some non-standard file extensions you may not be familiar with: .lrc, .ax, .rs, .tlb, .acm, .tsp, .efi, .rll, .ime, .old, .dat, .iec, etc.
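
The zero-field heuristic above could be sketched roughly as follows. This is Python rather than PowerShell and is not part of the downloadable code; nonzero_fields and EXPECTED_ZERO_FIELDS are hypothetical names. The field list is copied as-is from my scan results above, and the byte offsets come from the standard IMAGE_DOS_HEADER layout. Treat the list as an input you would re-derive from your own corpus of known-good files, not as ground truth.

```python
import struct

# field name -> (byte offset in the DOS header, number of WORDs)
# The set of fields is the one listed in this post; adjust it to your own data.
EXPECTED_ZERO_FIELDS = {
    "e_crlc": (6, 1),
    "e_cparhdr": (8, 1),
    "e_minalloc": (10, 1),
    "e_ss": (14, 1),
    "e_csum": (18, 1),
    "e_ip": (20, 1),
    "e_cs": (22, 1),
    "e_ovno": (26, 1),
    "e_oemid": (36, 1),
    "e_oeminfo": (38, 1),
    "e_res2": (40, 10),
}


def nonzero_fields(data, expected_zero=EXPECTED_ZERO_FIELDS):
    """Return the names of expected-zero fields that are not zero."""
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("not a DOS header")
    flagged = []
    for name, (offset, count) in expected_zero.items():
        # Read `count` little-endian unsigned 16-bit values at `offset`.
        words = struct.unpack_from("<%dH" % count, data, offset)
        if any(words):
            flagged.append(name)
    return flagged


# Synthetic headers for demonstration only.
clean = struct.pack("<30Hl", 0x5A4D, *([0] * 29), 0x80)
odd = bytearray(clean)
struct.pack_into("<H", odd, 20, 0x1234)  # put a value into e_ip
print(nonzero_fields(clean))
print(nonzero_fields(bytes(odd)))
```

An empty result means the header matches the heuristic; any returned names are the fields worth a closer look.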

The techniques that I describe can easily be used to parse any binary format - from a stupid DOS header parser to a PowerShell implementation of binwalk. The sky is the limit.
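
To make the binwalk comparison a little more concrete, here is a minimal sketch of the core idea: searching a blob of bytes for known magic numbers. It is Python rather than PowerShell, scan_signatures and SIGNATURES are hypothetical names, and the real binwalk does far more than this (validation, entropy analysis, extraction). The four magic values are the well-known ones for MZ executables, ZIP local file headers, PNG and gzip.

```python
# name -> magic bytes that mark the start of that format
SIGNATURES = {
    "mz": b"MZ",
    "zip": b"PK\x03\x04",
    "png": b"\x89PNG\r\n\x1a\n",
    "gzip": b"\x1f\x8b",
}


def scan_signatures(data, signatures=SIGNATURES):
    """Return a sorted list of (offset, name) for every magic value found."""
    hits = []
    for name, magic in signatures.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            # Resume one byte later so overlapping matches are not skipped.
            start = data.find(magic, start + 1)
    return sorted(hits)


# Synthetic blob for demonstration only: a ZIP magic at offset 4, a PNG magic at 11.
blob = b"\x00" * 4 + b"PK\x03\x04" + b"abc" + b"\x89PNG\r\n\x1a\n"
print(scan_signatures(blob))
```

Short signatures such as the two-byte MZ will produce plenty of false positives on real data, which is exactly where a proper header parser earns its keep: each hit can be handed to the parser for validation.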

Enjoy!