Analysis and Visualization of Common Packers

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 53

Analysis and Visualization of

Common Packers
PowerOfCommunity, Seoul
Ero Carrera - ero.carrera@gmail.com
Reverse Engineer at zynamics GmbH
Chief Research Officer at VirusTotal
Introduction
An historical perspective

Originally meant to save space by reducing the


redundancy in executable file formats
Simply compressed parts or the whole of the
executable
Created a new "envelope" around it that restored the
original executable and the passed control to it
The decompressing envelope did not much more than
just restoring the executable
Evolution of the techniques

Compression provided a trivial degree of obfuscation,


but obfuscation nonetheless
Was easy to add additional measures in the
decompressing envelope
Overview of the techniques
Destruction of informational
components

Import address table


Simple late reconstruction into an original form
Construction of new connectivity artifacts between
the original code and imported modules
Strings
Anti-debug

Aimed at making tracing hard


Using SEHs triggered by hard to handle exceptions
Confuse debuggers throwing INTs they use
Calling hard-to-hook low level APIs/syscalls
Checking for hooks
Anti-environment

VM detection.
VMWare, VirtualPC, etc
Techniques aimed against specific tools
OllyDBG, IDA, Softice, etc
Breaking tools

Tricks detecting, confusing or aimed at crashing some


of the most common tools
IDA
OllyDBG
Procdump
Softice
Anti-analysis

Code obfuscation
Adding junk code, using opaque predicates
Code transformation
Virtual machines
Flow obfuscation (SEH, Nanomites)
Tools
Bochs
Provides with a high-level view
No need to worry about most of the anti-* techniques
Windbg
Can do kernel-mode debugging, hook syscalls, look
deeper that user-mode tools
Inspection of physical/virtual memory
Memoryze, for the real hardcore
Obfuscation & Anti-Analysis
Basic trickery against
analysis algorithms
Most tools will linearly disassemble a chunk of code
Introduce and non-terminal flow branching instruction
(not a ret or jmp)
Make it point later in the code, to the middle of what
would be an instruction if disassembling linearly
Result => confusion
(latest IDA, 5.3, has some workarounds against this)
Also: indirect obfuscation through heavy optimization
Example: ASPack (original)

0101F001 60________ pusha

0101F002 E803000000 call near ptr loc_101F007+1

0101F007 E9EB045D45 jmp near ptr 465EF4F7h


Example: ASPack (fixed)

0101F001 60________ pusha

0101F002 E803000000 call loc_101F008

0101F007 E9________ db 0E9h ; T

0101F008 EB04______ jmp short loc_101F00E


Example: Linear
Disassembly
0DFE000000 or eax, 0FEh

3D02000000 cmp eax, 2

E901000000 jmp $+6

75B8______ jnz short near ptr 0FFFFFFC9h

F3C001C0__ rep rol byte ptr [ecx], 0C0h

C3________ ret
Example: Linear
Disassembly
0DFE000000 or eax, 0FEh

3D02000000 cmp eax, 2

E901000000 jmp $+6 // HERE

75________ db 0x75

HERE: B8F3C001C0 mov eax, 0xc001c0f3

C3________ ret
Example: Opaque Predicate

0DFE000000 and eax, 2

3D02000000 cmp eax, 2

7401______ jz $+3

75B8______ jnz short near ptr 0FFFFFFC6h

F3C001C0__ rep rol byte ptr [ecx], 0C0h

C3________ ret
Example: Opaque Predicate

0DFE000000 and eax, 2

3D02000000 cmp eax, 2

7401______ jz $+3

75________ db 0x75

B8F3C001C0 mov eax, 0xc001c0f3

C3________ ret
Executable Image Function
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
Memory Page ...
Function Chunk
Function Chunk
Function Chunk
address instruction (operand, ...)
Function Chunk address instruction (operand, ...)
Function Chunk address instruction (operand, ...)
Memory Page ...

address instruction (operand, ...)


address instruction (operand, ...)
address instruction (operand, ...)
...

Memory Page
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
...
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
...
Executable Image Function
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
...

Memory Page
Function Chunk
address instruction (operand, ...)
address instruction (operand, ...)
Function Chunk address instruction (operand, ...)
address instruction (operand, ...) ...
address instruction (operand, ...)
Memory Page address instruction (operand, ...)
...

Function Chunk
address instruction (operand, ...)
address instruction (operand, ...)
Memory Page address instruction (operand, ...)
...

Function Chunk address instruction (operand, ...)


address instruction (operand, ...)
Function Chunk address instruction (operand, ...)
...
Function A address instruction (operand, ...)
Function B address instruction (operand, ...)
address instruction (operand, ...) address instruction (operand, ...)
address instruction (operand, ...) address instruction (operand, ...)
... ...

address instruction (operand, ...) address instruction (operand, ...) address instruction (operand, ...)
address instruction (operand, ...) address instruction (operand, ...) address instruction (operand, ...)
address instruction (operand, ...) address instruction (operand, ...) address instruction (operand, ...)
... ... ...

address instruction (operand, ...)


address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
address instruction (operand, ...)
...
...

Shared Blocks

address instruction (operand, ...)


address instruction (operand, ...)
address instruction (operand, ...)
...

address instruction (operand, ...)


address instruction (operand, ...)
address instruction (operand, ...)
...
Junk. Polymorphic and
static
Junk Code

pusha
popa
Non-Standard Branching
Junk JMP insertion
Exmple: Junk Code I
(Themida)
018900CE 48________ dec eax

018900CF 60________ pusha

018900D0 B9FBDEF000 mov ecx, 0F0DEFBh

018900D5 50________ push eax

018900D6 9C________ pushf

018900D7 E912000000 jmp loc_18900EE

018900D7 [junk data]


Exmple: Junk Code II
(Themida)
018900EE loc_18900EE:

018900EE E90E000000__ jmp loc_1890101

018900EE [junk data]

01890101 loc_1890101:

01890101 9D__________ popf

01890102 5E__________ pop esi

01890103 61__________ popa

01890104 0F844E06FA7A jz loc_7C830758


Virtual Machines

Visual Basic, Java, Python, Ruby, Perl, .NET


Starforce, VMProtect, x86 Virtualizer, Themida/
CodeVirtualizer
At a high-level it’s a: fetch, decode, handle algorithm
Virtualized Code

Runs

Virtual CPU

Standard Code

Runs Runs

Real CPU Real CPU


Registers
Virtual CPU opcodes

-General Purpose Fetch Instruction Pointer


-Instruction Pointer opA reg1, reg2
-Stack pointer 1 opB reg2
branchA XYZ

4 Update registers
Real CPU opcodes Registers
Execute handler
handler for opA Virtual CPU
3
2 Decode
handler for opB Decoder

handler for opC

handler for branchA


Decoder
...
-Look up operand in table
-Call handler
Virtual Machine
Countermeasures
Rolf Rolles and Boris Lau have already shown that
optimization/reduction techniques can help
Translating to an intermediate representation and
performing optimization in the code leads to reduced
forms
You could use a tool like Peter’s “Find executable code”
to discover instruction handlers from a memory dump
of the VM
Advanced Packers

Some of the hardest current packers are VMProtect,


Themida, Armadillo
They incorporate some complex, custom techniques
Usually commercial products protectors
Armadillo
Armadillo
Double process debugging, debug blocker
Nanomites
Strategic Code Splicing
Armadillo's invalid instructions
LOCK prefix
Invalid MOV
Debug
Parent process Child process

Child's code
Transfer control
Parent catches it INT 3 push ebp
mov ebp, esp
push 0
push 0
call XYZ
Look up address cmp eax, 0
INT 3
.
.
.
Find target .
mov [UWZ], 0xff
pop ebp
ret

Set target in child


context

Debug
Resume child Child process

Parent Transfer control


INT 3
catches it
Themida
Themida's API obfuscation
The general algorithm can be summarized as:
Retrieve the API's function body
Perform a basic analysis and disassembly
Reconstruct the API's function body inserting junk in
between each of the real instructions
Re-assemble functionality, keep the semantics, change
the syntax
Standard Imports
Executable Imported DLL
A function references other
code
Executable Imported DLL

Exported DLL Function

Internal DLL Function


Some of the references are
kept
Themida protected executable Imported DLL

Exported DLL Function

DLL Function (Obfuscated)

Internal DLL Function


Reconstruction

The algorithm has limitations


References to other functions within the DLL are kept
Same for true branches of conditional branches
Those two points can allow us to do API discovery by
studying their connectivity
Themida's obfuscation

Adds lots of branching and junk


Keeps few "real" instructions per obfuscated block
IDA can “easily” deal with the branching
Although bogus calls break IDA analysis and lead to
broken obfuscated functions
Some scripting can make this look better
Current state
Packing vs unpacking
Packing is not always a symmetric proces, sometimes
it can't be undone perfectly
You won’t get the original process back
Can it be done generically? Some cases the answer is
"mostly" yes
You will mostly always be able to obtain code close to
its original form
Recent techniques

skape documented an elegant trick on Uninformed 10


a few weeks ago
Attacks a basic heuristic used by most “generic”
unpackers
Tracking execution transfer to “dirty-memory”
Virtual Address Range A Virtual Address Range A Virtual Address Range A

WRITE WRITE

TIME
Virtual Address Range A

Physical Memory

MMU

Virtual Address Range A


Virtual Address Range A

WRITE

Physical Memory

MMU

Virtual Address Range A

EXECUTE
Countermeasures

Windbg can see the mappings from virtual to physical


Not to hard to spot doubled mapped regions
Bochs and other low level emulators can easily do it as
well
Requires kernel-mode access or “higher”
References

Reversing. Secrets of Reverse Engineering. Eldad Eilam


Déprotection semi-automatique de binaire, Yoann
Guillot & Alexandre Gazet
http://metasm.cr0.org/SSTIC08-article-Guillot_Gazet-
Deprotection_Semi_Automatique_Binaire.pdf

Virtual Machine Threats , Peter Ferrie


http://www.symantec.com/avcenter/reference/Virtual_Machine_Threats.pdf
References II
A Quick Survey on Automatic Unpacking Techniques,
Daniel Reynaud
http://indefinitestudies.wordpress.com/2008/09/25/automatic-unpacking/

Using dual-mappings to evade automated unpackers,


skape
http://www.uninformed.org/?v=10&a=1

Dealing with Virtualization packer, Boris Lau


http://www.datasecurity-event.com/uploads/boris_lau_virtualization_obfs.pdf
References III

Rolf Rolles blog in OpenRCE


https://www.openrce.org/blog/browse/RolfRolles

Oreans Themida/CodeVirtualizer
http://www.oreans.com

ReWolf's x86 Virtualizer


http://www.openrce.org/blog/view/847/x86_Virtualizer_-_source_code
References IV

VMProtect
http://www.vmprotect.ru/

Deroko's Nanomite's write up


http://www.phearless.org/i3/Nanomites_And_Misc_Stuff.txt

Memoryze, Mandiant
http://www.mandiant.com/software/memoryze.htm
Thanks to Vangelis and the
PoC crew!

Q&A

You might also like