Professional Documents
Culture Documents
CSAW2016 SOS Virtual Machine Deobfuscation RThomas JSalwan
CSAW2016 SOS Virtual Machine Deobfuscation RThomas JSalwan
CSAW2016 SOS Virtual Machine Deobfuscation RThomas JSalwan
software protections
How to don’t kill yourself when you reverse obfuscated codes.
Romain Thomas
Jonathan Salwan
2
Roadmap of this talk
3
The Triton framework [6]
Triton in a nutshell
5
Triton’s design
Example of Tracers
Pin
Symbolic IR
Taint
Taint
Execution SMT2-Lib
Engine
Engine Engine Semantics
Valgrind
API
C++ / Python
DynamoRio
SMT SMT SMT
Solver Optimization
Multi-Arch
Simplifications
Interface Passes design
Rules
Qemu
6
The API’s input - Opcode to semantics
Instruction API
Instruction semantics
over AST
7
The API’s input - Semantics with a context
Context
API
Instruction
Instruction semantics
over AST
8
The API’s input - Taint Analysis
API
9
The API’s input - Symbolic Execution
API
π
10
The API’s input - Simplification / Transformation
API
Simplification / Transformation
Instruction semantics
over AST
11
The API’s input - AST representations
Instruction semantics
over AST
12
The API’s input - Symbolic Emulation
Instruction 1
Instruction 2
Symbolic
API Emulation
Instruction 3
Instruction 4
13
Example - How to define an opcode and context
>>> inst.setAddress(0x400000)
>>> inst.updateContext(Register(REG.RAX, 0x1234))
>>> inst.updateContext(Register(REG.RDX, 0x5678))
>>> processing(inst)
14
Example - How to get semantics expressions
>>> processing(inst)
16
Example - How to get implicit and explicit written registers
17
To resume: What kind of information can I get from an instruction?
• Semantics (side effects included) representation via an abstract syntax tree based
on the Static Single Assignment (SSA) form
18
What about emulation?
>>> processing(inst1)
>>> processing(inst2)
>>> getFullAstFromId(getSymbolicRegisterId(REG.RAX))
((0x5 + 0x2) & 0xFFFFFFFFFFFFFFFF)
>>> getAstFromId(getSymbolicRegisterId(REG.RAX)).evaluate()
7L
19
Ok, but what can I do with all of this?
20
Mmmmh, and where instruction sequences can come from?
21
Cool, but how many instruction semantics are supported by Triton?
• Development:
• 256 Intel x86 64 instructions 1
• Included 116 SSE/MMX/AVX instructions
• Testing:
• The tests suite 2 of the Qemu TCG 3
• Traces differential 4
1
http://triton.quarkslab.com/documentation/doxygen/SMT Semantics Supported page.html
2
http://github.com/qemu/qemu/tree/master/tests/tcg
3
http://wiki.qemu.org/Documentation/TCG
4
http://triton.quarkslab.com/blog/What-kind-of-semantics-information-Triton-can-provide/#4
22
Virtual Machine Based Software
Protections
VM Based Software Protections
Definition:
It’s a kind of obfuscation which transforms an original instruction set (e.g. x86) into
another custom instruction set (VM implementation).
24
Example: Virtualization
• Malwares: Zeus 8
• CTF...
5
http://vmpsoft.com/
6
http://tigress.cs.arizona.edu/
7
http://www.denuvo.com/
8
http://www.miasm.re/blog/2016/09/03/zeusvm analysis.html
26
VM abstract architecture
Fetch Instruction
Decode Instruction
Dispatch
Terminator
27
VM abstract architecture
Fetch Instruction:
Fetch the instruction which will be executed by the VM.
Decode Instruction:
Decode the instruction according to the VM instruction set.
Example:
decode(01 11 12):
• Opcode: 0x01
• Operand 1: 0x11
• Operand 2: 0x12
28
VM abstract architecture
Dispatcher:
Jump to the right handler according to opcode and/or operands.
Handlers:
Handlers are the implementation of the VM instruction set.
For instance, the handler for the instruction
mov REG, IMM
could be:
xor REG, REG
or REG, IMM
Terminator:
Finishes the VM execution or continues its execution.
29
Dispatcher
30
A switch case like dispatcher
31
A jump table based dispatcher
32
Using Triton to reverse a VM
Fetch Instruction
Decode Instruction
Triton
Dispatch
Terminator
33
Demo: Tigress VM
Tigress challenges
35
Tigress challenges
$ ./tigress-challenge 1234
3920664950602727424
$ ./tigress-challenge 326423564
16724117216240346858
36
Tigress challenges
Problem: Given a very secret algorithm obfuscated with a VM. How can we recover
the algorithm without fully reversing the VM?
37
Step 1: Symbolically emulate the binary
Symbolic
Emulation
Obfuscated binary
Trace semantics
38
Step 2: Define the user input as symbolic variable
39
Step 3: Concretize everything which is not related to user input
+
+
- x
9 x
1 x π +
π 7
2 5 3 4
40
Step 4: Use a better canonical representation of expressions
π π
41
Step 5: Possible use of symbolic simplifications
π π
8
https://pythonhosted.org/arybo/concepts.html
42
Step 6: From Arybo to LLVM-IR
43
Step 7: Recompile with -O2 optimization and win!
Deobfuscated binary
LLVM-IR
44
Results with only one trace
45
Cover paths to reconstruct the CFG
U =
46
Results with the union of two traces
47
Time of extraction per trace
48
Let me try by myself
9
https://github.com/JonathanSalwan/Tigress protection
49
Demo: Unknown VM
VM Architecture
Fetch Instruction
Decode Instruction
Dispatch
Switch case
Terminator
51
VM Architecture
52
Goal
Fetch Instruction
Decode Instruction
Dispatch
Switch case
Terminator
53
Goal
Fetch Instruction
t
ou
Decode Instruction
e
m
Ti
op0 op1 op2 op3 op4
Dispatch
Switch case
Terminator
54
Goal
Fetch Instruction
Ste p 2
Decode Instruction
ep
St
Dispatch
Switch case
Terminator
55
Goal
Decode
c1→2
BB2
c2→4
c3→5
c4→5
56
Conclusion
Conclusion
58
Thanks
Any Questions?
59
Acknowledgements
10
https://github.com/quarkslab/arybo
60
Contact us
• Romain Thomas
• rthomas at quarkslab com
• @rh0main
• Jonathan Salwan
• jsalwan at quarkslab com
• @JonathanSalwan
• Triton team
• triton at quarkslab com
• @qb triton
• irc: #qb triton@freenode.org
61
References I
62
References II
J. C. King.
Symbolic execution and program testing.
Communications of the ACM, 19(7):385–394, 1976.
C. Lattner and V. Adve.
LLVM: A compilation framework for lifelong program analysis and
transformation.
pages 75–88, San Jose, CA, USA, Mar 2004.
63
References III
64