Krakatoa: Decompilation in Java: Goto Goto
[Figure: decompilation stages. Recovered labels: Java Bytecodes; Expression Builder; Flow graph with expressions and conditional gotos; Node Sequencer. Recovered sample code:]

int bar(int a, int b) {
    if (sam > a) {
        b = a*2;
    }
    return b;
}
}
ordering. Where implicit control flow has changed, the algorithm must add new branches to restore the original control flow. Whenever possible, the topological sort attempts to maintain the original fall-through behavior.

This algorithm produces a reducible augmented graph. Because all loops are ordered separately, and laid out contiguously, the only augmenting edge entering from outside enters at the top. The topological sort of the loop (minus its backedges) guarantees that this top node is the loop header and that no internal edges cause irreducibility. Outside edges into the loop header cannot make a loop irreducible. Therefore, the resulting augmented graph is reducible.

Loops are not the only blocks of instructions which must be ordered contiguously. Exception handling regions must form contiguous sections of instructions. Class files specify which instructions are in which regions. Our algorithm orders those regions contiguously by treating them like loops.

After applying this technique to create a total ordering of the nodes (the augmenting path), Krakatoa can apply Ramshaw's technique to eliminate the goto's.

4 Code Transformations

4.1 Program Points

After applying Ramshaw's algorithm for eliminating goto's, Krakatoa has a complex, yet legal, Java AST (see Figure 9). Krakatoa then proceeds to recover more of the natural high-level constructs of the original program (e.g. if-then-else, etc.).

Krakatoa uses a program point analysis to summarize a program's control-flow and to guide recovering high-level constructs. A program point is a syntactic location in a program. Every statement has a program point both before and after it. These program points have two properties: reachability and equivalence class.

A program point is unreachable if and only if it is preceded along all execution paths by an unconditional jump statement (i.e. return, throw, break, or continue). For instance, in Figure 5, program point 3, •3, is unreachable, since it is preceded by a return statement. •6 is reachable, however, since one of the branches in the preceding if statement does not end with a jump statement.

Two program points are equivalent (denoted as •x ≡ •y) if and only if future computation of the program is the same from both points. For instance, the program point before a break statement is equivalent to the program point after the loop it exits (•3 and •8 in Figure 6). As an example, in Figure 6, •1, •2, •4, •5, •6, and •7 are equivalent, as are program points •3 and •8.

Both reachability and equivalence are simple to compute via standard control-flow analyses [ASU86].

4.2 AST Rewrite Rules

Krakatoa performs a series of AST rewriting transformations to recover as many of the "natural" program constructs as it can (e.g. if-then-else, etc.). Krakatoa applies these rewriting rules repeatedly until no changes occur. We have found that the few rules below are sufficient to retrieve high-level constructs of the Java language, including if-then-else statements, and short-circuit evaluation of expressions. Each rewriting rule reduces the size of the AST, thus ensuring termination.

Table 1 summarizes the rules, which we describe below in greater detail. Many of these rules generalize. Those that apply to for-loops often apply to other loops. Many rules have several symmetric cases. For example, the first rule in Table 1 removes an empty else-branch from an if-then-else statement; there is a symmetric rule for removing an empty then-branch by negating the predicate.

4.3 if-then-else Rewriting Rules

The first transformation shown in Table 1 changes an if-then-else statement into an if-then statement when the else branch is empty. This transformation is always legal.

The second transformation creates an if-then-else statement from an if-then statement by hoisting the subsequent statement list into the else-part. Our algorithm performs this transformation if and only if no reachable program point in Stmtlist1 is equivalent to the program point before Stmtlist2. Essentially, this means that no statement in the then-branch (Stmtlist1) can reach Stmtlist2 directly.

4.4 Loop Rewriting Rules

The third rule in Table 1 removes useless continue statements. If the program point after a continue statement is equivalent to the program point before the continue statement, then that continue can be removed.

The fourth rule creates a short-circuit test expression within a for-loop by eliminating an interior if statement. Doing so requires that the loop body begin with an if-then-else statement, and that the then branch of that statement consists of a single jump to a program point equivalent to breaking out of the loop.

The fifth transformation provides an example of transforming loops into if statements. A loop is equivalent to an if if it can never repeat itself, and if all simple break statements can be safely removed during the transformation. A loop never repeats if its last program point is unreachable. break's may be removed if the immediately following (unreachable) program point is equivalent to the last program point in the loop (•1 in Table 1). The transformation replaces the loop with an if statement, and deletes all of the break statements for that loop.

4.5 Short Circuit Evaluation Rewriting Rules

The sixth rule shown in Table 1 recovers a short-circuit Or conditional. Short-circuit Or's exist when two adjacent conditionals guard the same statement list and failure of either will cause a branch to equivalent locations.

The last transformation in Table 1 recovers short-circuit And expressions. This transformation is applicable whenever a simple if statement represents the entire body of another.

5 Status

We have implemented a prototype Java decompiler, Krakatoa, in Java. We have run Krakatoa on a number of class files, including some to which we had no source code access. We examined the output of Krakatoa by hand, and Krakatoa appears to recover high-level constructs very well. Figures 7-10 provide an example of the stages of decompilation.
Rule                  Before                      After                         Conditions

Eliminate Else        if expr {                   if expr {                     None
                        Stmtlist                    Stmtlist
                      } else { }                  }

Create if-then-else   if expr {                   if expr {                     Stmtlist1 contains no
                        Stmtlist1                   Stmtlist1                   reachable program points
                      }                           } else {                      equivalent to •1.
                      •1: Stmtlist2                 Stmtlist2
                                                  }

Delete Continues      for (A; B; C) {             for (A; B; C) {               •1 ≡ •2
                        Stmtlist                    Stmtlist
                        •1 continue •2            }
                      }

Move Conditionals     for (A; B; C) {             for (A; B and not expr; C) {  •1 ≡ •2, X is either a
                        if expr {                   Stmtlist1                   break or continue
                          •1 jump                   Stmtlist2
                        } else {                  }
                          Stmtlist1
                        }
                        Stmtlist2
                      }
                      •2

Remove Loop           for (stmt; expr; ) {        stmt                          Stmtlist contains no
                        Stmtlist                  if expr {                     reachable program points
                        •1                          Stmtlist'                   equivalent to •1. The
                      }                           }                             program point after any
                      •2                                                        break must be equivalent
                                                                                to •1.

Create Short          if expr1 {                  if expr1 or expr2 {           X and Y are equivalent
Circuit Or's            •1 X                        X                           jumps. (I.e., •1 ≡ •2.)
                      } else {                    } else {
                        if expr2 {                  Stmtlist
                          •2 Y                    }
                        } else {
                          Stmtlist
                        }
                      }

Create Short          if expr1 {                  if expr1 and expr2 {          Neither if stmt has an
Circuit And's           if expr2 {                  Stmtlist                    else branch
                          Stmtlist                }
                        }
                      }
Table 1: Canonical Code Transformation Rules
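
As a rough illustration of applying rewrite rules repeatedly until no changes occur, the sketch below implements only the two rules of Table 1 that need no program point analysis ("Eliminate Else" and "Create Short Circuit And's"). The AST classes (Stmt, Simple, If) and all method names are hypothetical, chosen for this example; they are not Krakatoa's actual data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini-AST for illustration; none of these names come from Krakatoa.
class RewriteSketch {
    interface Stmt {}

    static final class Simple implements Stmt {
        final String text;
        Simple(String text) { this.text = text; }
    }

    static final class If implements Stmt {
        String cond;
        List<Stmt> thenPart;
        List<Stmt> elsePart; // null means "no else branch"
        If(String cond, List<Stmt> thenPart, List<Stmt> elsePart) {
            this.cond = cond;
            this.thenPart = thenPart;
            this.elsePart = elsePart;
        }
    }

    // One pass over a statement list; returns true if any rule fired.
    static boolean rewriteOnce(List<Stmt> stmts) {
        boolean changed = false;
        for (Stmt s : stmts) {
            if (!(s instanceof If)) continue;
            If outer = (If) s;
            // Rule "Eliminate Else": drop an empty else branch.
            if (outer.elsePart != null && outer.elsePart.isEmpty()) {
                outer.elsePart = null;
                changed = true;
            }
            // Rule "Create Short Circuit And's": an else-less if whose entire
            // body is another else-less if becomes a single && test.
            if (outer.elsePart == null && outer.thenPart.size() == 1
                    && outer.thenPart.get(0) instanceof If) {
                If inner = (If) outer.thenPart.get(0);
                if (inner.elsePart == null) {
                    outer.cond = "(" + outer.cond + ") && (" + inner.cond + ")";
                    outer.thenPart = inner.thenPart;
                    changed = true;
                }
            }
            changed |= rewriteOnce(outer.thenPart);
            if (outer.elsePart != null) changed |= rewriteOnce(outer.elsePart);
        }
        return changed;
    }

    // Apply the rules repeatedly until a pass makes no change.
    static void rewriteToFixpoint(List<Stmt> stmts) {
        while (rewriteOnce(stmts)) { /* repeat */ }
    }

    // Render a statement list in the brace style used by Table 1.
    static String show(List<Stmt> stmts) {
        StringBuilder sb = new StringBuilder();
        for (Stmt s : stmts) {
            if (s instanceof Simple) {
                sb.append(((Simple) s).text).append("; ");
            } else {
                If i = (If) s;
                sb.append("if ").append(i.cond).append(" { ")
                  .append(show(i.thenPart)).append("} ");
                if (i.elsePart != null) {
                    sb.append("else { ").append(show(i.elsePart)).append("} ");
                }
            }
        }
        return sb.toString();
    }

    // Builds: if a { if b { s; } else { } } else { }
    static List<Stmt> example() {
        Stmt inner = new If("b", Arrays.<Stmt>asList(new Simple("s")), new ArrayList<Stmt>());
        List<Stmt> prog = new ArrayList<Stmt>();
        prog.add(new If("a", Arrays.asList(inner), new ArrayList<Stmt>()));
        return prog;
    }

    public static void main(String[] args) {
        List<Stmt> prog = example();
        rewriteToFixpoint(prog);
        System.out.println(show(prog).trim()); // if (a) && (b) { s; }
    }
}
```

In this example the And rule cannot fire on the first pass, because the inner if still has an (empty) else branch. It fires on the second pass, after "Eliminate Else" has removed that branch, which is why the rules are iterated rather than applied once.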
•1
if ( a < b ) {
    •2
    return a;
    •3 // (unreachable)
}
else {
    •4
    a = b;
    •5
}
•6
Figure 5: Reachable Points

•1 // { •2, •4, •5, •6, •7 }
for ( ; ; ) {
    •2
    if ( a < b ) {
        •3 // { •8 }
        break;
        •4 // (unreachable)
    }
    else {
        •5
        continue;
        •6 // (unreachable)
    }
    •7 // (unreachable)
}
•8
Figure 6: Equivalent Points
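
The reachability property can be sketched for loop-free code as follows. This is a minimal illustration under stated assumptions, not Krakatoa's analysis: the statement classes (Simple, Jump, If) and the mark method are hypothetical, only straight-line code and if-then-else are modeled, and equivalence classes are not computed. Points are recorded in the same order in which Figure 5 numbers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical statement model for illustration; not Krakatoa's data structures.
class ReachabilitySketch {
    interface Stmt {}
    static final class Simple implements Stmt {}  // e.g. "a = b;"
    static final class Jump implements Stmt {}    // return, throw, break, or continue
    static final class If implements Stmt {
        final List<Stmt> thenPart;
        final List<Stmt> elsePart;
        If(List<Stmt> thenPart, List<Stmt> elsePart) {
            this.thenPart = thenPart;
            this.elsePart = elsePart;
        }
    }

    // Records reachability of the point before the first statement and of the
    // point after each statement. Returns whether the end of the list is reachable.
    static boolean mark(List<Stmt> stmts, boolean reachable, List<Boolean> out) {
        out.add(reachable); // point before the first statement
        for (Stmt s : stmts) {
            if (s instanceof Jump) {
                // Every path to the next point passes through an unconditional jump.
                reachable = false;
            } else if (s instanceof If) {
                If i = (If) s;
                boolean afterThen = mark(i.thenPart, reachable, out);
                boolean afterElse = mark(i.elsePart, reachable, out);
                // The point after the if is reachable if either branch falls through.
                reachable = afterThen || afterElse;
            }
            // A Simple statement leaves reachability unchanged.
            out.add(reachable); // point after s
        }
        return reachable;
    }

    // The code of Figure 5: if (a < b) { return a; } else { a = b; }
    static List<Boolean> figure5() {
        Stmt ifStmt = new If(Arrays.<Stmt>asList(new Jump()),
                             Arrays.<Stmt>asList(new Simple()));
        List<Boolean> points = new ArrayList<Boolean>();
        mark(Collections.singletonList(ifStmt), true, points);
        return points;
    }

    public static void main(String[] args) {
        // Entries correspond to points 1 through 6 of Figure 5.
        System.out.println(figure5()); // [true, true, false, true, true, true]
    }
}
```

Only the third entry is false, matching the single "(unreachable)" annotation in Figure 5. Handling loops would additionally require treating the point after a loop as reachable through break statements, which this sketch leaves out.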
Figure 7 shows the original source code of a sample program. Figure 8 shows the results of expression decompilation on the bytecode of this program. Figure 9 shows the results of applying Ramshaw's algorithm to the decompiled expression graph. Figure 10 shows the result of the grammar rewriting rules applied to the output of Ramshaw's algorithm. Obviously, using DeMorgan's laws would simplify the boolean expressions. Future versions of Krakatoa will do so.

For the JVM dup operators, which duplicate stack elements, Krakatoa simply creates a temporary variable to hold the duplicated value. This yields unnatural, but easily readable, decompilations. A more difficult problem is our failure to recover the conditional-expression operator, "? :". This operation presents two difficulties: it requires determining short-circuit operators during expression recovery, and it requires that expression recovery handle non-empty stacks at basic block boundaries. Fortunately, the short-circuit problem can be handled easily with four simple graph-writing rules given in [Cif93]. The non-empty stack problem is difficult because it requires combining expressions in our symbolic stack upon entering a basic block with multiple predecessors. Krakatoa again uses a temporary variable to hold the result of each branch of the conditional expression, and then assigns this temporary value to the conditional expression. We are currently investigating other solutions to this problem.

Appendix B contains additional examples of Krakatoa's output.

6 Countermeasures

Krakatoa is very effective at reproducing readable Java source from Java bytecode. This may be alarming to those who want to protect their source code from unwanted copying. Unfortunately, there are few countermeasures.

One could introduce irreducible control-flow through bogus conditional jumps to foil Ramshaw's algorithm. This, however, only stops the recreation of high-level constructs. Krakatoa could simply produce source code in a Java-like language extended with goto's.

One could introduce bizarre stack behavior to foil expression recovery. This is difficult, however, because the behavior cannot be so bizarre as to yield unverifiable bytecode. It is possible, however, to create many bogus threads of control (i.e., threads that will never execute) that will confuse the expression recovery mechanism in basic blocks that are entered with non-empty stacks.

One code obfuscation technique that is modestly effective is to change the class file's symbol table to contain bizarre names for fields and methods. So long as cooperating classes agree on these names, the class files will link and execute correctly [vV96, Sri96].

Another suggested solution is to use dedicated
class foo {
    void foo(int x, int y) {
        while ((x + y < 10) && (x > 5)) {
            if ((y > x) || (y < 100)) {
                x = y;
            }
            else {
                x += 100;
            }
        }
    }
}
Figure 7: Original Source

class foo {
    void foo(int i1, int i2) {
        goto L4;
    L1: if (i2 > i1) goto L2;
        if (i2 >= 100) goto L3;
    L2: i1 = i2;
        goto L4;
    L3: i1 += 100;
    L4: if ((i1+i2)>=10) goto L5;
        if (i1 > 5) goto L1;
    L5: return;
    } // foo
} // foo
Figure 8: After Expression Decompilation
class foo {
    void foo(int i1, int i2) {
        lp3: for ( ; ; ) {
            if ((i1 + i2) >= 10) break lp3;
            if !((i1 > 5)) break lp3;
            lp2: for ( ; ; ) {
                lp1: for ( ; ; ) {
                    if (i2 > i1) break lp1;
                    if !((i2 >= 100)) break lp1;
                    break lp2;
                } // lp1
                i1 = i2;
                continue lp3;
            } // lp2
            i1 += 100;
            continue lp3;
        } // lp3
        return;
    }
}
Figure 9: After Ramshaw's Algorithm

class foo {
    void foo(int i1, int i2) {
        lp3: for ( ;!((i1+i2)>=10)&&((i1>5)); ) {
            if (i2 > i1) || !((i2 >= 100)) {
                i1 = i2;
            } // then
            else {
                i1 += 100;
            }
        } // lp3
        return;
    }
}
Figure 10: After AST Transformation (Final Decompilation Results)
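
The negated tests in Figure 10, such as !((i1+i2)>=10), could be simplified by pushing negations inward with De Morgan's laws and then flipping the relational operator at the leaves. The sketch below shows one way to do this; it is not part of Krakatoa. The expression classes (Rel, Not, And, Or) and the method names are hypothetical, and flipping a comparison is assumed to be applied to integer operands only, as in Figure 10.

```java
// Hypothetical expression tree for illustration; not Krakatoa's representation.
class DeMorganSketch {
    interface Expr {}

    static final class Rel implements Expr { // "lhs op rhs" over integer operands
        final String lhs, op, rhs;
        Rel(String lhs, String op, String rhs) { this.lhs = lhs; this.op = op; this.rhs = rhs; }
        @Override public String toString() { return "(" + lhs + " " + op + " " + rhs + ")"; }
    }
    static final class Not implements Expr {
        final Expr inner;
        Not(Expr inner) { this.inner = inner; }
        @Override public String toString() { return "!" + inner; }
    }
    static final class And implements Expr {
        final Expr l, r;
        And(Expr l, Expr r) { this.l = l; this.r = r; }
        @Override public String toString() { return "(" + l + " && " + r + ")"; }
    }
    static final class Or implements Expr {
        final Expr l, r;
        Or(Expr l, Expr r) { this.l = l; this.r = r; }
        @Override public String toString() { return "(" + l + " || " + r + ")"; }
    }

    // Returns the relational operator that tests the opposite condition.
    static String flip(String op) {
        switch (op) {
            case "<":  return ">=";
            case "<=": return ">";
            case ">":  return "<=";
            case ">=": return "<";
            case "==": return "!=";
            case "!=": return "==";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Rewrites e (or its negation, when negate is true) so that no Not node remains.
    static Expr push(Expr e, boolean negate) {
        if (e instanceof Not) {
            return push(((Not) e).inner, !negate);
        }
        if (e instanceof And) {
            And a = (And) e;
            Expr l = push(a.l, negate);
            Expr r = push(a.r, negate);
            if (negate) return new Or(l, r);   // !(a && b) == !a || !b
            return new And(l, r);
        }
        if (e instanceof Or) {
            Or o = (Or) e;
            Expr l = push(o.l, negate);
            Expr r = push(o.r, negate);
            if (negate) return new And(l, r);  // !(a || b) == !a && !b
            return new Or(l, r);
        }
        Rel rel = (Rel) e;
        if (negate) return new Rel(rel.lhs, flip(rel.op), rel.rhs);
        return rel;
    }

    static Expr simplify(Expr e) { return push(e, false); }

    public static void main(String[] args) {
        // Loop test from Figure 10: !((i1+i2)>=10) && ((i1>5))
        Expr loopTest = new And(new Not(new Rel("i1 + i2", ">=", "10")),
                                new Rel("i1", ">", "5"));
        System.out.println(simplify(loopTest)); // ((i1 + i2 < 10) && (i1 > 5))
    }
}
```

Applied to the loop test of Figure 10, this yields a condition of the same shape as the while test in the original source of Figure 7.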
// instance variables
protected Vector members;               protected java.util.Vector members;
}                                       }
if (!(isMember(o))) {                   if !( (this.isMember(local1) != 0) ) {
members.addElement(o);                  this.members.addElement(local1)
} // then                               } // then
                                        return;
} // addMember                          } // addMember
members.removeElement(o);               this.members.removeElement(local1)
                                        return;
} // removeMember                       } // removeMember
if (echo_ops) {                         if !( (Set.echo_ops == 0) ) {
System.out.println("unioning");         java.lang.System.out.println("unioning");
}                                       } // then
                                        local2.members = ((java.util.Vector)
                                        this.members.clone());