DECOMPILER
(REVERSE ENGINEERING)
Email: chinna_chetan05@yahoo.com
Visit: www.geocities.com/chinna_chetan05/forfriends.html
ABSTRACT
With major businesses focusing more and more on web enablement, the
proliferation of web-based applications, and the growth of many operating systems in the
specific to these systems. A compiler is system software that takes as input a program
written in a high level language and produces as output an executable program for a
In general, decompilers are used to recover lost source code. They work by
analyzing the byte-code of the software, and making educated guesses about the code that
created it. The input in this case is machine dependent, and the output is language
dependent. That is, an intermediate language representation can be formed and some
related code is generated in any high level language. Decompilation is a process that uses
tools to load a binary program into memory, parse or disassemble it, and then
decompile or analyze it to generate a high-level language program. Its
accuracy benefits from compiler and library signatures, which are used to recognize
particular compilers and library subroutines. In most cases, the high level code generated
INTRODUCTION
With major businesses focusing more and more on web enablement, the
proliferation of web-based applications, and the growth of many operating systems in the
specific to these systems. A compiler is system software that takes as input a program
written in a high level language and produces as output an executable program for a
What is Decompilation?
In general, decompilers are used to recover lost source code. They work by
analyzing the byte-code of the software, and making educated guesses about the code that
created it. The input in this case is machine dependent, and the output is language
dependent. That is, an intermediate language representation can be formed and some
related code is generated in any high level language. Decompilation is a process that uses
tools to load a binary program into memory, parse or disassemble it, and then
decompile or analyze it to generate a high-level language program. Its
accuracy benefits from compiler and library signatures, which are used to recognize
particular compilers and library subroutines. In most cases, the high level code generated
ETHICS OF DECOMPILATION
Is Decompilation Possible?
Yes and No. Fully automated decompilation is not possible – this problem is
Science. What this means is that decompilation cannot be achieved for all possible
programs that are ever written, and that the separation of data and code is hard to achieve.
Further, even if a certain degree of success is achieved, the generated program lacks
meaningful variable and function names, as these are not normally stored in an executable
Some people believe it is only possible to recover the assembly sources; this in
itself is, again, not a trivial problem. However, in practice, there have been more
understanding of data and control flow of the executable would be possible. The more
successful ones make use of extra information (e.g. knowledge of the compiler used) or
When a programmer writes software, and releases it to the public, he (or she)
normally releases a compiled version of the application that users can run on their own
programmer has put a considerable amount of time and effort into producing it. The
source code behind the software is something private, that the programmer has created.
Programmers don’t want people looking for flaws in their software, and they don’t want
people to change the title of the software and then redistribute it as someone else’s
product. It is for this reason that programmers don’t often release their source code – but
few realize that every time we release compiled software, we are also giving people the
hacker from analyzing the code. While decompilers do represent a threat, they also can be
of great benefit to programmers. There are also many legitimate purposes for the use of
structure of data files to include support for that file-type in their application. Whether or
not such actions are legal is a gray area, but including support for competing
Decompilers aren’t necessarily evil – but they do pose an ethical dilemma for many
software developers. Programmers can protect their software against decompilation,
or at least make the task harder, by using special software that shields it from prying
eyes. Decompilers can also be used to steal the source code of competitors, or by hackers
to determine weaknesses in the design of software. But just blaming the decompiler is
meaningless – it is the programmer who uses it for intellectual property theft, or the
hacker who decompiles the software to find security holes, who is at fault.
Legal Aspects
Throughout the world, copyright law protects most programs. Copyright protects the
expression of an idea in the form of a program, hence protecting the developer’s (or
exclusive rights to the software developer, among others, the right to reproduce and make
adaptations to the developed computer program. It is a breach of these rights the making
countries have different exceptions to the copyright owner’s rights or precedent has been
established in court proceedings. This means that these uses are allowed by law, but
of hardware) where the interface specification has not been made available
• Decompilation for the purposes of error correction where the owner of the
Decompilation is a tool for a computer professional. There are two major areas
• To structure old code written in an unstructured way (i.e. spaghetti code) into
a structured program
• To debug binary programs that are known to have bugs but for which the
• When multiple versions of source have been created and creation dates are
running object/executable.
In the latter area, decompilation is used as a tool to verify the object code
networks worldwide has raised the awareness of the need for tools and techniques to aid
in the security analysis of binary code, for example to understand malware (viruses,
Trojans, worms, backdoors) and general security flaws, in order to provide
immediate solutions with or without the aid of software vendors, whether these are
The classical technique used to study malware is to use a debugger to step through the
executable program (containing thousands of lines of assembly code) one assembly line
at a time until the problem is found – it is then possible to reconstruct that part of the
traced program in order to provide a solution for it. This method requires an expert
engineer who understands assembly code – a skill that is becoming rarer as years go by, due
to the increasing use of higher-level languages such as C++ and Java. With decompilation,
we can reduce the amount of code that the engineer has to process, and present the
engineer with a higher level of abstraction, so that fewer man-hours are needed
to understand the program’s code. Further, these techniques will reduce the
additional skills and training required of professionals, especially those working in network
security teams. Thus, decompilation would effectively help in reducing the amount of
time needed to trace a security flaw in an executable program, as well as reducing the
DECOMPILATION PROBLEMS
A decompiler writer has to face several theoretical and practical problems when
writing one. Some of these problems can be solved by use of heuristic methods, others
• Architecture-dependent Restrictions
RUN-TIME ENVIRONMENT
Before considering decompilation, the relations between the static binary code of
the program and the actions performed at run-time to implement the program should be
an equivalent data object in the machine often represents elementary data types such as
integers, characters, and reals, whereas aggregate objects such as arrays, strings, and
composed of one or more subroutines, called the user sub-routines. The corresponding
binary program is composed of user subroutines, library routines that were invoked by
the user program, and other subroutines linked in by the linker to provide support for the
compiler at run-time. The general format of the binary code contains a startup-code, user
program including library routines and an exit code. For DOS and Windows
environments, when a program is loaded into memory, a Program Segment Prefix is built
on the earlier bytes of the allocated memory, and it contains important information such
Each subroutine is associated with a stack frame during run-time containing a set of
parameters, local variables, and the return address of the caller subroutine. Entering a
returning a value, exiting the subroutine and parameter passing are some of the important
tasks to be analyzed from the byte code. Meanwhile, a symbol table is normally built to
store information on variables used throughout the program. Variables are identified by
their address: variables that have a physical memory address are global variables, those
located at a negative offset from the frame pointer (bp) are local variables of the corresponding
stack frame’s subroutine, and variables at positive offsets are actual arguments to the
subroutine. Register variables need special attention. The symbol table would grow
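The address-based classification described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Operand` type, the `classify` and `build_symbol_table` functions, and the generated names (`loc1`, `arg1`, `glob1`) are all hypothetical, and offsets are assumed to be measured relative to the `bp` register, as in the 8086 examples used later in this paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operand:
    """A memory operand after disassembly (hypothetical type for this sketch)."""
    base: Optional[str]  # "bp" for frame-relative operands, None for absolute addresses
    offset: int          # signed displacement, or the absolute address when base is None


def classify(op: Operand) -> str:
    """Classify a variable by its address, following the rule stated in the text."""
    if op.base is None:
        return "global"            # fixed memory address
    if op.base == "bp" and op.offset < 0:
        return "local"             # negative offset in the stack frame
    if op.base == "bp" and op.offset > 0:
        return "argument"          # positive offset in the stack frame
    return "unknown"               # anything else needs separate analysis


def build_symbol_table(operands):
    """Assign one generated name per distinct operand (assumed naming scheme)."""
    prefixes = {"global": "glob", "local": "loc", "argument": "arg", "unknown": "tmp"}
    table, counters = {}, {}
    for op in operands:
        if op in table:
            continue               # variable already recorded
        kind = classify(op)
        counters[kind] = counters.get(kind, 0) + 1
        table[op] = (kind, f"{prefixes[kind]}{counters[kind]}")
    return table
```

A table built this way grows by one entry per distinct address seen in the code. Register variables are deliberately left out, since they need the separate treatment the text mentions.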
PHASES OF A DECOMPILER
series of phases that transform the source machine program from one representation to
another.
Syntax Analyzer: It groups bytes of the source program into grammatical phrases
(or sentences) of the source machine language, using a parse tree. Case tables are
Semantic Analyzer: It checks the source program for the semantic meaning of
groups of instructions; gathers type information and propagates this type across
the subroutine.
source program is necessary for the decompiler to analyze the program. The
second pass would use this intermediate code to generate target language code.
intermediate jumps and to determine the high-level control structures used in the
program.
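As a rough illustration of how a high-level control structure might be determined from jumps, the sketch below recognises one fixed pattern (compare, conditional jump, fall-through block, unconditional jump, labelled block, end label) and turns it into an if/else node. The tuple-based instruction format and the function name `structure_if_else` are assumptions made for this example; a real decompiler would work on a control flow graph rather than on a linear listing.

```python
def structure_if_else(instrs):
    """Recognise the linear pattern

        ("cmp", a, b), ("jnz", L1), <block A>, ("jmp", L2),
        ("label", L1), <block B>, ("label", L2)

    and return ("if", condition, then_body, else_body), or None if the
    pattern does not match. Instructions are plain tuples (assumed format).
    """
    if len(instrs) < 2 or instrs[0][0] != "cmp" or instrs[1][0] != "jnz":
        return None
    _, lhs, rhs = instrs[0]
    else_label = instrs[1][1]
    try:
        else_at = instrs.index(("label", else_label))
    except ValueError:
        return None
    # The fall-through block must end with an unconditional jump past block B.
    if else_at < 3 or instrs[else_at - 1][0] != "jmp":
        return None
    end_label = instrs[else_at - 1][1]
    try:
        end_at = instrs.index(("label", end_label), else_at + 1)
    except ValueError:
        return None
    # jnz skips block A when lhs != rhs, so block A runs when lhs == rhs.
    then_body = instrs[2:else_at - 1]
    else_body = instrs[else_at + 1:end_at]
    return ("if", (lhs, "==", rhs), then_body, else_body)


example = [
    ("cmp", "x", 0), ("jnz", "L1"),
    ("assign", "y", 1), ("jmp", "L2"),
    ("label", "L1"), ("assign", "y", 2),
    ("label", "L2"),
]
node = structure_if_else(example)
```

Here `node` describes `if (x == 0) y = 1; else y = 2;`. The condition is written in terms of the fall-through block, so a code generator is free to negate it and swap the two branches.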
selecting names for local and global variables. Subroutine names are selected
DECOMPILATION SYSTEM
Loader: It loads the binary program into memory and relocates the machine code if it
Library Binder: It binds the subroutine names to the appropriate library routines.
level program, such as converting a generic set of control structures (while loops) to
A Decompilation System
signatures for DOS executables produced by Turbo C v1.0 Compiler, based on 8086
instruction set. A simple approach to decompile a binary executable is to first parse it and
separate it into functions (C style). Once we know where the entry point of the program
is (“main” for a C program), we can start decompiling that function, and any other function
it calls. After we have separated out the instructions for a function, we need to emulate
the processor and interpret each and every machine instruction, to combine logical group
Mov bx, 20
Mul bx
Add ax, 4
Mov [bp+4], ax
i) wAX = [bp+4]
ii) wBX = 20
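The idea of emulating the processor over a group of instructions can be sketched as a small symbolic interpreter: registers hold expressions instead of numbers, and a store to memory emits a high-level statement. This is only a sketch under stated assumptions. The listing above is assumed to begin with a load `mov ax, [bp+4]` (implied by step i), `[bp+4]` is assumed to be the variable `i`, and `mul` is simplified to a 16-bit result in `ax`. The function name `emulate` and the tuple instruction format are hypothetical.

```python
def emulate(instrs, names):
    """Symbolically execute a straight-line instruction group.

    instrs: tuples such as ("mov", dst, src), ("mul", src), ("add", dst, src)
    names:  maps memory operands such as "[bp+4]" to variable names (assumed)
    Returns the list of C-like statements produced by stores to memory.
    """
    regs = {}      # register name -> expression string currently held
    stmts = []

    def value(operand):
        # A register yields its tracked expression; memory yields its name.
        if operand in regs:
            return regs[operand]
        return names.get(operand, operand)

    for op, *args in instrs:
        if op == "mov":
            dst, src = args
            if dst.startswith("["):                 # store: emit an assignment
                stmts.append(f"{names.get(dst, dst)} = {value(src)};")
            else:                                   # load or register move
                regs[dst] = value(src)
        elif op == "mul":                           # ax = ax * operand (high word ignored)
            regs["ax"] = f"({value('ax')} * {value(args[0])})"
        elif op == "add":                           # register destination only
            dst, src = args
            regs[dst] = f"({value(dst)} + {value(src)})"
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stmts


group = [
    ("mov", "ax", "[bp+4]"),   # assumed first instruction, see lead-in
    ("mov", "bx", "20"),
    ("mul", "bx"),
    ("add", "ax", "4"),
    ("mov", "[bp+4]", "ax"),
]
statements = emulate(group, {"[bp+4]": "i"})
```

Under these assumptions `statements` holds the single statement `i = ((i * 20) + 4);`. The parentheses are redundant, and a real code generator would drop them using operator precedence.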
Cmp ax, 10
Jnz lab1
Mov bx, 15
Mov [bp+2], bx
Jmp lab2
Lab1:
Mov bx, 20
Mov [bp+2], bx
Lab2:
Using cmp and jnz instructions, we conclude: if (i != 10) j = 20; else j = 15; But, we
function calling is to push the parameters onto stack and then call the function.
So, “push” and “call” instructions indicate a function call. Look at the code:
Push ax
Push ax
Call _func
Mov [bp+4], ax
Matching [bp+4] with “i” and [bp+2] with “j”, we get: i = func (j, i);
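The push/call pattern can be recovered in the same symbolic style. The sketch below makes several assumptions that are not spelled out in the listing: each `push ax` is preceded by a load of the argument (`mov ax, [bp+4]`, then `mov ax, [bp+2]`), arguments are pushed right to left, the return value comes back in `ax`, every pushed value belongs to the next call, and the leading underscore is a compiler-added prefix. The function name `recover_calls` is hypothetical.

```python
def recover_calls(instrs, names):
    """Turn mov/push/call sequences into C-like call statements.

    instrs: tuples such as ("mov", dst, src), ("push", src), ("call", target)
    names:  maps memory operands such as "[bp+4]" to variable names (assumed)
    """
    regs = {}       # register name -> expression string currently held
    pushed = []     # expressions pushed since the last call
    stmts = []

    def value(operand):
        return regs.get(operand, names.get(operand, operand))

    for op, *args in instrs:
        if op == "mov":
            dst, src = args
            if dst.startswith("["):                 # store: emit an assignment
                stmts.append(f"{names.get(dst, dst)} = {value(src)};")
            else:
                regs[dst] = value(src)
        elif op == "push":
            pushed.append(value(args[0]))
        elif op == "call":
            func = args[0].lstrip("_")              # drop the assumed "_" prefix
            # Last value pushed is the first actual argument (right-to-left push).
            actuals = ", ".join(reversed(pushed))
            pushed.clear()
            regs["ax"] = f"{func}({actuals})"       # assumed: result returned in ax
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stmts


call_site = [
    ("mov", "ax", "[bp+4]"), ("push", "ax"),   # assumed load before the first push
    ("mov", "ax", "[bp+2]"), ("push", "ax"),   # assumed load before the second push
    ("call", "_func"),
    ("mov", "[bp+4]", "ax"),
]
recovered = recover_calls(call_site, {"[bp+4]": "i", "[bp+2]": "j"})
```

With these assumptions `recovered` contains `i = func(j, i);`, which matches the statement derived in the text.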
CONCLUSION
Reverse Engineering is a field of research, which has attracted many hackers and
retrieve the source code in case of emergency, without the violation of laws. Decompilers
remain interesting for various reasons: multiple versions of the
compiler for the same platform exist, the compiler itself will continue to change
and those changes must be kept up with, and so on. Nobody can stop unscrupulous persons and
hackers from illegal decompiling, but decompilation should be directed toward valid purposes.
Cracking programs protected by copyright is not only illegal; it also free-rides on others’
creative effort.