03 Fundamental Software Engineering Concepts


ITGY400 – REVERSE ENGINEERING AND MALWARE ANALYSIS

Topic – Fundamental Software Engineering Concepts
Fundamental Software Engineering Concepts

● Here we present a very brief overview of the conventional, high-level
perspective of software that every software developer has been
exposed to.
● We then proceed to an introduction to low-level software and
demonstrate how fundamental high-level software concepts map onto
the low-level realm.
High-Level Perspective

● Here we discuss the following software engineering concepts:
− program structure (procedures, objects, and the like),
− data management concepts (such as typical data structures,
the role of variables, and so on), and
− basic control flow constructs.
High-Level Perspective – Program Structure

● Program structure is what makes software, an inherently large
and complex thing, manageable by humans.
● A program is broken into small chunks, each representing a “unit” in
the program, so that we can conveniently form a mental image of the
program.
High-Level Perspective – Program Structure

● The same process takes place during reverse engineering. Reversers
must try to reconstruct this map of the various components that
together make up a program.
● To break down software into manageable chunks, the general idea is
to view the program as a set of separate black boxes that are
responsible for very specific and (hopefully) accurately defined tasks.
High-Level Perspective – Program Structure

● The idea is that someone designs and implements a black box, tests it
and confirms that it works, and then integrates it with other
components in the system.
● A program can therefore be seen as a large collection of black boxes
that interact with one another.
High-Level Perspective – Program Structure

● Likewise, when an application is being designed it is usually broken
down into mental black boxes that are each responsible for a chunk of
the application.
● For instance, in a word processor you could view the text-editing
component as one box and the spell checker component as another
box.
● This process is called encapsulation because each component box
encapsulates certain functionality and simply makes it available to
whoever needs it, without exposing unnecessary details about the
internal implementation of the component.
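
The spell checker box can be sketched in C as follows. This is only an illustration, not a real spell checker: the function name `spellcheck_is_known` and the tiny word list are hypothetical. The caller sees a single function, while the word list stays hidden inside the component.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Public interface: the only thing other components need to know about. */
bool spellcheck_is_known(const char *word);

/* Internal implementation detail. `static` keeps the word list invisible
   outside this source file (hypothetical, deliberately tiny dictionary). */
static const char *const known_words[] = { "binary", "module", "stack" };

bool spellcheck_is_known(const char *word)
{
    assert(word != NULL);

    size_t count = sizeof known_words / sizeof known_words[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(word, known_words[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

The text-editing box would call `spellcheck_is_known("stack")` without knowing how the lookup is implemented, so the list could later be replaced by a different data structure without touching any caller.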
High-Level Perspective – Program Structure

● Component boxes are frequently developed by different people or
even by different groups, but they still must be able to interact.
● Developing a robust and reliable product rests primarily on two
factors:
− that each component box is well implemented and reliably
performs its duties, and
− that each box has a well-defined interface for communicating
with the outside world.
High-Level Perspective – Program Structure

● In most reversing scenarios, the first step is to determine the
component structure of the application and the exact responsibilities
of each component.
● From there, one usually picks a component of interest and delves into
the details of its implementation.
High-Level Perspective – Program Structure

● The following technical tools are available to software
developers for implementing this type of component-level
encapsulation in the code:
− Modules
− Code constructs: procedures and objects
High-Level Perspective – Program Structure : Modules

● The largest building block for a program is the module.
● Modules are simply binary files that contain isolated areas of a
program’s executable (essentially the component boxes from our
previous discussion).
● There are two basic types of modules that can be combined together
to make a program:
− static libraries and
− dynamic libraries.
High-Level Perspective – Program Structure : Modules

● Static libraries - Static libraries make up a group of source-code files
that are built together and represent a certain component of a
program.
● Logically, static libraries usually represent a feature or an area of
functionality in the program.
● Frequently, a static library is not an integral part of the product that’s
being developed but rather an external, third-party library that adds
certain functionality to it.
High-Level Perspective – Program Structure : Modules

● Static libraries are added to a program while it is being built, and they
become an integral part of the program’s binaries.
● They are difficult to make out and isolate when we look at the program
from a low-level perspective while reversing
High-Level Perspective – Program Structure : Modules

● Dynamic libraries - Dynamic libraries (called Dynamic Link Libraries,
or DLLs, in Windows) are similar to static libraries, except that they are
not embedded into the program, and they remain in a separate file,
even when the program is shipped to the end user.
● A dynamic library allows for upgrading individual components in a
program without updating the entire program.
● As long as the interface it exports remains constant, a library can (at
least in theory) be replaced seamlessly, without upgrading any other
components in the program.
High-Level Perspective – Program Structure : Modules

● Dynamic libraries are very easy to detect while reversing, and the
interfaces between them often simplify the reversing process because
they provide helpful hints regarding the program’s architecture
High-Level Perspective – Program Structure : Code Construct

● There are two basic code-level constructs that are considered the
most fundamental building blocks for a program.
● These are:
− procedures and
− objects.
High-Level Perspective – Program Structure : Code Construct

● The procedure is the most fundamental unit in software.
● A procedure is a piece of code, usually with a well-defined purpose,
that can be invoked by other areas in the program.
● Procedures can optionally receive input data from the caller and return
data to the caller.
● Procedures are the most commonly used form of encapsulation in any
programming language.
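
A minimal sketch of a procedure in C, assuming two hypothetical names, `largest` and `exceeds_limit`: the first has one well-defined purpose, receives input from its caller and returns a result; the second is another area of the program that invokes it.

```c
#include <assert.h>
#include <stddef.h>

/* A procedure with one well-defined purpose: find the largest value.
   Input arrives through the parameters; the result goes back to the caller. */
int largest(const int *values, size_t count)
{
    assert(values != NULL && count > 0);

    int best = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] > best) {
            best = values[i];
        }
    }
    return best;
}

/* Another area of the program invokes the procedure without caring
   how it computes its answer. Returns 1 if any value is above `limit`. */
int exceeds_limit(const int *values, size_t count, int limit)
{
    return largest(values, count) > limit;
}
```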
High-Level Perspective – Program Structure : Code Construct

● Designing a program using objects is an entirely different process
than the process of designing a regular procedure-based program.
● This process is called object-oriented design (OOD), and is
considered by many to be the most popular and effective approach to
software design currently available.
High-Level Perspective – Program Structure : Code Construct

● OOD methodology defines an object as a program component that
has both data and code associated with it.
● The code can be a set of procedures that is related to the object and
can manipulate its data.
● The data is part of the object and is usually private, meaning that it
can only be accessed by object code, but not from the outside world.
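
C has no built-in objects, but the idea can be approximated with an opaque structure plus the procedures that operate on it. The sketch below uses a hypothetical `Account` type. In a real project the declarations would go in a header and the `struct` definition in a separate source file, so that outside code could not reach the data directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface: callers see an opaque handle and a set of procedures. */
typedef struct Account Account;

Account *account_create(int opening_balance);
int      account_deposit(Account *account, int amount);
int      account_balance(const Account *account);
void     account_destroy(Account *account);

/* Implementation: the data layout is known only to the object's code. */
struct Account {
    int balance;
};

Account *account_create(int opening_balance)
{
    Account *account = malloc(sizeof *account);
    if (account != NULL) {
        account->balance = opening_balance;
    }
    return account;   /* NULL if the allocation failed */
}

/* Returns 1 on success, 0 if the amount is rejected. */
int account_deposit(Account *account, int amount)
{
    assert(account != NULL);
    if (amount <= 0) {
        return 0;
    }
    account->balance += amount;
    return 1;
}

int account_balance(const Account *account)
{
    assert(account != NULL);
    return account->balance;
}

void account_destroy(Account *account)
{
    free(account);
}
```

Because every change to `balance` goes through `account_deposit`, the object's code can enforce its own rules (here, rejecting non-positive amounts).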
High-Level Perspective – Data Management

● A program deals with data. Any operation requires input data,
room for intermediate data, and a way to send back results.
● To view a program and understand what is happening, we must
understand how data is managed in the program
● This requires two perspectives:
– the high-level perspective as viewed by software
developers and
– the low-level perspective that is viewed by reversers.
High-Level Perspective – Data Management

● Brief overview of high-level data constructs:
− Variables
− User-Defined Data Structures
− Control Flow
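
The three constructs can be seen together in one short C sketch. The `struct Sample` type and the function name are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* User-defined data structure: groups related values under one type. */
struct Sample {
    int id;
    int value;
};

int sum_positive_values(const struct Sample *samples, size_t count)
{
    assert(samples != NULL || count == 0);

    int total = 0;                          /* variable: named storage */

    for (size_t i = 0; i < count; i++) {    /* control flow: loop */
        if (samples[i].value > 0) {         /* control flow: conditional */
            total += samples[i].value;
        }
    }
    return total;
}
```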
Low-Level Perspective

● One of the most important differences between high-level
programming languages and any kind of low-level representation of a
program is in data management.
● High-level programming languages hide quite a few details regarding
data management.
● Let us compare the high-level code snippet below with its low-level
representation:
int Multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
Low-Level Perspective

● 1. Store machine state prior to executing function code
● 2. Allocate memory for z
● 3. Load parameters x and y from memory into internal processor
memory (registers)
● 4. Multiply x by y and store the result in a register
● 5. Optionally copy the multiplication result back into the memory area
previously allocated for z
● 6. Restore machine state stored earlier
● 7. Return to caller and send back z as the return value
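
The seven steps can be traced in a toy model written in C. This is not what a compiler actually emits: `struct Machine`, its two registers and its 16-slot stack are hypothetical stand-ins for the processor and memory, used only to make each step visible.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SLOTS 16

/* Hypothetical toy machine: two registers and a small stack in memory. */
struct Machine {
    int reg_a;
    int reg_b;
    int stack[STACK_SLOTS];
    int sp;                     /* index of the next free stack slot */
};

static void push(struct Machine *m, int value)
{
    assert(m->sp < STACK_SLOTS);
    m->stack[m->sp++] = value;
}

static int pop(struct Machine *m)
{
    assert(m->sp > 0);
    return m->stack[--m->sp];
}

int multiply_low_level(struct Machine *m, int x, int y)
{
    assert(m != NULL);

    /* Caller side: the parameters are placed in memory. */
    push(m, y);
    push(m, x);

    /* 1. Store machine state: save the register we are about to overwrite. */
    push(m, m->reg_b);

    /* 2. Allocate memory for z. */
    push(m, 0);
    int z_slot = m->sp - 1;

    /* 3. Load parameters x and y from memory into registers. */
    m->reg_a = m->stack[z_slot - 2];    /* x */
    m->reg_b = m->stack[z_slot - 3];    /* y */

    /* 4. Multiply x by y and keep the result in a register. */
    m->reg_a = m->reg_a * m->reg_b;

    /* 5. Optionally copy the result back into the memory reserved for z. */
    m->stack[z_slot] = m->reg_a;

    /* 6. Restore machine state: release z, then restore the saved register. */
    (void)pop(m);
    m->reg_b = pop(m);

    /* 7. Return to caller: discard the parameters; the result is in reg_a. */
    (void)pop(m);
    (void)pop(m);
    return m->reg_a;
}
```

After the call, the stack index and `reg_b` are back to their earlier values, which is the point of steps 1 and 6.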
Low-Level Perspective

● You can easily see that much of the added complexity is the result of
low-level data management considerations.
● Some common low-level data management constructs include:
– registers,
– stacks, and
– heaps.
Low-Level Perspective

● Registers
● Registers are small chunks of internal memory that reside within the
processor and can be accessed very easily
● While reversing, it is important to try and detect the nature of the
values loaded into each register
● Detecting the case where a register is used simply to allow
instructions access to specific values is very easy because the
register is used only for transferring a value from memory to the
instruction or the other way around
Low-Level Perspective

● Stack
● A stack is an area in program memory that is used for short-term
storage of information by the CPU and the program
● It can be thought of as a secondary storage area for short-term
information
● Registers are used for storing the most immediate data, and the stack
is used for storing slightly longer-term data
● Physically, the stack is just an area in RAM that has been allocated for
this purpose
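
In C, local variables are the usual source of stack data. In the hypothetical function below, the `squares` array is short-term storage that exists only for the duration of one call. A compiler would typically place it in the function's stack frame, although an optimizer is free to keep values in registers instead.

```c
#include <assert.h>

#define MAX_TERMS 16

int sum_of_squares(int n)
{
    assert(n >= 0 && n <= MAX_TERMS);

    int squares[MAX_TERMS];     /* local array: short-term, per-call storage */
    int total = 0;

    for (int i = 0; i < n; i++) {
        squares[i] = (i + 1) * (i + 1);
    }
    for (int i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;               /* `squares` ceases to exist on return */
}
```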
Low-Level Perspective

● Heaps
● A heap is a managed memory region that allows for the dynamic
allocation of variable-sized blocks of memory in runtime
● A program simply requests a block of a certain size and receives a
pointer to the newly allocated block (assuming that enough memory is
available)
● Heaps are managed either by software libraries that are shipped
alongside programs or by the operating system.
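
A minimal sketch of the request-and-receive pattern, using the C standard library heap routines `malloc` and `free`. The helper name `duplicate_text` is hypothetical. The block size is known only at runtime, and the caller must handle the case where not enough memory is available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `text`, or NULL if memory ran out.
   The caller owns the block and must release it with free(). */
char *duplicate_text(const char *text)
{
    assert(text != NULL);

    size_t size = strlen(text) + 1;     /* size decided at runtime */
    char *copy = malloc(size);          /* request a block of that size */
    if (copy == NULL) {
        return NULL;                    /* not enough memory available */
    }
    memcpy(copy, text, size);
    return copy;                        /* pointer to the new block */
}
```

Calls such as `malloc` and `free` are the kind of allocation and freeing routines a reverser looks for when mapping out a program's data layout.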
Low-Level Perspective

● Heaps
● For reversers, locating heaps in memory and properly identifying heap
allocation and freeing routines can be helpful, because it contributes
to the overall understanding of the program’s data layout.
