
University of Batna 2 Course: Cyber Security 2

Faculty of Mathematics and Computer Science Master 2 ISIDS Year 2023/2024


Computer Science Department

Chapter 3 : Malware analysis and reverse engineering

I. Preamble :

An employee of the company where you work contacts you to report that his machine has been
infected. You run a forensic analysis on the infected machine and detect the presence of a Trojan.
You clean the machine with an antivirus and create an IDS rule to check whether any other machine
has been infected. You also place the infected machine under monitoring.

A few days later, you see in the SIEM logs that the machine is communicating with malicious IPs and
sending data to these IPs. You run a search on other machines on the network and discover that other
machines are behaving in the same way as the infected machine.

You realize that your antivirus is not providing the desired protection and that your IDS rule has failed to
detect the malware.

Your superiors call you to find out what's going on, and you ask yourself the following questions:

 How can you determine exactly what this malware is doing?
 How does it propagate or spread?
 How does it persist on the infected system?
 What data does it exfiltrate or modify?
 Are there any vulnerabilities or weaknesses in the malware that can be exploited for detection or
removal?
 How could you create more effective network signatures on your tools?
 How can you be sure that you have properly cleaned and removed the malware and its
derivatives?

To answer all these questions, you have to carry out a malware analysis using the Reverse Engineering
(RE) technique.

In this chapter, we are not going to focus on how to find the malware (forensic analysis), but on how to
analyze it once it has been found.

As in the previous chapter, we will concentrate on the analysis of malware detected on Windows
machines.

II. Malware analysis and reverse engineering:

Malware analysis and reverse engineering are closely related but not exactly the same. Malware analysis
is a broader term that encompasses various techniques and methodologies used to examine and
understand malicious software (malware). On the other hand, reverse engineering is one of the key
techniques employed in malware analysis.

Nesrine KHERNANE Page 1



Malware analysis involves studying the behavior, characteristics, and functionality of malicious software
to identify its purpose, capabilities, and potential impact. It aims to uncover the inner workings of
malware, understand its infection vectors, communication protocols, payload delivery mechanisms,
evasion techniques and encryption or obfuscation techniques. The analysis helps in developing effective
detection signatures, creating mitigation strategies, and improving overall cybersecurity defenses. This
involves the creation of both host-based and network-based signatures.

Reverse engineering plays a crucial role in malware analysis. It involves the systematic and detailed
examination of the malware's code, structure, capabilities, techniques employed, and behavior to
understand its functionality and design. Reverse engineering techniques, such as disassembly, debugging,
and code analysis, are used to deconstruct the malware and extract valuable information, such as its logic,
algorithms, and communication mechanisms.

It is a crucial technique against ever-evolving and sophisticated malware threats.

It's important to note that reverse engineering can have both legitimate and illegitimate uses. Legitimate
uses include software maintenance, interoperability, and security analysis. Illegitimate uses, such as
software piracy or unauthorized modification, are generally considered unethical.

Reverse engineering is not limited to malware analysis. It can be applied to any software or
system, including legitimate software, in order to understand its inner workings, make modifications,
understand proprietary protocols, analyze software vulnerabilities, test interoperability, or
create compatible alternatives.

Overall, reverse engineering is a powerful technique that allows us to explore, understand, and leverage
existing systems, contributing to innovation, security, and advancements in technology.

III. Techniques used for reverse engineering:

Most of the time, reverse engineering is performed on an executable file (.exe). This file is not
human-readable, so you will need to use several tools. Each tool used during reverse engineering
extracts some information about the malware, and it is the combined use of all these tools that
enables you to understand how the malware works.

We have two approaches used in RE for malware analysis: static analysis and dynamic analysis.

1. The static analysis:

Static analysis is a key component of reverse engineering. It involves examining the code
and structure of a program or malware without executing it, and is generally used to determine whether
or not an executable file is malicious. It involves loading the executable file into a disassembler and
analyzing its instructions to find out what the program does.

This technique requires some knowledge of disassembly, of how code is constructed, and of some
operating system concepts.


2. The Dynamic analysis:

This technique analyzes a file by executing it, in order to observe its behavior on the machine
and to create effective security rules. It involves monitoring the malware's interaction with the system:
its network communications, file modifications, registry changes, process manipulation, the potential
damage it can cause, and other activities. A debugger can also be used to examine the program while it runs.

With this technique, you need to prepare a safe and controlled environment before running the
executable file, to avoid infecting your own machine.

Both techniques are used during the reverse engineering to obtain a complete analysis of the suspect
executable file.

IV. Basic static analysis techniques:

Static analysis techniques are employed to gain insights into the functionality, behavior, and potential
vulnerabilities of the software being analyzed. During reverse engineering, static analysis can be
performed on the binary executable, source code, or intermediate representations of a program. This
analysis helps in understanding the program's logic, identifying potential security vulnerabilities, and
uncovering hidden functionality.

In this part of the course, we'll look at what relevant information we can get from a program without
actually executing it.

Here are some common static analysis techniques and tools used in reverse engineering:

1. Anti-virus:

The first step in malware analysis is to have the program scanned by several antivirus (AV) engines to
identify known threats and suspicious behavior. This quickly determines whether the sample being
analyzed is already known, thus saving time and effort in further analysis.

However, AVs rely on a database of known malware, on the malware's behavior, or on pattern
matching (i.e. the presence of a character string specific to a type of malware). Attackers often change
their code, which changes the malware's signature and thus bypasses signature-based detection.

Behavior-based detection and pattern-matching-based filtering (known as heuristic detection) are more
effective at detecting unknown malware. However, they too can be bypassed by new malware.

One of the most widely used sites for scanning one program across multiple AVs is VirusTotal.

2. The Program hash

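A hash is considered a fingerprint of a file: in practice, two different files are extremely unlikely to
share the same hash (although collisions have been demonstrated for weak algorithms such as MD5).

A hash can be calculated using several tools, such as md5deep, HashCalc, or the command line.

Here is an example using PowerShell: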


Here is an example using the HashCalc tool:

The hash can be used as an IoC and shared with other analysts to identify malware. Searching for this
hash on the Internet shows whether it has already been identified and if it is a variant of a known
malware family. It is also used to blacklist the known malicious files (based on their hash), enabling
proactive detection and prevention of known threats.
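As a complement to the PowerShell and HashCalc examples, here is a minimal Python sketch of the same idea, using only the standard `hashlib` module. The function names `file_hashes` and `is_blocklisted` and the idea of a local set of known-bad hashes are illustrative assumptions, not part of any specific tool.

```python
import hashlib

def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-256 hex digests of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large samples do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}

def is_blocklisted(hashes, known_bad):
    """True if any digest of the file appears in a set of known-malicious hashes."""
    return any(digest in known_bad for digest in hashes.values())
```

The resulting digests are the values you would search for online or share with other analysts as IoCs.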

3. Strings searching

A string is a sequence of characters in a program. A program contains strings if it displays a message,
connects to the Internet (IPs, URLs or domains), or copies, moves or creates a file in a specific
directory.

Strings are stored in either ASCII or Unicode format. ASCII uses 1 byte for each character, while
the Unicode format used on Windows (UTF-16) generally uses 2 bytes. In both encodings, the string
ends with a NULL terminator (one zero byte in ASCII, two in Unicode).

Example:

ASCII: 0x53 is the hex representation of the uppercase letter S.

Unicode: 0x44 0x00 is the hex representation of the uppercase letter D.

Sometimes, running a string search via software can give you meaningless results. In this case, it's up to
the analyst to filter and extract the relevant information.
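The string search and the filtering step can be sketched in Python as follows. This is a simplified illustration, not a replacement for a dedicated strings tool: it assumes that "Unicode" strings are UTF-16 little-endian with ASCII-range characters, that a string needs at least 4 printable characters to be worth keeping, and the names `extract_strings` and `classify` are hypothetical.

```python
import ipaddress
import re

def extract_strings(data, min_len=4):
    """Return printable ASCII strings, then UTF-16LE strings, found in `data`."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # In UTF-16LE, each ASCII-range character is followed by a zero byte.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found

def classify(strings):
    """Sort strings into IPv4 addresses, DLL names and everything else."""
    result = {"ips": [], "dlls": [], "other": []}
    for s in strings:
        if s.lower().endswith(".dll"):
            result["dlls"].append(s)
            continue
        try:
            ipaddress.IPv4Address(s)
            result["ips"].append(s)
        except ValueError:
            result["other"].append(s)
    return result
```

The "other" bucket still has to be reviewed by the analyst, since it mixes meaningful strings (function names, messages) with noise.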


Example:

In the above example, we can see that the strings KN12, ky@, $L!, ML+ have no significance and can be
ignored.

The two strings GetLayout and SetLayout are Windows functions and can be considered two significant
pieces of information (note that, by convention, Windows function names begin with an uppercase
letter, as does each subsequent word).

GDI32.DLL represents the library used by hello.exe.

153.42.13.64 is an IP address that could be used by a malicious program.

"You can use this link safely" is the message displayed.

Examples of relevant information to retrieve during the string search (APIs and libraries):

APIs are standardized interfaces that enable different applications to communicate, while libraries are
collections of precompiled code that provide reusable functionality to extend the capabilities of software.
APIs facilitate communication and interaction, while libraries provide specific functionality for use in a
program.

Attackers often utilize various libraries to aid in their malicious activities. Here are a few examples of
libraries commonly used by attackers:

 Windows API Libraries: Attackers leverage various Windows API libraries to interact with the
operating system and perform malicious actions. These libraries include kernel32.dll,
advapi32.dll, user32.dll, and ws2_32.dll. They provide functions for file manipulation, registry
access, process management, network communication, and more.
 OpenSSL: OpenSSL is a widely used open-source library for secure communication. Attackers may
leverage OpenSSL to implement encryption, decryption, or secure network communication in
their malware. It provides functions for cryptographic operations like RSA, AES, and SSL/TLS
protocols.
 libcurl: libcurl is a popular library for data transfer over various protocols, including HTTP, FTP,
SMTP, and more. Attackers may use libcurl to establish network connections, download/upload
files, or interact with web services. It provides an easy-to-use API for HTTP requests and supports
features like authentication, cookies, and SSL.
 libpcap: libpcap is a library for capturing and processing network traffic. Attackers may utilize
libpcap to sniff network packets, perform network reconnaissance, or analyze network
communication within their malware. It provides functions for packet capture, filtering, and
analysis.
 SQLite: SQLite is a lightweight, embedded database library. Attackers may use SQLite to store and
manage data within their malware, such as configuration settings, command-and-control (C2)
server information, or stolen data. SQLite provides an efficient and easy-to-use database solution.
 Crypto++: Crypto++ is a well-known C++ library for cryptographic operations. Attackers may
employ Crypto++ to implement encryption, decryption, or hashing algorithms within their
malware. It provides a wide range of cryptographic functions and algorithms, making it suitable
for various malicious purposes.

These are just a few examples of libraries that attackers may utilize in their malicious code. By identifying
the presence of these libraries within the malware code, analysts can gain insights into the capabilities,
encryption methods, network communication, and data storage mechanisms employed by the malware.

Examples of DLLs and what they can tell about a program:

Additionally, attackers often utilize various APIs (Application Programming Interfaces) to interact with the
operating system or perform malicious activities. Here are a few examples of APIs commonly used by
attackers:

 Winsock API: The Windows Sockets API (Winsock) is used for network communication. Attackers
may utilize Winsock API functions, such as socket(), connect(), or send(), to establish network
connections, send/receive data, or conduct command-and-control (C2) communication.


 Windows Registry API: The Windows Registry API allows manipulation of the system's registry
database. Attackers may leverage functions like RegOpenKeyEx(), RegSetValueEx(), or
RegDeleteKey() to modify registry entries, including creating persistence mechanisms, altering
system settings, or hiding their presence.
 Windows Management Instrumentation (WMI) API: WMI API provides a powerful interface to
manage system components and gather information. Attackers may utilize WMI functions, such
as ExecQuery() or Create() methods, to execute malicious scripts, perform reconnaissance, or
execute remote commands on compromised systems.
 Windows API for Process Manipulation: Attackers often use Windows API functions like
CreateProcess(), OpenProcess(), or CreateRemoteThread() to create or manipulate processes. These
functions enable activities such as process injection, code execution, or privilege escalation.
 Cryptography APIs: Attackers may employ cryptography APIs, such as CryptEncrypt() or
CryptDecrypt(), to encrypt or decrypt data during data exfiltration or to obfuscate their
communication channels.
 Windows API for File Manipulation: Attackers may use Windows API functions like CreateFile(),
ReadFile(), or WriteFile() to manipulate files on the compromised system. These functions enable
activities such as file encryption, data theft, or file system modifications.

These are just a few examples of the many APIs that attackers can exploit for malicious purposes. By
understanding the APIs commonly used by attackers, analysts can identify suspicious API calls within the
malware code and gain insights into the type of activities the malware is capable of performing.
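One simple way to put this into practice is a lookup table that maps function names found in a sample (through the string search or the import table) to the kind of activity they suggest. The sketch below is an assumption-laden illustration: the table `API_HINTS` only contains a few of the functions named in this chapter, the category labels are our own, and `capabilities` is a hypothetical helper.

```python
# Function name -> activity it hints at (categories follow the lists above).
API_HINTS = {
    "socket": "network communication",
    "connect": "network communication",
    "send": "network communication",
    "URLDownloadToFile": "network communication",
    "RegOpenKeyEx": "registry access",
    "RegSetValueEx": "registry access",
    "RegDeleteKey": "registry access",
    "CreateProcess": "process manipulation",
    "OpenProcess": "process manipulation",
    "CryptEncrypt": "cryptography",
    "CryptDecrypt": "cryptography",
    "CreateFile": "file manipulation",
    "ReadFile": "file manipulation",
    "WriteFile": "file manipulation",
}

def capabilities(imported_names):
    """Group the recognized function names by the activity they suggest."""
    found = {}
    for name in imported_names:
        base = name
        # Many Windows functions exist in ANSI (A) and wide-character (W)
        # variants, e.g. CreateFileA / CreateFileW.
        if base not in API_HINTS and base[-1:] in ("A", "W"):
            base = base[:-1]
        category = API_HINTS.get(base)
        if category:
            found.setdefault(category, []).append(name)
    return found
```

A match is only a hint: legitimate programs use the same functions, so the result should guide further analysis rather than serve as a verdict.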

4. Malware obfuscation and packing

Attackers can hide their malicious code by obfuscation or packing or even by combining both techniques.

Legitimate programs usually contain many readable strings, which is not the case for obfuscated or
packed programs. The latter often use the LoadLibrary and GetProcAddress functions to load libraries
and resolve functions at run time, which gives them access to any library on the system.
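The observation above can be turned into a rough triage heuristic. The following sketch is only an illustration: the thresholds `max_strings` and `max_imports` are arbitrary assumptions, the function name `looks_packed` is hypothetical, and a positive result merely suggests that the sample deserves a closer look.

```python
def looks_packed(strings, imports, max_strings=20, max_imports=10):
    """Heuristic: very few strings and imports, plus run-time loading functions."""
    names = set(imports)
    # LoadLibrary exists as LoadLibraryA / LoadLibraryW (and Ex variants).
    has_loader = "GetProcAddress" in names and any(
        n.startswith("LoadLibrary") for n in names
    )
    return has_loader and len(strings) <= max_strings and len(names) <= max_imports
```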

When discussing the difference between obfuscation and packing of a malicious code, it's important to
understand that encryption protocols can be used in both techniques, but they serve different purposes
within each approach.

Obfuscation: Obfuscation refers to the intentional modification of the code or program structure to make
it more difficult to understand or analyze. The main goal of obfuscation is to hinder reverse engineering
efforts and protect the intellectual property of the malicious code. It involves transforming the code while
preserving its functionality.

Encryption protocols can be used as part of obfuscation techniques to encrypt certain sections of the
code, making it difficult for analysts to comprehend the logic or purpose of those encrypted sections.
However, encryption is just one of the many obfuscation methods that can be employed. Other
obfuscation techniques include renaming variables, adding unnecessary code or complexity, rearranging
code blocks, and more.


If not using any encryption, the obfuscated code does not require any steps for decompression or
decryption before execution. The obfuscation techniques applied to the code aim to make it more difficult
to understand or analyze, but they do not alter the executable format or require any runtime unpacking.
The obfuscated code is still in a directly executable form, and the execution process remains the same as
for any other code.

Examples of tools that help with deobfuscation: de4dot (for .NET programs), JEB Decompiler (for Android
applications), and IDA Pro.

Packing: Packing, on the other hand, primarily focuses on compressing or encrypting the entire
executable file of a program or malware. The purpose of packing is to reduce the file size and to make
the file more challenging to analyze or detect. It relies on specialized tools (software packers) that
compress or encrypt the executable file, producing a packed version that is harder to analyze or
reverse engineer.

Encryption protocols may be utilized during the packing process to encrypt the entire packed file, making
it harder for security tools to identify/detect the malicious code or analyze its content.

The packed file needs to be unpacked and/or decrypted in memory. This unpacking process, also known
as runtime unpacking, occurs dynamically during runtime and is typically performed by a built-in
unpacking routine within the packed code or by a separate unpacking stub that accompanies the packed
file. Once unpacked, the code is then executed in its original form.
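To illustrate the principle only, the sketch below "packs" a block of bytes with `zlib` compression and restores it, the way an unpacking stub would in memory before execution. This is an assumption-based toy: real packers such as UPX work on the PE structure itself and use their own algorithms, and the payload shown here is an arbitrary example.

```python
import zlib

def pack(payload):
    """Compress the payload; its readable strings are no longer stored as-is."""
    return zlib.compress(payload, 9)

def unpack(packed):
    """What an unpacking stub does at run time: restore the original bytes."""
    return zlib.decompress(packed)

if __name__ == "__main__":
    original = b"connect to 153.42.13.64; " * 8
    packed = pack(original)
    # A string search sees the address in the original but usually not in
    # the packed bytes, while unpacking gives back exactly the same content.
    print(b"153.42.13.64" in original)
    print(b"153.42.13.64" in packed)
    print(unpack(packed) == original)
```

This is why a string search on a packed sample returns so little: the meaningful strings only reappear once the code has been unpacked.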

An example of a tool that performs both packing and unpacking is UPX, which can be
downloaded from http://upx.sourceforge.net/ (always install the latest version).

In summary, while both obfuscation and packing may involve the use of encryption protocols, the main
distinction lies in their objectives. Obfuscation focuses on modifying the code or program structure to
hinder analysis, while packing primarily aims at compressing or encrypting the entire executable file to
evade detection.

5. Portable Executable File Format (PE)

The programmer writes source code in a programming language (e.g. C, C++, or a .NET language)
and then passes it to a compiler to create the executable code (machine code). During the "source
code/machine code" conversion phase, the toolchain (compiler and linker) creates a container called a
PE file, which is the file format used by Windows for the extensions .exe, .dll, .cpl, .scr, .sys, .ocx,
.drv, .efi and .fon.

This file format can reveal a lot about a program's functionality. The portable executable (PE) format is
used by Windows executables, object code and DLLs.

The PE contains the information required by the operating system to initiate program execution. The PE
has a header that contains highly pertinent information about the code, the application type, the
libraries and the memory space required to run the program. Several PE file parsers can be used to
retrieve this information, such as pestudio and PEview.

The libraries used and the functions called are often the most important parts of a program, and
identifying them is particularly important, as it allows us to guess what the program is doing. For example,
if a program imports the URLDownloadToFile() function, we can assume that it connects to the
Internet to download content, which it then stores in a local file.

Most files containing executable code and loaded by Windows are in PE format.

View of the beginning of a PE file:

MZ-DOS header: begins with the signature "MZ" and identifies the file as an executable; it is kept for
compatibility with MS-DOS.

DOS segment (DOS stub): a small program that displays an error message if the file is launched from
MS-DOS, where the PE format is not recognized.

PE header: a set of structures from which information about the file can be extracted, such as the
version of the operating system targeted by this file.

Table of Sections:

 .text (code): contains the program instructions that the processor will execute (what the .exe asks
it to do). This is normally the only section that contains code.
 .data: contains initialized variables accessed by the program.
o In some cases, we may also find other sections such as ".rdata", ".idata" or ".edata",
which contain import and/or export information and may also contain read-only data.
 .rsrc (Resources): the resources used by the .exe file (e.g. sounds, icons, strings, menus, images,
etc.). Strings are often stored in the .rsrc section for multilanguage support.
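The layout described above can be walked through with a few lines of Python. The sketch below is deliberately minimal and uses only the standard library: it checks the "MZ" signature, follows the pointer stored at offset 0x3C to the "PE" signature, and lists the section names. The function name `parse_pe` and the returned dictionary are illustrative assumptions; tools such as pestudio extract far more.

```python
def _u16(data, offset):
    return int.from_bytes(data[offset:offset + 2], "little")

def _u32(data, offset):
    return int.from_bytes(data[offset:offset + 4], "little")

def parse_pe(data):
    """Return the machine type and the section names of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The field at offset 0x3C of the DOS header points to the PE header.
    pe_offset = _u32(data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    file_header = pe_offset + 4           # 20-byte COFF file header
    machine = _u16(data, file_header)
    section_count = _u16(data, file_header + 2)
    optional_size = _u16(data, file_header + 16)
    # The section table follows the file header and the optional header.
    table = file_header + 20 + optional_size
    sections = []
    for i in range(section_count):
        raw_name = data[table + 40 * i:table + 40 * i + 8]  # 8-byte name field
        sections.append(raw_name.rstrip(b"\x00").decode("ascii", errors="replace"))
    return {"machine": hex(machine), "sections": sections}
```

Unusual section names, or a section table that lacks .text, are often a first hint that the file has been packed.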


V. Basic dynamic analysis techniques:

Dynamic analysis takes place during and after malware execution. It is generally performed after
static analysis.

Note that static analysis can be limited when we come across an obfuscated or packed program that we
are unable to deobfuscate or unpack. In such cases, dynamic analysis allows us not only to
monitor the malware during execution, but also to examine the system after it has been executed.

However, dynamic analysis can represent a security risk to your machine and your network,
so you need to prepare your environment carefully before proceeding with this phase.

There are several tools you can use to carry out this phase.

1. Using a SandBox:

A SandBox is an isolated, secured environment for testing suspicious programs without
exposing your machine to risk.

Several SandBoxes are available online for analyzing suspicious programs, many of them free to use,
such as: Joe SandBox, Hybrid Analysis, Norman SandBox, GFI SandBox, ThreatExpert, etc.

There are also commercial (paid) SandBoxes, which are generally used in companies. Cuckoo SandBox,
for its part, is open source and can be self-hosted.

Most SandBoxes work in the same way, and the final report generated by SandBoxes contains a multitude
of relevant information, such as malware network activity, files created/deleted/modified, VirusTotal
report, IoCs, etc.

Example of a malware classification after execution on the Joe SandBox:


However, dynamic analysis using a SandBox also has a set of disadvantages:

 The SandBox executes the malware without command-line options, so malware that requires
them will not show its full behavior.
 If the malware has to wait for a command from a C&C server before setting up a backdoor, this
will not be visible in the SandBox output.
 If the malware has been programmed to launch only after a certain delay (e.g. sleep(72h)), the
SandBox can shorten this delay, but there are other ways of delaying execution that the SandBox
is unaware of.
 Some malware can detect that it is running in a virtual machine; in this case, the
malware may change its behavior or refuse to execute.

2. Malware execution:

Most of the malware you will encounter is either an executable file or a DLL. Launching an executable
file is easy: either double-click on the file or launch it from a terminal. Running a DLL, on the other
hand, is not so easy (Windows cannot run DLLs directly).

The following command executes a DLL:

C:\>rundll32.exe <DLLname>, <Export>

 rundll32.exe is included in all Windows operating systems.
 DLLname: the name of the DLL to be launched.
 Export: the name of the function to be called.

Example: the rip.dll file has the following two exports: Install and Uninstall

Our final command becomes:

C:\>rundll32.exe rip.dll, Install

3. Monitoring using Procmon

Procmon (Process Monitor) is an advanced monitoring tool for Windows operating systems that lets you
monitor registry, file system, network, process and thread activity. It also has a filter feature that
makes it easier to find specific information (e.g. RegSetValue, CreateFile, WriteFile, or other suspicious
actions/calls).

However, this tool consumes a lot of memory, as it records all events on your machine even when
the filtering option is used (by default, filters only affect what is displayed).

Although this tool can retrieve a lot of information, it cannot capture everything (for example, calls to
SetWindowsHookEx) and is not reliable for collecting network activity logs.

In the example below, on the line selected in black, we can see that the executable mm32.exe has created
a file mw2mmgr.txt (with the file path) and that the operation has been carried out successfully.


Using this tool, you can:

 Examine registry activity to tell how the malware installed itself in the
registry.
 Examine file system activity to identify any files the malware has created and the configuration
files it uses.
 Investigate process activity to find out whether the malware has created other
processes.
 Identify network connections to determine the ports on which the malware is
listening.
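When a capture contains many events, it can help to export it and filter it with a script. The sketch below assumes that the events were exported from Procmon as CSV and that the file has the columns "Process Name", "Operation", "Path" and "Result"; these column names, the set `INTERESTING` and the function `filter_events` are assumptions to adapt to your own export.

```python
import csv
import io

# Operations mentioned above as worth filtering on.
INTERESTING = {"RegSetValue", "CreateFile", "WriteFile"}

def filter_events(csv_text, process_name):
    """Return (operation, path, result) for the interesting events of one process."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Operation"], row["Path"], row["Result"])
        for row in reader
        if row["Process Name"].lower() == process_name.lower()
        and row["Operation"] in INTERESTING
    ]
```

If the export is read from a file, opening it with `encoding="utf-8-sig"` avoids problems with a possible byte order mark at the start of the header line.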

4. Process Explorer tool:

Microsoft's Process Explorer tool is a powerful tool that should be launched at the time of dynamic
analysis. It monitors the processes that are currently running on a system and displays them with
their parent/child relationships.

In the example below, we can see that the services.exe process is a child of the winlogon.exe process.

During the dynamic analysis, you need to monitor new and changing processes. Double-clicking on
a process opens a window (see example below). From this window we can extract relevant
information. For example:

 Threads: shows all active threads.
 TCP/IP: displays active connections or listening ports.
 Image: displays the path to the executable file on the hard disk.


The Verify button lets you check the file signature. Taking the example above, svchost.exe is a Windows
process. However, an attacker could replace a legitimate executable on disk with malicious code. By
checking the signature with "Process Explorer", you can confirm that the file on disk is legitimate if the
signature is valid.

However, attackers can use another method to trick you even if you use the Verify button of "Process
Explorer", because the verification applies to the image on disk and not to the code in memory. This
method, known as "process replacement", consists of starting a process on the system
and overwriting its memory space with a malicious executable, which then has the same privileges as
the legitimate process it replaces and appears to run as a legitimate process.

One way to recognize process replacement is to use the Strings tab in "Process Explorer"
to compare the strings of the process in memory with those of its image on disk. If the difference is
large, the program that is running is not the one on disk. Here's an example:
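The comparison of the two string lists can also be approximated in code once they have been saved from the Strings tab. The sketch below measures how much the two sets overlap; the similarity measure (shared strings divided by all distinct strings), the 0.5 threshold and the function names are assumptions chosen for illustration, not a rule used by Process Explorer.

```python
def string_overlap(disk_strings, memory_strings):
    """Share of distinct strings common to both lists (1.0 means identical sets)."""
    disk, memory = set(disk_strings), set(memory_strings)
    if not disk and not memory:
        return 1.0
    return len(disk & memory) / len(disk | memory)

def possibly_replaced(disk_strings, memory_strings, threshold=0.5):
    """Flag the process when the in-memory strings differ widely from the image."""
    return string_overlap(disk_strings, memory_strings) < threshold
```

Some difference is normal, so a low score should prompt a closer look rather than be taken as proof.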


This tool can also be used to analyze malicious documents (.pdf, .doc, etc.) by first running "Process
Explorer" and then opening the suspect file. If the suspicious file launches other processes, you can see
them in the tool and determine their location.

5. Using Wireshark:

Wireshark is an open source tool for capturing network traffic (sniffing). It is valued for its efficiency in
capturing packets and analyzing each packet individually. This tool can also be used by attackers to sniff
passwords, steal sensitive information and eavesdrop on online discussions on a local network.

Wireshark can help you understand how malware communicates over the network by capturing
packets while it is communicating. To use Wireshark for this purpose, connect to the Internet, start a
Wireshark packet capture and then execute the malware.
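Once the capture is finished, a quick way to see which hosts the malware talked to is to count destination addresses. The sketch below assumes that the capture was saved in the classic pcap format (Wireshark saves in pcapng by default, so "Save As" with the pcap type is needed), that the link layer is Ethernet, and that only IPv4 packets are of interest. The function name `destination_ips` is hypothetical.

```python
import socket
from collections import Counter

def destination_ips(pcap_bytes):
    """Count the IPv4 destination addresses in a classic pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        order = "little"
    elif magic == b"\xa1\xb2\xc3\xd4":
        order = "big"
    else:
        raise ValueError("not a classic pcap file")
    # Bytes 20-23 of the 24-byte global header give the link type (1 = Ethernet).
    if int.from_bytes(pcap_bytes[20:24], order) != 1:
        raise ValueError("only Ethernet captures are handled")
    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Each record header is 16 bytes; the captured length is its third field.
        incl_len = int.from_bytes(pcap_bytes[offset + 8:offset + 12], order)
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # EtherType 0x0800 = IPv4; the destination address sits at bytes 30-33.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            counts[socket.inet_ntoa(frame[30:34])] += 1
    return counts
```

The most frequent addresses are natural candidates for network-based signatures, after checking them against known malicious IPs.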

VI. Conclusion:

Note that different tools and approaches are available for different jobs. There is no one-size-fits-all
approach. Every situation is different, and the various tools and techniques you learn will have similar and
sometimes overlapping functionality. If you have no luck with one tool, try another. If you get stuck, don't
spend too much time on one problem; move on to something else. Try analyzing the malware from a
different angle, or simply adopt a different approach.

Finally, don't forget that malware analysis is like a game. As new malware analysis techniques are
developed, attackers respond with new techniques to bypass the analysis. To succeed as a malware
analyst, you need to be able to recognize, understand and thwart these techniques, and react to changes
in malware analysis.

In this chapter, we've looked at basic static analysis and basic dynamic analysis. To go further, there are
other more advanced methods and tools. In this case, we are talking about advanced static analysis and
advanced dynamic analysis.
