Analysis of The PE Rich Header and Malware Linking

Running head: PE RICH HEADER AND STATIC MALWARE LINKING 1
Analysis of PE Rich Header and Static Malware Linking
Name
Institution
Introduction
As technology has evolved, cyber-attacks have become more frequent (Sood & Enbody, 2015).
These attacks cause serious damage to systems and disrupt the normal operation of the systems
of companies, organizations and individuals. Carbon Black reports that 86% of cyber-attacks
were caused by malware, which shows how dangerous and effective malware currently is. Much
of this malware is related, and related samples often share the methods they use to attack
systems. According to AV-TEST, Windows portable executables (PEs) make up the largest share
of malware, at 37.9% (Sikorski & Honig, 2016).
To level the playing field, malware analysts adopt streamlined static malware analysis
techniques. The work covered in this section examines whether the PE Rich Header can be
leveraged for static analysis by comparing it across samples of PE-based malware. The
information contained in the header gives a glimpse of the environment in which the executable
was built. The main technique used is static cryptographic hashing, with existing hashing
techniques serving as the benchmarks against which comparison begins. The results and data
points are then analysed to find which can be leveraged for static detection and for linking PE
samples (Malin, Casey, & Aquilina, 2014).

The Rich Header
According to Webster, the Rich header is an undocumented part of the PE (Portable
Executable) header that is produced when executables are built with Microsoft's tools. The
creation of the Rich header can be separated into two processes (Singh, 2019).
The first process is compilation: the translation of high-level code into machine code,
which is also referred to as low-level code. High-level code is readable by humans and is
written in languages that are easy to learn and adopt, for example C or C++. Low-level code,
or machine code, consists of the instructions that the processor executes directly.
According to Dean, the second process is linking, which combines the machine code (object
files) into a single executable. This linking process is what creates the PE Rich Header. The
PE header is found both on disk and in memory, sandwiched between the DOS header and stub
and the section data, as shown below; the Rich header itself sits between the DOS stub and the
PE signature (Song, 2012).
Figure 1: PE Header Format

The Rich Header is obfuscated rather than truly encrypted: its contents are XORed with a
32-bit key, and that key is also a checksum. The key allows the contents of the section to be
decoded and confirms the validity of the Rich header (Marak, 2018). The Rich header holds a
single array that stores metadata about each step of the build chain used when the objects are
linked into a single executable. Each element of the array is an 8-byte structure containing
the product version, the product identification and a count of how many times that product
was used.
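To make the entry layout concrete, the following is a minimal Python sketch, not taken from the paper, of how one XOR-masked 8-byte entry might be decoded and how the key could be recomputed to confirm validity. It assumes the layout commonly documented by reverse engineers: a 32-bit "comp.id" holding the product ID (pID) in its high 16 bits and the product version (pV) in its low 16 bits, followed by a 32-bit count (pC), all little-endian. The checksum routine follows the algorithm those reverse engineers describe, not a Microsoft specification. The function names are hypothetical.

```python
import struct


def rol32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits."""
    shift %= 32
    value &= 0xFFFFFFFF
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def decode_rich_entry(raw: bytes, key: int) -> dict:
    """Decode one XOR-masked 8-byte Rich entry (assumed layout: comp.id, count)."""
    comp_id, count = struct.unpack("<II", raw)  # two little-endian DWORDs
    comp_id ^= key                               # both DWORDs are masked with the key
    count ^= key
    return {
        "Product_ID": comp_id >> 16,             # pID: high 16 bits
        "Product_Version": comp_id & 0xFFFF,     # pV: low 16 bits (build number)
        "Product_Count": count,                  # pC: use count
    }


def rich_checksum(data: bytes, dans_offset: int, entries: list) -> int:
    """Recompute the 32-bit Rich key (reverse-engineered algorithm, an assumption).

    data        -- raw file bytes
    dans_offset -- file offset of the "DanS" marker
    entries     -- decoded (comp_id, count) pairs
    """
    csum = dans_offset
    for i in range(dans_offset):
        if 0x3C <= i < 0x40:           # skip e_lfanew, the pointer to the PE header
            continue
        csum += rol32(data[i], i)      # each DOS header/stub byte, rotated by its offset
    for comp_id, count in entries:
        csum += rol32(comp_id, count)  # each comp.id, rotated by its count
    return csum & 0xFFFFFFFF
```

Under these assumptions, a Rich header would be considered valid when `rich_checksum(...)` equals the key stored right after the "Rich" marker.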
Microsoft does not officially document how the product IDs in this section map to its products,
but researchers have produced partial mappings that reveal the build configuration. A decoded
entry can be displayed as:
pID: 134 pV: 18326 pC: 32
This shows every field of the structure: the product identification (pID), the product version
(pV) and the count (pC). Because the Rich header is concealed, it must be extracted
programmatically together with the body of the section. Once decoded, it begins with the
"DanS" marker, which marks the start of the Rich Header, followed by 12 bytes of padding
before the first entry; the entries are followed by the "Rich" marker and then the checksum
that serves as the decryption key (Wright, 2019). The examples in this section are taken from
cmd.exe. Using Python to decode and extract the Rich header makes it possible to view the
contents stored in it, which are written to a JSON file. A snippet of this output is shown below:
{
"Product_Count": 66,
"Product_Version": 30729,
"Product_ID": 147
},
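The extraction step that produces JSON like the snippet above could look roughly like the following Python sketch. It is not the paper's script. It assumes that the Rich header lies between the DOS stub and the PE signature pointed to by `e_lfanew` at offset 0x3C. It searches for the "Rich" marker, reads the key that follows it, and walks backwards until a DWORD decodes to "DanS". The function names are hypothetical, and the output keys mirror the snippet.

```python
import json
import struct

DANS = struct.unpack("<I", b"DanS")[0]  # "DanS" marker as a little-endian DWORD


def extract_rich_entries(data: bytes) -> list:
    """Locate and decode Rich header entries in raw PE bytes (a sketch)."""
    if len(data) < 0x40:
        return []  # too short to contain a DOS header
    # e_lfanew at 0x3C points to the "PE\0\0" signature; the Rich header
    # is expected between the DOS stub and that signature.
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    rich_pos = data.find(b"Rich", 0, pe_offset)
    if rich_pos == -1:
        return []  # no Rich header, e.g. a non-Microsoft linker
    key = struct.unpack_from("<I", data, rich_pos + 4)[0]  # key follows "Rich"

    # Walk backwards one DWORD at a time until the decoded value is "DanS".
    pos = rich_pos - 4
    while pos >= 0 and struct.unpack_from("<I", data, pos)[0] ^ key != DANS:
        pos -= 4
    if pos < 0:
        return []

    entries = []
    # Skip "DanS" and the 12 bytes of padding, then read 8-byte entries.
    for off in range(pos + 16, rich_pos, 8):
        comp_id, count = struct.unpack_from("<II", data, off)
        comp_id ^= key
        entries.append({
            "Product_Count": count ^ key,
            "Product_Version": comp_id & 0xFFFF,
            "Product_ID": comp_id >> 16,
        })
    return entries


def rich_entries_json(data: bytes) -> str:
    """Serialize the decoded entries in the same JSON shape as the snippet."""
    return json.dumps(extract_rich_entries(data), indent=1)
```

In use, `data` would be the bytes of a PE such as cmd.exe read in binary mode. Real files may need more defensive bounds checks than this sketch performs.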
Two hashing techniques are used to evaluate the utility of the Rich header. Both enable static
comparison and linking of portable executables by looking for equal hash values (Ligh, Case,
Levy, & Walters, 2016). The study uses the MD5 algorithm, but any cryptographic hash function
could be used as long as it is applied consistently.
The first technique takes the decoded content of the Rich header as input, and the paper calls
the result the Rich hash. The algorithm can be summarized in the following steps:
1. Extraction of the Rich Header Section
The Rich header section is extracted starting at offset 0x80, where it typically begins,
immediately after the standard DOS stub.
2. Identification of the decryption key
The decryption key is found at the end of the Rich header, immediately after the "Rich"
marker.
3. XOR decryption
The Rich header is XOR-decoded with the key, working backwards from the "Rich" marker over
the byte sequence until the "DanS" marker is reached.
4. Calculation of the MD5 checksum
The MD5 checksum of the resulting cleartext array is calculated; this digest is the Rich hash.
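The four steps can be sketched in Python as below. This is an illustration under stated assumptions, not the paper's script. It assumes that "DanS" sits at offset 0x80, as step 1 describes. It also assumes that the Rich hash is the MD5 of the decoded bytes from "DanS" up to, but not including, the "Rich" marker, which is how we understand Yara's `clear_data` property. Decoding forwards from "DanS" gives the same cleartext as walking backwards from "Rich". Because the exact input bytes are an assumption, the digest is not guaranteed to reproduce published values.

```python
import hashlib
from typing import Optional


def rich_hash(data: bytes, start: int = 0x80) -> Optional[str]:
    """Compute a Rich hash: MD5 of the XOR-decoded Rich header (a sketch)."""
    # Step 1: assume the Rich header ("DanS") begins at `start`, 0x80 by default.
    rich_pos = data.find(b"Rich", start)
    if rich_pos == -1:
        return None
    # Step 2: the 4-byte XOR key immediately follows the "Rich" marker.
    key = data[rich_pos + 4:rich_pos + 8]
    if len(key) != 4:
        return None
    # Step 3: XOR-decode every byte from "DanS" up to "Rich".
    raw = data[start:rich_pos]
    clear = bytes(b ^ key[i % 4] for i, b in enumerate(raw))
    if not clear.startswith(b"DanS"):
        return None  # header not at `start`, or malformed
    # Step 4: the MD5 digest of the cleartext is the Rich hash.
    return hashlib.md5(clear).hexdigest()
```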
The second technique, RichPV, creates a fingerprint of the Rich header content that is more
resistant to changes in the executable's build environment, and it is compared in the same way
as the Rich hash (Zamboni, 2018). To accomplish this, it excludes pC, the most volatile field in
the Rich header body: for each pID and pV pair, pC counts the source files referenced by the
PE that were built with that tool. Both hashes of the Rich header are computed using Python
code, which produces the Rich PV hash and the Rich hash shown below:
"Rich Hash": "3d75441fa2dca655f337ee83519d34dc",
"Rich PV": "dc083eb68efdb8840ddfaee612a2755d"
These values were computed for the Rich header of cmd.exe.
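A RichPV-style fingerprint can be sketched as follows. The sketch assumes the entries have already been decoded into (pID, pV, pC) tuples. Each pID/pV pair is packed back into a 4-byte little-endian comp.id before hashing. That byte serialization is our assumption, and the paper's script may serialize the pairs differently. For that reason, the digests will not necessarily match the cmd.exe values above.

```python
import hashlib


def rich_pv_hash(entries: list) -> str:
    """RichPV sketch: MD5 over (pID, pV) pairs, ignoring the volatile pC count.

    entries -- decoded (product_id, product_version, count) tuples in header order.
    Assumed serialization: each pair re-packed as a 4-byte little-endian comp.id.
    """
    data = b"".join(
        ((pid << 16) | pv).to_bytes(4, "little")
        for pid, pv, _count in entries  # pC is deliberately dropped
    )
    return hashlib.md5(data).hexdigest()
```

Changing only the counts leaves the digest unchanged, which is the resistance to build-environment changes described above. A different tool version, by contrast, produces a different digest.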
The Rich header can be identified through many mechanisms. Yara rules are one convenient way
to match on its content and to exploit the information stored in the Rich header section. The
Yara pe module documentation describes the following properties of the Rich header
(pe.rich_signature):

a) length
b) raw_data
c) offset
d) version
e) toolid
f) clear_data
g) key
All of these properties can be leveraged to create Yara rules that identify Rich and RichPV
matches. Yara's hash module can be combined with them to compute hashes inside a rule, using
the clear_data property of the Rich signature as input. For example, a Yara rule can identify
every PE with a given Rich hash, as the following snippet shows:
import "pe"
import "hash"

rule RichHash_3d75441fa2dca655f337ee83519d34dc
{
    meta:
        description = "Matches a Rich Hash of 3d75441fa2dca655f337ee83519d34dc"
    condition:
        hash.md5(pe.rich_signature.clear_data) == "3d75441fa2dca655f337ee83519d34dc"
}
The Yara rule above matches any PE whose Rich hash is 3d75441fa2dca655f337ee83519d34dc.
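When many Rich hashes need to be matched, rules like this one can be generated rather than written by hand. The Python helper below is a hypothetical convenience and is not part of Yara or the paper. It only emits rule text, and it assumes Yara's pe and hash modules are available wherever the rule is compiled.

```python
import string


def rich_hash_rule(md5_hex: str) -> str:
    """Hypothetical helper: emit Yara rule text matching PEs whose Rich hash is md5_hex.

    It only builds text; it does not call Yara.
    """
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in string.hexdigits for c in md5_hex):
        raise ValueError("expected a 32-character hexadecimal MD5 digest")
    return (
        'import "pe"\n'
        'import "hash"\n\n'
        f"rule RichHash_{md5_hex}\n"
        "{\n"
        "    meta:\n"
        f'        description = "Matches a Rich Hash of {md5_hex}"\n'
        "    condition:\n"
        f'        hash.md5(pe.rich_signature.clear_data) == "{md5_hex}"\n'
        "}\n"
    )
```

Each call returns a complete, standalone rule source, so generated rules can be written to separate files or compiled one at a time.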
Evaluation with Malware Samples
In the experiment, 350 malware samples were evaluated to understand how useful the Rich
header is for classifying malware. The samples cover both targeted and opportunistic malware.
The threat actors included APT1, APT28, APT29, Volatile Cedar and the Equation Group, while
the malware families included Cobalt Strike Beacon, Stuxnet, Carbanak, TurnedUp and PlugX
(also known as Korplug). The samples were obtained from a combination of reports on advanced
threat actors and public malware repositories (Polychronakis & Meier, 2017).
To provide benchmarks for the static comparison used in evaluating the Rich header, three
existing techniques are compared with the proposed RichPV and Rich hashes. These techniques
are:
1. Ssdeep
Ssdeep identifies similar files using context-triggered piecewise hashing, a form of fuzzy
hashing. It is therefore a widely used standard in cyber security for comparing malware
samples.
2. Import hash
The import hash (imphash) is a fingerprint of a PE's import address table. Because similar
import tables tend to come from related code, imphash can be used to identify and cluster
related groups of PE-based malware.
3. Import Fuzzy hash
Impfuzzy applies ssdeep's fuzzy hashing to the import table of a PE. This makes it possible to
identify import tables that are similar even when they are not identical.
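For comparison with the Rich hashes, the import hash described above can be sketched as follows. As far as we know, it mirrors the normalization used by the pefile library's `get_imphash()`. Library names are lowercased with a .dll, .ocx or .sys extension stripped, and function names are lowercased. The "library.function" strings are kept in import order, joined with commas and hashed with MD5. The input format is hypothetical. Imports by ordinal, which pefile resolves through a lookup table, are not handled.

```python
import hashlib


def imphash(imports: list) -> str:
    """Import-hash sketch modelled on pefile's get_imphash() normalization.

    imports -- (library, function) name pairs in import-table order
               (hypothetical input format; ordinal-only imports not handled).
    """
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        name, _, ext = lib.rpartition(".")
        if name and ext in ("dll", "ocx", "sys"):
            lib = name  # strip the common library extensions
        parts.append(f"{lib}.{func.lower()}")
    # Order matters: the same imports listed in a different order hash differently.
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```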
Evaluation of the Rich and RichPV hashes
Once the malware samples are acquired, the next step is to create the static analysis artefacts.
Each hash is calculated by a custom Python script that takes a PE as input and produces a
JSON dictionary containing its MD5, impfuzzy, imphash, Rich, ssdeep and RichPV hashes. The
script is run over the malware samples to generate these hashes for every portable executable
(Margaria & Steffen, 2018). A Neo4j graph is then constructed to show the relationships among
the samples and to evaluate the linking strength of each static technique. Static and dynamic
feature extraction from PE files can be summarized as below:
Figure 2: PE Files Static and Dynamic Feature Extraction
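The linking step behind the Neo4j graph can be illustrated with a small in-memory stand-in. Under a given technique, two samples are linked whenever they share the same hash value. The dictionary layout and key names are hypothetical. Exact matching only suits the cryptographic hashes (Rich, RichPV, imphash). For ssdeep and impfuzzy, samples would instead be linked when their similarity score passes a threshold.

```python
from collections import defaultdict
from itertools import combinations


def link_samples(samples: dict, technique: str) -> set:
    """Link samples that share an identical hash value for `technique`.

    samples -- maps a sample ID to its hash dictionary (hypothetical keys), e.g.
               {"md5": ..., "rich": ..., "richpv": ..., "imphash": ...}
    Returns undirected edges as sorted (sample_a, sample_b) tuples.
    """
    groups = defaultdict(list)
    for sample_id, hashes in samples.items():
        value = hashes.get(technique)
        if value:  # skip samples lacking this hash, e.g. no Rich header
            groups[value].append(sample_id)
    edges = set()
    for members in groups.values():
        # every pair of samples sharing the value becomes an edge
        edges.update(combinations(sorted(members), 2))
    return edges
```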

When the densities of the graphs produced by each hashing technique were plotted, the
techniques ranked, in descending order, as RichPV, Rich, impfuzzy, imphash and ssdeep.
Closeness centrality (CLC) was also calculated as a second evaluation metric. Closeness
centrality measures how near a node is to every other node in the network, based on the
lengths of the shortest paths between them. Comparing the results, the CLC metric gives the
same descending order. Together, these two metrics demonstrate the value of the RichPV and
Rich techniques in classifying and linking samples.
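The two metrics can be computed as in the following sketch, which uses the standard definitions. Density is the fraction of possible undirected edges that are present, 2|E| / (n(n-1)). Closeness centrality is derived from breadth-first shortest-path distances and is normalized for disconnected graphs in the same way as the NetworkX library's default. The paper does not state how per-sample closeness is aggregated into one score per technique. Averaging over all samples is one plausible assumption.

```python
from collections import deque


def density(nodes: set, edges: set) -> float:
    """Fraction of possible undirected edges present: 2|E| / (n(n-1))."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))


def closeness(nodes: set, edges: set) -> dict:
    """Closeness centrality of every node, via breadth-first search."""
    adjacency = {v: set() for v in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    n = len(nodes)
    scores = {}
    for source in nodes:
        dist = {source: 0}  # shortest-path hop counts from source
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reached = len(dist) - 1  # other nodes reachable from source
        # reached / total, scaled by the reachable fraction so that
        # disconnected graphs are handled (Wasserman-Faust normalization)
        scores[source] = 0.0 if total == 0 else (reached / total) * (reached / (n - 1))
    return scores


def average_closeness(nodes: set, edges: set) -> float:
    """One possible per-technique score (an assumption): mean closeness."""
    scores = closeness(nodes, edges)
    return sum(scores.values()) / len(scores) if scores else 0.0
```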
Figure 3: Density of each hashing technique
Figure 4: CLC metric of each hashing technique

Strengths and Weaknesses
The research on static malware linking has several strengths in how it approaches malware
analysis. These strengths include:
1. The Detail of Structure
The assessment of the PE Rich header describes in detail what the Rich header contains and
where it is found. These details are given precisely, especially the locations of the start and
end of the Rich header and of the checksum. This helps the reader understand how static
linking of malware is done.
2. Use of Python Code in Determining the Rich PV hash
Using Python to determine the contents of the Rich header and its RichPV hash is one of the
strengths of the article, since Microsoft provides no official documentation of the Rich header.
In summary, Python is used to decode and extract the Rich header, and the stored contents are
written to a JSON file for inspection.
3. Use of Different Hashing Algorithms
The article uses several different hashing algorithms, namely impfuzzy, imphash and ssdeep.
These algorithms provide benchmarks for the static comparison used in the evaluation of the
Rich header. Each algorithm fingerprints a different aspect of a file and serves a different
purpose.
4. The Graphing of the Density Function and CLC Metric
The graphs of both the density function and the CLC metric are visually clear and make it
easy to compare the data. Density measures how many of the possible links between samples a
technique produces, while the CLC metric measures how close the samples are to one another
along the shortest paths in the network.
Weaknesses
1. Complexity of Explanation
Some of the documentation details in the article were new to me and complex, and I had to
consult external sources to understand them. This was especially true of the sections on Yara
rules and the Rich header properties, which took some time to understand. Apart from that, the
document was fairly easy to follow, with clear explanations of each part and enough visual aids
(figures) to support understanding.
Comparison with External Sources
The article covers the structure of the PE header and uses the Rich section for static linking of
malware samples. To determine whether malware is better analysed statically or dynamically,
I researched dynamic analysis and compared the two approaches. The source I used was the
course "Malware Dynamic Analysis" by Veronica Kovah (Kovah, 2019).
Dynamic malware analysis also makes use of the PE file header. Static linking combines .lib
files into the PE file at build time. Dynamic linking, by contrast, loads .dll files into the
process memory space at run time, and the main executable then calls the functions it needs
from those DLLs.
The main difference in dynamic malware analysis is that the code is actually run, in a
monitored environment that keeps the malware sample from spreading to and infecting other
systems. Because the malware's functionality is observed as it runs, it is hard to miss important
behaviours.
The main differences I found from analysing and comparing these two types of malware
analysis are:
1. Definition and Working
Static analysis examines the malware binary, for example by checking signatures, without
running the code, whereas dynamic analysis runs the malware in a monitored environment and
observes it.
2. Approach used
Static malware analysis uses a signature-based approach, whereas dynamic malware analysis
uses a behaviour-based approach to analyse and detect malware.
3. Processes
Static malware analysis involves virus scanning, packer detection, file fingerprinting, detection
of file obfuscation and analysis of memory artefacts. Dynamic malware analysis, on the other
hand, monitors API calls, memory writes, network and system calls, registry changes and
instruction traces.
4. Action against Sophisticated Malware Programs
Because static malware analysis relies on the signature of the malware code or program,
sophisticated malware, such as packed or obfuscated samples, is hard to analyse this way.
Dynamic malware analysis copes better with sophisticated malware because the analysis is done
while the sample executes.

This shows that dynamic malware analysis is more capable, especially when dealing with
sophisticated malware. However, this does not diminish the importance of static malware
analysis, which identifies malware signatures without running the file.
Conclusion
The paper covers the structure of the PE header and of its Rich section. These sections show
which parts of the header can be leveraged, helping malware analysts to detect and link malware
using the Rich header. The description of the structure also shows where the Rich header starts
and stops and where other fields, such as the checksum, are located.
Analysing the techniques used to compute the RichPV and Rich hashes also shows how they
compare with the benchmark techniques impfuzzy, imphash and ssdeep. The best techniques for
leveraging the PE Rich header are determined graphically in two ways: using the density of
each technique's graph and using the CLC metric. Both produce the same ranking of the
techniques, as shown in the graphs above.
One recommendation for this method of analysis would be to switch from static to dynamic
feature extraction. Dynamic extraction uses API calls and a dynamic feature vector, which I
believe would be faster than using the PE header and strings.

REFERENCES
Kovah, V. (2019). Malware Dynamic Analysis. Retrieved from Open Security Training:
http://opensecuritytraining.info/MalwareDynamicAnalysis.html
Ligh, M. H., Case, A., Levy, J., & Walters, A. (2016). The Art of Memory Forensics:
Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley &
Sons.
Malin, C. H., Casey, E., & Aquilina, J. M. (2014). Malware Forensics: Investigating and
Analyzing Malicious Code. Syngress.
Marak, V. (2018). Windows Malware Analysis Essentials. Packt Publishing.
Margaria, T., & Steffen, B. (2018). Leveraging Applications of Formal Methods, Verification
and Validation. Modeling. Springer.
Polychronakis, M., & Meier, M. (2017). Detection of Intrusions and Malware, and
Vulnerability Assessment. Springer.
Sikorski, M., & Honig, A. (2016). Practical Malware Analysis: The Hands-On Guide to
Dissecting Malicious Software. No Starch Press.
Singh, A. (2019). Identifying Malicious Code Through Reverse Engineering. Springer
Science & Business Media.
Song, H. Y. (2012). Automatic Malware Analysis: An Emulator Based Approach. Springer
Science & Business Media.
Sood, A., & Enbody, R. (2015). Targeted Cyber Attacks: Multi-staged Attacks Driven by
Exploits and Malware. Elsevier Science.
Wright, D. (2019). A Malware Analysis and Artifact Capture Tool. Dakota State University.
Zamboni, D. (2018). Detection of Intrusions and Malware, and Vulnerability Assessment: 5th
International Conference Proceedings. Springer Science & Business Media.
