Analysis of The PE Rich Header and Malware Linking
Name
Institution
PE RICH HEADER AND STATIC MALWARE LINKING 2
Introduction
Cases of cyber-attacks have increased with the evolution of technology (Sood & Enbody,
2015). This brings about great damage to systems in general and makes a
Black reports that 86% of cyber-attacks were caused by malware. This shows how dangerous
malware currently is and how effective it is in cyber-attacks. These forms of malware are
mostly related to one another and share a method by which they attack these systems. As
reported by AVTest, Windows portable executables make up 37.9% of malware types, the
highest share of any type. (Sikorski & Honig, 2016)
malware analysts in order to level the playing field. The work covered in this section
examines how the PE Rich Header can be leveraged for static analysis. Samples of PE-based
malware are then compared. The information contained in the header gives a glimpse of the
environment in which the executable was built. The main technique used in this scenario is
static cryptographic hashing, which forms the benchmarks against which comparison can begin.
The results are then checked and the data points are analysed to find which can be leveraged
in order to perform static detection and PE
According to Webster, the Rich header is an undocumented part of the PE (Portable
Executable) header that results from the build process of executables produced with
Microsoft tools. The creation of the Rich header is mainly separated into two processes.
(Singh, 2019)
The first process involves the translation of high-level code to machine code,
which is also referred to as low-level code. High-level code is code which is
understandable to humans and is written in languages which are easy to learn and adopt,
for example, Java. Low-level code, or machine code, consists of the instructions which
the computer executes directly.
According to Dean, the second process combines the machine code through a linking
process which creates a single executable. This process results in the creation of the
PE Rich Header. The PE header is found on disk and in memory
as shown below, sandwiched between the DOS and the data sections. (Song, 2012)
a 32-bit checksum. This key provides a way in which the contents of the section can be
decrypted and confirms the validity of the Rich header. (Marak, 2018) There is a single array
present in the Rich header which stores metadata on the steps in the build chain
during the linking of objects into a single executable. Each 8-byte structure
contained in the array holds the product version, the
product identification and a count which shows how many times each product was used.
There is no official mapping of the Microsoft product identifiers in this section, but
researchers have produced a partial mapping which gives the required configurations. This
information can be displayed as:
This gives all the details of the structure, that is, the product version, the product
identification and a count. The Rich header is concealed and is extracted programmatically
with the body of the section. The decrypted Rich header contains DanS, which marks the start
of the Rich header, 12 bytes of padding before the Rich entry structs, and the checksum and
decryption key after the end of the Rich header. (Wright, 2019) This applies to the
cmd.exe files. Using Python to decode and extract the Rich header enables one to
view the contents stored in the Rich header, which are written to a JSON file. A snippet of this
{
"Product_Count": 66,
"Product_Version": 30729,
"Product_ID": 147
},
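The decoding that produces entries like the one above can be sketched in Python. This is a minimal sketch under stated assumptions, not the script used in the paper: the names `parse_rich_header` and `build_demo_blob` are hypothetical, the input is a synthetic byte string rather than a real executable, and the layout (an XOR-encoded "DanS" marker, 12 bytes of padding, 8-byte entries, then a cleartext "Rich" marker followed by the 32-bit key) follows the description in this section. Splitting each entry's first 32-bit word into a high half for the product ID and a low half for the product version is the commonly reported layout.

```python
import json
import struct

DANS = struct.unpack("<I", b"DanS")[0]


def parse_rich_header(data: bytes):
    """Decode the Rich header in `data`; return (key, entries) or None."""
    end = data.find(b"Rich")                # cleartext end marker
    if end < 0 or end + 8 > len(data):
        return None
    key = struct.unpack_from("<I", data, end + 4)[0]  # XOR key follows "Rich"

    # Step backwards one 32-bit word at a time until a word decodes to "DanS".
    start = end - 4
    while start >= 0:
        if struct.unpack_from("<I", data, start)[0] ^ key == DANS:
            break
        start -= 4
    else:
        return None                          # start marker never found

    count = (end - start) // 4
    words = [w ^ key for w in struct.unpack_from(f"<{count}I", data, start)]

    # words[0] is "DanS", words[1:4] are padding, the rest are 8-byte entries.
    entries = []
    for comp_id, uses in zip(words[4::2], words[5::2]):
        entries.append({
            "Product_Count": uses,
            "Product_Version": comp_id & 0xFFFF,  # low 16 bits
            "Product_ID": comp_id >> 16,          # high 16 bits
        })
    return key, entries


def build_demo_blob(key: int = 0x12345678) -> bytes:
    """Hypothetical test input: 0x80 filler bytes plus one encoded entry."""
    words = [DANS, 0, 0, 0, (147 << 16) | 30729, 66]
    body = b"".join(struct.pack("<I", w ^ key) for w in words)
    return b"\x00" * 0x80 + body + b"Rich" + struct.pack("<I", key)


if __name__ == "__main__":
    key, entries = parse_rich_header(build_demo_blob())
    print(json.dumps(entries, indent=2))
```

The backward search avoids hard-coding the start offset, so the sketch does not depend on the header beginning at 0x80.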
Two hashing techniques are used to evaluate the utility of the Rich header.
The techniques enable static comparison and linking of portable executables by
looking for hash values which are equivalent. (Ligh, Case, Levy, & Walters, 2016) The study
uses the MD5 algorithm, but generally any cryptographic hashing function could be
The first technique uses the Rich header’s decoded content as input. The
paper refers to this computation as the Rich hash. In summary this algorithm can be simplified
The Rich header section is extracted, beginning at offset 0x80.
The decryption key, which is found at the end of the Rich header, is identified.
3. XOR decryption
The Rich header is decrypted, beginning at “Rich” and proceeding with a byte
The MD5 checksum is calculated over the array once it is in cleartext. This MD5 checksum
The second technique creates a fingerprint of the Rich header content which is more
resistant to changes in the build environment of the
executable. The Rich hash comparison is also done. (Zamboni, 2018) To accomplish this, the
pC (product count), which is the most volatile field in the Rich header body, is excluded.
For each pV (product version) and pID (product identification) pair, the pC measures the
source files referenced by the PE. The hashes of the Rich header are also computed using
Python code. This leads to the display
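The two hashes described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: the function names, the helper `make_clear_data` and the sample tuples are hypothetical, and the exact bytes that the paper feeds to MD5 are assumed here to be the whole decoded body for the Rich hash and only the pID/pV word of each entry for the RichPV hash.

```python
import hashlib
import struct


def rich_hash(clear_data: bytes) -> str:
    """MD5 over the whole decoded Rich header body."""
    return hashlib.md5(clear_data).hexdigest()


def rich_pv_hash(clear_data: bytes, header_size: int = 16) -> str:
    """MD5 over the pID/pV words only, skipping every pC word."""
    body = clear_data[header_size:]          # skip "DanS" + 12 padding bytes
    md5 = hashlib.md5()
    for offset in range(0, len(body) - 7, 8):
        md5.update(body[offset:offset + 4])  # 32-bit word holding pID and pV
        # body[offset + 4:offset + 8] is the pC and is deliberately left out
    return md5.hexdigest()


def make_clear_data(entries):
    """Hypothetical helper: build decoded bytes from (pID, pV, pC) tuples."""
    out = b"DanS" + b"\x00" * 12
    for pid, pv, pc in entries:
        out += struct.pack("<II", (pid << 16) | pv, pc)
    return out


if __name__ == "__main__":
    first = make_clear_data([(147, 30729, 66), (1, 0, 10)])
    second = make_clear_data([(147, 30729, 70), (1, 0, 12)])  # counts differ
    print(rich_hash(first) == rich_hash(second))        # False
    print(rich_pv_hash(first) == rich_pv_hash(second))  # True
```

Two builds that use the same tools but a different number of source files keep the same RichPV value while their Rich hashes diverge, which is the resistance to build-environment changes that the second technique aims for.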
The Rich header can be identified using a variety of mechanisms.
Yara rules, however, make it possible to match on the Rich header content directly. This helps
greatly in identifying and exploiting the information which is stored in the Rich header
section. The Yara documentation describes the properties of the Rich header, which are:
a) length
b) raw_data
c) offset
d) version
e) toolid
f) clear_data
g) key
The features described can all be leveraged
to create Yara rules which identify RichPV and Rich matches. Yara rules and
This computation involves the use of the clear_data rich signature and the hashlib
module. For example, a Yara rule can identify the PEs which are equivalent to a certain Rich
import "hash"
import "pe"

rule RichHash_3d75441fa2dca655f337ee83519d34dc
{
    condition:
        hash.md5(pe.rich_signature.clear_data) == "3d75441fa2dca655f337ee83519d34dc"
}
The Yara rule shown above matches any PE whose Rich hash is
3d75441fa2dca655f337ee83519d34dc.
During the experiment, 350 malware samples were evaluated to
understand the utility of the Rich header in classifying malware. The classification
covers both targeted and opportunistic malware. The classification according to threat
actors included APT1, APT28, APT29, Volatile Cedar and the Equation Group. The
malware families, on the other hand, comprise Cobalt Strike Beacon, Stuxnet,
Carbanak, TurnedUp and PlugX, which is also referred to as Korplug. The samples of malware
were obtained from a combination of reports from advanced threat factors and malware
To provide benchmarks for the static comparison used in the evaluation of the
Rich header, three techniques are used. These techniques use comparison for
1. ssdeep
algorithm which is context triggered. The technique therefore is a cyber security standard in
2. Import hash
This is a fingerprint of the import address table of the PE. It works by identifying
similar malware families. Since it contains a large import address table, it can be used to
3. Impfuzzy
This technique uses the functionality of ssdeep and applies it to the import address table of a PE.
This makes it possible to identify import tables which are similar even though they are not
fully equivalent.
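The import hash idea can be illustrated with a short Python sketch. It assumes the commonly described convention of lower-casing each library and function name, dropping the library extension, joining the pairs in import order and hashing the result with MD5. The function name `import_hash` and the sample import list are hypothetical, and the paper's own script may normalise the names differently.

```python
import hashlib


def import_hash(imports):
    """MD5 of the ordered, lower-cased 'library.function' import list."""
    parts = []
    for library, function in imports:
        name = library.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if name.endswith(ext):
                name = name[:-len(ext)]      # drop the file extension
                break
        parts.append(f"{name}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


if __name__ == "__main__":
    sample = [("KERNEL32.dll", "CreateFileA"), ("USER32.dll", "MessageBoxA")]
    print(import_hash(sample))
```

Because the hash is cryptographic, a single added or reordered import changes the whole value. That is the limitation the fuzzy variant addresses by using ssdeep on the same import list instead of MD5.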
Once the malware samples are acquired, the static analysis products are created in the
next step. Each cryptographic hash is calculated by a custom Python script
which takes a PE as input and produces a JSON dictionary which contains the MD5,
impfuzzy, imphash, Rich, ssdeep and RichPV hashes. The script uses the malware samples to
generate the desired hashes for the portable executables. (Tiziana Margaria, 2018) A Neo4j
graph is also constructed to view the relationships among the samples and to evaluate the
linking strength of the static techniques. The static and dynamic feature extraction can be
summarized as below:
When the densities of the hashing techniques were graphed, one could view the
values that resulted for the different hashing algorithms. In
descending order, the results can be summarized as RichPV, Rich, impfuzzy, imphash, ssdeep. The
closeness centrality (CLC) is also calculated in order to provide a second metric of evaluation
for the techniques. This metric measures how short the paths to the other nodes in the network
are in the graph which a static technique produces. The CLC metric gives much the same
result: the descending order of the techniques is the same.
These two metrics provide a method of demonstrating the RichPV and Rich techniques used
When graphed using the density function, the graph obtained was:
The research on static malware linking has various advantages on how to analyse malware.
In assessing the PE Rich header, there is a lot of detail in what the structure of the rich
header entails and where it is found. The details are provided with great accuracy, especially
the location of the start and end of the rich header and the MD5 checksum. This helps to
The use of Python in determining the contents of the RichPV hash is one of the strengths
of the article since there is no official way of determining the contents of the RichPV. In
summary, the use of Python in decoding and extracting the Rich header enables one to view
the contents stored in the Rich header, which are written to a JSON file.
Different hashing algorithms are used in this article. These hashing
algorithms include impfuzzy, imphash and ssdeep. These algorithms provide benchmarks
for the static comparison used in the evaluation of the Rich header. Each type of hashing
The graphs for both the density function and the CLC metric provide a visual aid with
which one can do static comparison of data very easily. Both metrics operate by providing a
measure in which a static technique results in using the shortest path to the nodes in the
network.
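To make the two metrics concrete, the sketch below links samples which share a hash value and then computes the graph density and a closeness centrality score in plain Python. It is an illustration under stated assumptions only: the sample names and hash values are invented, the paper builds its graph in Neo4j rather than in code like this, and closeness is computed here in one common form (reachable nodes divided by the sum of shortest-path lengths), which may differ from the normalisation the paper uses.

```python
from collections import defaultdict, deque
from itertools import combinations


def link_samples(hashes):
    """Build an undirected graph: samples sharing a hash value are linked."""
    groups = defaultdict(list)
    for sample, value in hashes.items():
        groups[value].append(sample)
    graph = {sample: set() for sample in hashes}
    for members in groups.values():
        for a, b in combinations(members, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def density(graph):
    """Fraction of all possible edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def closeness(graph, node):
    """Reachable nodes / sum of shortest-path lengths; 0.0 if isolated."""
    dist = {node: 0}
    queue = deque([node])
    while queue:                              # breadth-first search
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical samples: three share one hash value, one stands alone.
    graph = link_samples({"s1": "aa", "s2": "aa", "s3": "aa", "s4": "bb"})
    print(density(graph), closeness(graph, "s1"), closeness(graph, "s4"))
```

A technique which links more samples produces a denser graph, so running the same computation once per hashing technique gives values which can be ranked against one another.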
Weaknesses
1. Complexity of Explanation
In reading the article, there were some new documentation facts which were very
complex to understand and I had to get some reference from external sources in order to
understand them. This includes the sections on the Yara Rules and Properties which took
some time to understand. Apart from that, the document was fairly simple with the right
explanations for the parts and enough visual aid, that is, pictures which helped in
The article covers the structure of the PE header and identifies the Rich section by use of
static linking. In an effort to determine whether analysis of the rich header was better done
using static linking or dynamic linking, I ended up doing some research on dynamic linking
to compare the two. The article I used was called “Malware Dynamic Analysis” by Veronica
Malware Dynamic Analysis involves the use of the PE file header. Unlike the static link,
which combines the .lib files into the PE file, the dynamic link loads the .dll files into the
process memory space, one at a time. The needed functions in the DLLs could then just be
The main difference in dynamic malware analysis is that the code is actually run in a
monitored environment where the malware sample is checked to prevent it from spreading
and infecting other systems. The malware can be checked for functionality which makes it
The main differences I could gather from analysing and comparing these two types of
The static analysis analyses the malware binary code by checking the signatures without
running the code whereas the dynamic malware analysis runs the malware in a monitored
2. Approach used
The static malware analysis uses signature-based approach in order to do analysis whereas
the dynamic malware analysis uses a behaviour-based approach to analyse and also detect
malware.
3. Processes
Static malware analysis involves virus scanning, packer detection, file fingerprinting, file
obfuscations and analysis of memory artefacts. Dynamic malware analysis on the other hand
uses API calls, memory writes, network and system calls, registry changes and instruction
traces.
Since the use of static malware analysis involves analysing the signature of the malware code
or program, it becomes hard to analyse malware codes that are sophisticated. The use of
dynamic malware analysis helps a lot with sophisticated malware since the analysis can be
With this, it is visible that dynamic malware analysis is superior in functionality especially
when dealing with sophisticated malware. However, one cannot take away the importance of
static malware analysis and how it identifies signatures for malware without necessarily
Conclusion
The paper covers the structure of the PE header and the Rich section of the PE header. In these
sections, we are able to identify the parts which are attacked by malware, which helps malware
analysts to detect the malware in the Rich header. The structure of the PE header
also helps to determine where the Rich header starts and stops and the location of other variables
When analysing the techniques used in the identification of the RichPV and Rich hashes,
one is able to identify the techniques used as benchmarks in the evaluation, such as
impfuzzy, imphash and ssdeep. The graphing to determine the best techniques to leverage the
PE Rich header is done in two ways: using the density of the techniques and using the CLC
metric. Both produce the same result in ranking the techniques as shown with the graphs
above.
A recommendation for this method of analysis would be to switch from static to dynamic
feature extraction. The dynamic extraction uses API calls and a dynamic feature vector which
REFERENCES
Kovah, V. (2019). Malware Dynamic Analysis. Retrieved from Open Security Training:
http://opensecuritytraining.info/MalwareDynamicAnalysis.html
Ligh, M. H., Case, A., Levy, J., & Walters, A. (2016). The Art of Memory Forensics:
Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley &
Sons.
Malin, C. H., Casey, E., & Aquilina, J. M. (2014). Malware Forensics: Investigating and
Polychronakis, M., & Meier, M. (2017). Detection of Intrusions and Malware, and
Sikorski, M., & Honig, A. (2016). Practical Malware Analysis: The Hands-On Guide to
Sood, A., & Enbody, R. (2015). Targeted Cyber Attacks: Multi-staged Attacks Driven by
Wright, D. (2019). A Malware Analysis and Artifact Capture Tool. Dakota State University.
Zamboni, D. (2018). Detection of Intrusions and Malware, and Vulnerability Assessment: 5th