
Running head: PE RICH HEADER AND STATIC MALWARE LINKING 1

Analysis of PE Rich Header and Static Malware Linking

Name

Institution

Introduction

Cyber-attacks have become more frequent as technology has evolved (Sood & Enbody, 2015). They cause serious damage and disrupt the normal operation of systems belonging to companies, organizations and individuals. Carbon Black reports that 86% of cyber-attacks involved malware, which shows how dangerous and effective malware currently is. Malware samples are often related to one another and share methods of attack. As reported by AV-TEST, Windows portable executables (PE) make up 37.9% of malware types, the largest share (Sikorski & Honig, 2016).

To level the playing field, malware analysts adopt streamlined static analysis techniques. The work covered in this section examines how the PE Rich header can be leveraged for static analysis and then compares PE-based malware samples. The information contained in the header gives a glimpse of the environment in which the executable was built. The main technique is static cryptographic hashing, with established hashes serving as benchmarks for the comparison. The results are then analysed to find which data points can be leveraged for static detection and for linking PE samples (Malin, Casey, & Aquilina, 2014).



The Rich Header

According to Webster, the Rich header is an undocumented part of the PE (Portable Executable) header that is produced by the build process of executables created with Microsoft tools. The creation of the Rich header can be separated into two processes (Singh, 2019).

The first process is the translation of high-level code into machine code, also referred to as low-level code. High-level code is written in languages that humans can read and that are comparatively easy to learn, for example C or C++. Machine code consists of the binary instructions that the processor executes directly.

According to Dean, the second process combines the machine code through linking, which produces a single executable. The Rich header is created during this step. The PE header is present both on disk and in memory and, as shown below, is sandwiched between the DOS header and the data sections (Song, 2012).

Figure 1: PE Header Format


The Rich header is obfuscated with XOR encryption, and the decryption key is a 32-bit checksum. The key both allows the contents of the section to be decrypted and confirms the validity of the Rich header (Marak, 2018). The body of the Rich header is a single array that stores metadata about each step of the build chain whose objects were linked into the executable. Each element of the array is an 8-byte structure holding the product identification (pID), the product version (pV) and a count (pC) of how many times that product was used.

Microsoft provides no official mapping from these values to its products, but researchers have produced partial mappings that recover the build configuration. A decoded entry can be displayed as:

pID: 134 pV: 18326 pC: 32

This shows all the fields of the structure: the product identification, the product version and the count. Because the Rich header is concealed, it is extracted programmatically together with the body of the section. The decrypted Rich header contains the DanS marker at its start, 12 bytes of padding before the first Rich entry struct, and the checksum that serves as the decryption key after the end of the header (Wright, 2019). This layout applies, for example, to cmd.exe. Using Python to decode and extract the Rich header makes it possible to view its contents, which are written to a JSON file. A snippet of this output is shown below:

{
    "Product_Count": 66,
    "Product_Version": 30729,
    "Product_ID": 147
},
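
To make the entry layout concrete, the following is a minimal Python sketch of how one XOR-encoded 8-byte entry could be decoded into a record like the one above. It assumes the commonly described layout in which the first 32-bit word holds the product version in its low 16 bits and the product ID in its high 16 bits, and the second word holds the count. The function name `decode_rich_entry` and the key value are hypothetical and do not come from the paper's script.

```python
import struct

def decode_rich_entry(entry: bytes, key: int) -> dict:
    """Decode one 8-byte Rich header entry that is still XOR-encoded with `key`."""
    if len(entry) != 8:
        raise ValueError("a Rich header entry is exactly 8 bytes")
    # Two little-endian 32-bit words: product word, then count word.
    comp_id, count = struct.unpack("<II", entry)
    comp_id ^= key
    count ^= key
    return {
        "Product_Count": count,
        "Product_Version": comp_id & 0xFFFF,  # low 16 bits (assumed layout)
        "Product_ID": comp_id >> 16,          # high 16 bits (assumed layout)
    }

# Hypothetical key and a synthetic entry built from the values in the JSON snippet.
key = 0x12345678
raw = struct.pack("<II", ((147 << 16) | 30729) ^ key, 66 ^ key)
print(decode_rich_entry(raw, key))
```

The same function applied to an entry built from pID 134, pV 18326 and pC 32 would return those three values.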
Two hashing techniques are used to evaluate the utility of the Rich header. They enable static comparison and linking of portable executables by looking for equal hash values (Ligh, Case, Levy, & Walters, 2016). The study uses the MD5 algorithm, but any cryptographic hash function could be used as long as it is applied consistently.

The first technique uses the decoded content of the Rich header as input. The paper refers to the result as the Rich hash. The algorithm can be summarized in the following steps:

1. Extraction of the Rich Header Section

The Rich header section is extracted, beginning at offset 0x80.

2. Identification of the decryption key

The decryption key, which is found at the end of the Rich header, is identified.

3. XOR decryption

The Rich header is decrypted starting at “Rich” and proceeding backwards through the byte sequence until DanS is reached.

4. Calculation of the MD5 checksum

The MD5 checksum is calculated over the decrypted (cleartext) array. The resulting digest is the Rich hash.
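
The four steps above can be sketched in Python as follows. This is an illustrative sketch rather than the paper's script: it locates the header by searching for the “Rich” marker instead of assuming offset 0x80, and it assumes that the hashed cleartext runs from DanS up to, but not including, “Rich”. The function names and the synthetic sample are hypothetical.

```python
import hashlib
import struct

def decode_rich_header(data: bytes) -> bytes:
    """Return the decoded Rich header bytes, from 'DanS' up to (not including) 'Rich'."""
    end = data.find(b"Rich")
    if end < 0 or end + 8 > len(data):
        raise ValueError("no Rich header found")
    key = data[end + 4:end + 8]            # the XOR key follows the 'Rich' marker
    clear = bytearray()
    pos = end - 4
    while pos >= 0:
        # Decode one 32-bit word at a time, walking backwards from 'Rich'.
        dword = bytes(a ^ b for a, b in zip(data[pos:pos + 4], key))
        clear[0:0] = dword                 # prepend to keep the original byte order
        if dword == b"DanS":
            return bytes(clear)
        pos -= 4
    raise ValueError("'DanS' marker not found")

def rich_hash(data: bytes) -> str:
    """MD5 of the decoded Rich header (assumed input range, see lead-in)."""
    return hashlib.md5(decode_rich_header(data)).hexdigest()

# Synthetic sample: 0x80 bytes of DOS header and stub, then an encoded Rich header.
key = struct.pack("<I", 0xA1B2C3D4)        # hypothetical key
clear = b"DanS" + b"\x00" * 12 + struct.pack("<II", (147 << 16) | 30729, 66)
encoded = bytes(b ^ key[i % 4] for i, b in enumerate(clear))
sample = b"MZ" + b"\x00" * 0x7E + encoded + b"Rich" + key
print(rich_hash(sample))
```

A production tool would also validate the key against the checksum and guard against a stray “Rich” string elsewhere in the file; both checks are omitted here for brevity.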

The second technique creates a fingerprint of the Rich header content that is more resistant to changes in the build environment of the executable than the Rich hash. The paper calls it the RichPV hash (Zamboni, 2018). To accomplish this, the pC field, which is the most volatile part of the Rich header body, is excluded. For each pID and pV pair, pC counts the source files referenced by the PE. Both hashes are computed with the Python code, which displays the Rich hash and the RichPV hash as shown below:

"Rich Hash": "3d75441fa2dca655f337ee83519d34dc",

"Rich PV": "dc083eb68efdb8840ddfaee612a2755d"

The values above are the hashes computed from the Rich header of cmd.exe.
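
The effect of excluding pC can be illustrated with a short Python sketch. It assumes that the RichPV input is the concatenation of the 4-byte pID/pV word of every entry, with each 4-byte count dropped and the DanS marker and padding skipped. The exact byte range used by the paper's script is not stated here, so this is an assumption, and the name `rich_pv_hash` is hypothetical.

```python
import hashlib
import struct

def rich_pv_hash(clear: bytes) -> str:
    """MD5 over the pID/pV word of each entry, skipping every pC word."""
    entries = clear[16:]                   # skip 'DanS' and 12 bytes of padding
    kept = bytearray()
    for off in range(0, len(entries) - 7, 8):
        kept += entries[off:off + 4]       # keep pID/pV; the count at off+4 is dropped
    return hashlib.md5(bytes(kept)).hexdigest()

def make_clear(count: int) -> bytes:
    """Build a synthetic decoded header with one entry and the given count."""
    return b"DanS" + b"\x00" * 12 + struct.pack("<II", (147 << 16) | 30729, count)

# Two builds that differ only in how many objects the tool produced.
a, b = make_clear(66), make_clear(70)
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())  # full hashes differ
print(rich_pv_hash(a) == rich_pv_hash(b))                        # RichPV hashes agree
```

In this sketch a change in the count alters the hash of the full decoded header but leaves the RichPV value untouched, which is the property the second technique relies on.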

The Rich header can be identified using a variety of mechanisms. Yara rules are one way to identify Rich header content, which helps greatly in exploiting the information that is stored in the Rich header section. The Yara documentation describes the following properties of the Rich header:



a) length

b) raw_data

c) offset

d) version

e) toolid

f) clear_data

g) key

All of these features can be leveraged to create Yara rules that identify Rich and RichPV matches. Yara's pe and hash modules can be combined to perform the hash computations.

This computation applies the hash module to the clear_data property of the Rich signature. For example, a Yara rule can identify the PEs whose Rich hash equals a given value, as illustrated in the following snippet:

import "pe"
import "hash"

rule RichHash_3d75441fa2dca655f337ee83519d34dc
{
    meta:
        description = "Matches a Rich Hash of 3d75441fa2dca655f337ee83519d34dc"
    condition:
        hash.md5(pe.rich_signature.clear_data) == "3d75441fa2dca655f337ee83519d34dc"
}

The Yara rule shown above matches any PE whose Rich hash is 3d75441fa2dca655f337ee83519d34dc.
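
When many Rich hashes have to be tracked, rules of this form can be generated rather than written by hand. The following Python sketch renders the rule text for a given digest. The helper name `rich_hash_rule` is hypothetical, and the output would still need to be compiled and tested with Yara itself.

```python
import re

def rich_hash_rule(rich_hash: str) -> str:
    """Render a Yara rule that matches PEs with the given Rich hash."""
    # Require lowercase hex so the comparison string has the same form as the digest.
    if not re.fullmatch(r"[0-9a-f]{32}", rich_hash):
        raise ValueError("expected a 32-character lowercase hex MD5 digest")
    return (
        'import "pe"\n'
        'import "hash"\n'
        "\n"
        f"rule RichHash_{rich_hash}\n"
        "{\n"
        "    meta:\n"
        f'        description = "Matches a Rich Hash of {rich_hash}"\n'
        "    condition:\n"
        f'        hash.md5(pe.rich_signature.clear_data) == "{rich_hash}"\n'
        "}\n"
    )

print(rich_hash_rule("3d75441fa2dca655f337ee83519d34dc"))
```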

Evaluation with Malware Samples

In the experiment, 350 malware samples were evaluated to understand how useful the Rich header is for classifying malware. The samples cover both targeted and opportunistic malware. The threat actors represented include APT1, APT28, APT29, Volatile Cedar and the Equation Group. The malware families include Cobalt Strike Beacon, Stuxnet, Carbanak, TurnedUp and PlugX, which is also referred to as Korplug. The samples were obtained from a combination of reports on advanced threat actors and public malware repositories (Polychronakis & Meier, 2017).

Three established techniques provide benchmarks for the static comparison used in the evaluation of the Rich header. The proposed Rich and RichPV hashes are compared against them. These techniques are:

1. Ssdeep

This technique is used to identify similar files. It is a context-triggered fuzzy hashing algorithm and is a de facto standard in cyber security for comparing malware samples.

2. Import hash

This is a fingerprint of the import address table in the PE header. It is used to identify samples from similar malware families. When a sample has a large import address table, the hash can be used to identify and cluster unique groups of PE-based malware.

3. Import Fuzzy hash

This technique applies ssdeep to the import address table of a PE. This makes it possible to identify import tables that are similar even though they are not exactly equal.
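
As an illustration of the second benchmark, the following Python sketch computes an import hash in the commonly described way: each import is written as lowercase `library.function` with the library's file extension removed, the strings are joined with commas in import-table order, and the result is hashed with MD5. The input is assumed to be a list of (library, function) pairs that has already been parsed from the PE. Imports by ordinal are not handled, and the function name is hypothetical.

```python
import hashlib

def imphash(imports) -> str:
    """MD5 over 'lib.func' strings, lowercased and joined in import-table order."""
    parts = []
    for dll, func in imports:
        name = dll.lower()
        # Strip the usual library extensions so 'kernel32.dll' becomes 'kernel32'.
        for ext in (".dll", ".ocx", ".sys"):
            if name.endswith(ext):
                name = name[: -len(ext)]
                break
        parts.append(f"{name}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()

# Hypothetical import list for demonstration only.
demo = [("KERNEL32.dll", "CreateFileA"), ("USER32.dll", "MessageBoxA")]
print(imphash(demo))
```

Because the order of the imports is part of the input, two samples with the same functions listed in a different order produce different hashes, which is one reason the fuzzy variant is useful as a complement.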

Evaluation of the Rich and RichPV hashes

Once the malware samples are acquired, the next step is to create the static analysis products. Each hash is calculated by a custom Python script that takes a PE as input and produces a JSON dictionary containing the MD5, impfuzzy, imphash, Rich, ssdeep and RichPV hashes. The script is run over the malware samples to generate the hashes for each portable executable (Margaria & Steffen, 2018). A Neo4j graph is also constructed to view the relationships among the samples and to evaluate the linking strength of each static technique. The static and dynamic feature extraction can be summarized as below:

Figure 2: PE Files Static and Dynamic Feature Extraction
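
The linking step can be sketched without a graph database. The following Python example assumes that each sample is a dictionary like the JSON output described above, keyed by hash name, and that two samples are linked for a technique when their values for that technique are equal. The record layout and the function name are assumptions. Fuzzy hashes such as ssdeep and impfuzzy would normally be linked by a similarity threshold rather than by equality.

```python
from collections import defaultdict
from itertools import combinations

def link_samples(records, technique):
    """Return the set of sample pairs that share the same value for `technique`."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(technique)
        if value:                              # skip samples without this hash
            groups[value].append(rec["MD5"])
    edges = set()
    for members in groups.values():
        # Every pair of samples in a group becomes one undirected edge.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges

# Hypothetical records: the hash values are placeholders, not real digests.
records = [
    {"MD5": "s1", "Rich": "r1", "RichPV": "p1"},
    {"MD5": "s2", "Rich": "r2", "RichPV": "p1"},
    {"MD5": "s3", "Rich": "r2", "RichPV": "p1"},
]
print(link_samples(records, "Rich"))
print(len(link_samples(records, "RichPV")))
```

In this toy data the Rich hash links only two of the samples, while RichPV links all three, which mirrors the idea that RichPV tolerates more variation in the build.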



When the densities of the graphs produced by the hashing techniques were plotted, the results for the different hashing algorithms could be compared. In descending order of density, the techniques rank as RichPV, Rich, impfuzzy, imphash and ssdeep. The closeness centrality (CLC) is also calculated to provide a second evaluation metric. It measures how short the paths are from a node to the other nodes in the network that a static technique produces. The CLC metric gives much the same result, and the descending order of the techniques is the same. Together, the two metrics demonstrate the usefulness of the RichPV and Rich techniques for classifying and linking samples.
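
Both metrics can be computed from a list of nodes and undirected edges. The following pure-Python sketch uses the standard definitions: density is the number of edges divided by the number of possible edges, and closeness centrality is the number of reachable nodes divided by the sum of shortest-path distances to them, scaled by the fraction of the graph that is reachable. How the paper aggregates the per-node values for each technique is not stated here, so the functions below only compute the raw quantities, and their names are hypothetical.

```python
from collections import deque

def density(nodes, edges) -> float:
    """Fraction of all possible undirected edges that are present."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

def closeness(nodes, edges) -> dict:
    """Closeness centrality per node, scaled by the reachable share of the graph."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(nodes)
    result = {}
    for src in nodes:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reach = len(dist) - 1
        total = sum(dist.values())
        result[src] = (reach / total) * (reach / (n - 1)) if total else 0.0
    return result

# Small example: a path a-b-c plus an unlinked sample d.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c")}
print(density(nodes, edges))
print(closeness(nodes, edges))
```

A technique that links more samples yields a denser graph, and nodes that sit in large, tightly connected clusters receive higher closeness values, while an unlinked sample scores zero.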

When graphed using the density function, the graph obtained was:

When the CLC metric is used:



Strengths and Weaknesses

The research on static malware linking has several strengths in how it approaches malware analysis.

These strengths include:

1. The Detail of Structure

The assessment of the PE Rich header gives a lot of detail about what the structure of the Rich header entails and where it is found. The details are provided with great accuracy, especially the locations of the start and end of the Rich header and of the checksum. This helps the reader understand how the static linking of malware is done.

2. Use of Python Code in Determining the Rich PV hash

The use of Python to determine the contents of the RichPV hash is one of the strengths of the article, since there is no official way of determining them. Using Python to decode and extract the Rich header makes it possible to view its contents, which are written to a JSON file.

3. Use of Different Hashing Algorithms

Several hashing algorithms are used in the article, including impfuzzy, imphash and ssdeep. These algorithms provide benchmarks for the static comparison used in the evaluation of the Rich header. Each hashing algorithm serves a different purpose.

4. The Graphing of the Density Function and CLC Metric

The graphs of both the density function and the CLC metric provide a visual aid that makes static comparison of the data very easy. Density measures how many of the possible links between samples a static technique produces, while the CLC measures how short the paths between nodes in the network are.

Weaknesses

1. Complexity of Explanation

In reading the article, some of the material was complex, and I had to consult external sources in order to understand it. This includes the sections on Yara rules and properties, which took some time to understand. Apart from that, the document was fairly simple, with clear explanations of each part and enough visual aids, that is, pictures, which helped in understanding the article better.

Comparison with External Sources

The article covers the structure of the PE header and identifies the Rich section by use of static linking. To determine whether analysis of the Rich header was better done using static or dynamic linking, I did some research on dynamic linking to compare the two. The article I used was “Malware Dynamic Analysis” by Veronica Kovah (Kovah, 2019).

Malware dynamic analysis also involves the PE file header. Unlike a static link, which combines the .lib files into the PE file, a dynamic link loads the .dll files into the process memory space one at a time. The needed functions in the DLLs can then be called by the main executable.

The main difference in dynamic malware analysis is that the code is actually run, in a monitored environment where the malware sample is contained to prevent it from spreading and infecting other systems. The functionality of the malware can be observed directly, which makes it hard to miss important behaviours.

The main differences I could gather from comparing these two types of malware analysis are:

1. Definition and Working

Static analysis examines the malware binary by checking signatures without running the code, whereas dynamic malware analysis runs the malware in a monitored environment and observes it.

2. Approach used

Static malware analysis uses a signature-based approach, whereas dynamic malware analysis uses a behaviour-based approach to analyse and detect malware.

3. Processes

Static malware analysis involves virus scanning, packer detection, file fingerprinting, detection of file obfuscation and analysis of memory artefacts. Dynamic malware analysis, on the other hand, monitors API calls, memory writes, network and system calls, registry changes and instruction traces.

4. Action against Sophisticated Malware Programs

Since static malware analysis relies on analysing the signature of the malware code, it becomes hard to analyse sophisticated malware. Dynamic malware analysis helps a lot with sophisticated malware, since the analysis is done while the sample executes.



From this, it is evident that dynamic malware analysis is superior in functionality, especially when dealing with sophisticated malware. However, one cannot discount the importance of static malware analysis, which identifies malware signatures without having to run the file.

Conclusion

The paper covers the structure of the PE header and the Rich section of the PE header. From these sections, one can identify the parts of the header that help malware analysts detect and link malware using the Rich header. The structure of the PE header also helps to determine where the Rich header starts and stops and the location of other fields such as the checksum.

In analysing how the Rich and RichPV hashes are evaluated, one can identify the benchmark techniques used, namely impfuzzy, imphash and ssdeep. The graphing to determine the best techniques for leveraging the PE Rich header is done in two ways: using the density of the techniques and using the CLC metric. Both produce the same ranking of the techniques, as shown in the graphs above.

One recommendation for this method of analysis would be to switch from static to dynamic feature extraction. Dynamic extraction uses API calls and a dynamic feature vector, which in my view would be faster than using the PE header and strings.



REFERENCES

Kovah, V. (2019). Malware Dynamic Analysis. Retrieved from Open Security Training: http://opensecuritytraining.info/MalwareDynamicAnalysis.html

Ligh, M. H., Case, A., Levy, J., & Walters, A. (2016). The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley & Sons.

Malin, C. H., Casey, E., & Aquilina, J. M. (2014). Malware Forensics: Investigating and Analyzing Malicious Code. Syngress.

Marak, V. (2018). Windows Malware Analysis Essentials. Packt Publishing.

Margaria, T., & Steffen, B. (2018). Leveraging Applications of Formal Methods, Verification and Validation: Modeling. Springer.

Polychronakis, M., & Meier, M. (2017). Detection of Intrusions and Malware, and Vulnerability Assessment. Springer.

Sikorski, M., & Honig, A. (2016). Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press.

Singh, A. (2019). Identifying Malicious Code Through Reverse Engineering. Springer Science & Business Media.

Song, H. Y. (2012). Automatic Malware Analysis: An Emulator Based Approach. Springer Science & Business Media.

Sood, A., & Enbody, R. (2015). Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware. Elsevier Science.

Wright, D. (2019). A Malware Analysis and Artifact Capture Tool. Dakota State University.

Zamboni, D. (2018). Detection of Intrusions and Malware, and Vulnerability Assessment: 5th International Conference Proceedings. Springer Science & Business Media.
