
Accepted Manuscript

A survey on forensic investigation of operating system logs

Hudan Studiawan, Ferdous Sohel, Christian Payne

PII: S1742-2876(18)30398-0
DOI: https://doi.org/10.1016/j.diin.2019.02.005
Reference: DIIN 835

To appear in: Digital Investigation

Received Date: 30 October 2018


Revised Date: 31 January 2019
Accepted Date: 26 February 2019

Please cite this article as: Studiawan H, Sohel F, Payne C, A survey on forensic investigation of
operating system logs, Digital Investigation (2019), doi: https://doi.org/10.1016/j.diin.2019.02.005.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.

A survey on forensic investigation of operating system logs

Hudan Studiawan∗, Ferdous Sohel, Christian Payne

Discipline of Information Technology, Mathematics, and Statistics, Murdoch University, Australia

Abstract

Event logs are one of the most important sources of digital evidence for forensic investigation because they record essential activities on the system. In this paper, we present a comprehensive literature survey of the forensic analysis on operating system logs. We present a taxonomy of various techniques used in this area. Additionally, we discuss the tools that support the examination of the event logs. This survey also gives a review of the publicly available datasets that are used in operating system log forensics research. Finally, we suggest potential future directions on the topic of operating system log forensics.

Keywords: operating system logs, event logs, log forensics, log tamper detection, event correlation, event reconstruction, event anomaly

∗ Corresponding author.
Email addresses: hudan.studiawan@murdoch.edu.au (Hudan Studiawan), f.sohel@murdoch.edu.au (Ferdous Sohel), c.payne@murdoch.edu.au (Christian Payne)
Preprint submitted to Digital Investigation, February 27, 2019

1. Introduction

There are various forms of digital evidence, for example browsing history, chat logs, authentication log files, and deleted files or images. Event logs are files that record the important activities performed by the user, application software, or operating system. Therefore, these files are considered one of the main pieces of evidence for digital forensic analysis. For example, in the Windows operating system, the log files are usually found in the directory C:\Windows\System32\Winevt\Logs\, while in the Linux environment they are located in /var/log/.

Log files serve various purposes. An event log can be used as evidence in court (Ibrahim et al., 2012). These files may assist in the reconstruction of an attack (Liao and Langweg, 2014). They can also support identification of relationships between separate events (Herrerías and Gómez, 2010; Amato et al., 2017). Log files can be used to detect anomalous user behavior or system activity (Corney et al., 2011; Studiawan et al., 2017). Accordingly, methods to maintain the integrity of logs and to detect any modification have previously been studied (Boeck et al., 2010; Sato and Yamauchi, 2012). In general, log forensics has been investigated extensively and a rich volume of literature is available on this topic. In this paper, we provide a comprehensive survey of this existing literature.

This paper reviews various aspects of the forensic analysis of event logs, focusing on operating system (OS) logs. We construct a taxonomy on the basis of a generic investigation pipeline covering steps such as event log recovery, event correlation, event reconstruction, and visualization. Based on a generic forensic framework (Yusoff et al., 2011), we categorize existing publications within this taxonomy. We then thoroughly discuss the advantages and disadvantages of the techniques in each category.

The paper is structured as follows. Section 2 provides a brief description of other relevant surveys and then outlines the contributions made by this work. Section 3 defines the terminology used in this paper. We present the survey methodology for event log forensics in Section 4. A mind map of the event log forensics taxonomy is also depicted in this section to describe the high-level structure of the assorted methods used in this area. The subsequent
sections (Section 5 to Section 9) review each step in the generic framework. We also describe the forensic tools in Section 10 and the publicly available datasets to support OS log forensics experiments in Section 11. The discussion about datasets will assist future researchers in selecting the most appropriate data and case study. We outline the current challenges and potential future directions of research on this topic in Section 12. Finally, Section 13 concludes this study.

2. Relevant surveys and our contributions

There are several other surveys connected with event log forensics. A recent survey by Khan et al. (2016) focuses exclusively on event log forensics in the cloud environment. Khan et al. (2016) analyze the accessibility of cloud logs, logging as a service, security requirements, and possible security challenges faced in the cloud. There are several other surveys on cloud log forensics (Mishra et al., 2012; Almulla et al., 2014; Farina et al., 2015). However, they are not as comprehensive as Khan et al. (2016).

Mishra et al. (2012) focus on reviewing existing cloud forensics frameworks. Almulla et al. (2014) give other insights by classifying cloud log forensics not only by method, but also by trending technology and forensic framework. A different view of cloud log forensics is presented by Farina et al. (2015), who review remote forensics techniques, live forensics, and cloud-facilitated forensics analysis.

Another related survey paper (Chabot et al., 2015) discusses event reconstruction based on log files for forensic purposes. However, the study by Chabot et al. (2015) does not provide any clear classification of existing techniques and describes only a few studies on event reconstruction. A review of web log forensics was presented in Lazzez and Slimani (2015). In addition to discussing the methodology, Lazzez and Slimani (2015) provide a comparison of several investigation tools for conducting web application forensics.

Compared with existing related survey articles, the contributions of this study are listed below.

(a) This study covers a broad range of topics across a large number of papers, including event log security and recovery, event reconstruction and correlation, event anomalies, and visualization.

(b) We present a list of existing methods for each topic, provide a critical summary, and analyze the respective advantages and disadvantages.

(c) This paper reviews methods for the forensic analysis of OS logs, as this evidence is commonly found when extracted from a forensic disk image.

(d) We present a mind map taxonomy to assist with the classification of research across a range of areas related to forensic investigation of OS logs.

(e) We discuss forensic tools as well as public datasets focusing on OS logs.

3. Terminology

For clarity, this section outlines the terminology used in this survey. An event is an identifiable action that happens on a device and is recorded in a log entry (European Commission, 2010). An event log is a record of events, usually implemented in a log file or a table in a database. A log file is a file that records activities from applications or the operating system. This artifact is the primary focus in event log forensics and contains log entries. A log entry is defined as a single record in a log file. An event message refers to the main message in a log entry, excluding the timestamp and any other fields such as hostname and application process name. Operating system logs are log files in a particular OS such as Windows and Linux.

Additionally, digital evidence is defined as any data stored or transmitted using a computer that supports a theory of how an offense occurred or how the data addresses critical elements of the offense (Casey, 2011). Even if the digital data is not directly related to an attack, it is considered digital evidence as long as it is discovered at a crime scene. The forensic investigator is a person who analyzes the digital evidence, specifically event logs in this case. An artifact is a digital object that will be investigated, such as a log file, disk, memory, or an image file. However, in this survey, an artifact always refers to an event log file from an OS unless otherwise indicated.

4. Survey methodology

In this section, we describe the methodology for constructing the taxonomy of event log forensics.
The detailed aspects of the methodology are explained in each subsection below.

4.1. Forensic framework for classifying studies

We refer to the Generic Computer Forensic Investigation Model (GCFIM) (Yusoff et al., 2011) and map existing publications into this framework. We chose this framework because it provides completeness and can accommodate various methods for event log forensics. Another general advantage of GCFIM is that it is based on the identification of the common processes in forensic investigation. GCFIM is built on a detailed review of 15 digital forensic investigation models.

The following are the typical steps in event log forensics based on the GCFIM framework.

(a) Pre-processing step as forensic readiness of OS logs
This step concerns secure handling of logs and provides an explanation of event logs as digital evidence. If the event logs are secured by design, then they will be forensically ready to be examined when a cyber incident has occurred.

(b) Acquisition of OS logs
Acquisition refers to the recovery of event logs. This artifact may already be available, or the investigator may need to recover event logs from a device because they have been deleted.

(c) Main analysis of OS log investigation
This is the main phase of the framework. The investigator will check whether there are any modifications in the logs. The logs can be accessed using a special query or retrieval technique. In this phase, the most common analyses conducted are event correlation, event reconstruction, and anomaly detection.

(d) Visualization of OS logs from investigation results
After the analysis is complete, the result needs to be visualized in order to provide the forensic investigator with relevant insights. Visualization can also produce both general and detailed events for further investigation.

(e) Post-process of OS log investigation
The last phase of the framework is a review of the overall investigation process. This step includes evaluation of digital forensic investigation frameworks. The investigator needs to evaluate which model is more appropriate for a particular case.

The description of these aspects in the event log forensics framework is summarized in Fig. 1. Sections 5 to 9 give a detailed description of each topic. A mind map of the taxonomy of OS log forensics is provided in Fig. 2. Besides the framework, the mind map includes the tools and open datasets that are publicly available to promote reproducible research.

4.2. Inclusion and exclusion criteria for literature

We define inclusion and exclusion criteria for the papers discussed in this survey, as there are many papers that are relevant to OS log forensics topics. The inclusion criteria are as follows.

(a) The paper considers an issue in one of the steps of forensic investigation. The steps include pre-processing for forensic readiness, acquisition of OS logs, and main forensic examination such as tamper detection and event reconstruction. The content of the paper should discuss one of these forensic aspects explicitly.

(b) The paper discusses forensic tools, such as tools for extracting Windows event logs.

(c) The paper discusses a case study or forensic dataset that considers OS logs.

(d) The publication is written in English and follows a standardized paper style.

Moreover, we specify the exclusion criteria for papers that are not included in this survey. The criteria are described below.

(a) Papers that discuss forensic analysis of network logs or network traffic saved in a packet capture (pcap) file. For further information about this subject, readers are referred to a survey of network forensics (Pilli et al., 2010).

(b) Event logs in forensic topics are not to be confused with logs from business processes, as that topic is intensively discussed in the process mining research area. For more detail about business event logs, readers can consult a survey paper on process mining (Van der Aalst, 2013).

(c) Methods and tools that do not explicitly address forensics. For example, there are papers discussing anomaly detection in event log data that do not explicitly discuss forensic investigation. Therefore, we do not include these kinds of papers in this survey.

(d) Cloud log forensics
Cloud computing has been an emerging area in information technology, and therefore there are many studies that examine event logs in a cloud environment. As discussed previously, we have excluded cloud log forensics from the scope of this paper. Interested readers are referred to Khan et al.’s survey of cloud-based log forensics (Khan et al., 2016).

(e) Secure data provenance
Provenance is information describing where and how data, such as log files, was written, who created the data, and the modifications involved (Zafar et al., 2017). Provenance can determine suspicious activities in a system because of its ability to track both kernel and application actions. Therefore, provenance is useful for event log forensics to provide digital evidence for post-investigation. For a complete review of secure data provenance for various purposes and applications, we recommend that the reader refer to the survey by Zafar et al. (2017).

[Figure 1 content]
Pre-processing step as forensic readiness of OS logs: guarantee the security of OS logs; OS logs as a digital evidence
Acquisition of OS logs: recovery of the OS logs from the device
Main analysis of OS log investigation: retrieval of OS logs; detection of tamper operation; correlation and reconstruction of events; detection of anomaly
Visualization of OS logs: visualization to present the analysis results
Post-process of OS log investigation: review the investigation process

Figure 1: Proposed framework of OS log forensics based on Generic Computer Forensic Investigation Model (Yusoff et al., 2011)

[Figure 2 content]
OS log forensics
  Pre-processing
    OS log security
      Cryptographic approaches
      Log centralization
      Cryptographic log centralization
      Virtual machines
      Secure data structure
    OS logs as digital evidence
  Acquisition of OS logs
  Main analysis
    OS log retrieval
      XML-based
      Database
      Live capture
    Tamper detection
      Rule-based
      Cryptographic hashes
      Hardware-based
    Event correlation and reconstruction
      Rule-based
      Database
      Semantic model
      Tree or graph-based
      Timestamp-based
      Finite state machines
      Virtual machines
      Live event reconstruction
    Anomaly detection
      User profiling and machine learning
      Timestamp-based
      Log clustering
      Event log abstraction
  Visualization of OS logs
    Forensic timelines
    Tree-based
    Graph-based
  Post-process of OS log investigation
  Tools for OS log forensics
    General tools
    Libraries
  Public datasets for OS log forensics
    Digital Corpora
    DFRWS Challenge
    CFReDS Project
    The Honeynet Project
    SecRepo

Figure 2: A taxonomy of OS log forensics literature

4.3. Paper collection

There are three phases for paper collection in a survey paper. The first phase is to search the literature. We explore the papers on event log forensics in the following online libraries from several leading publishers: ACM Digital Library, ScienceDirect, IEEE Xplore, and Springer Link. We also search through Google Scholar, as this search engine is specially designed for searching scientific literature. The term used for searching through these digital libraries is “event log forensics”. We consider all the papers related to the area from 1997 to 2018.

The second phase is filtering the search results. We filter the results according to whether or not they are relevant to OS log forensics. We first read the title, the abstract, and then
the contents of a paper. The filtering is based on the inclusion and exclusion criteria described in Section 4.2.

The third phase is to conduct a recursive search for references. In each paper found in the second phase, we check the references and open each potentially relevant article. We apply the second phase to each of these papers. We want to trace each reference back to the first publication that discussed a certain topic. This process is assisted by the Google Scholar feature “cited by”, represented by the quotation mark button next to a particular paper title.

Furthermore, we record the following information about the papers during collection:

(a) full citation reference, such as author names, title, journal or conference name, and year,

(b) classification of the paper within GCFIM, the formal forensic framework used in this survey,

(c) sub-classification of the paper inside the main framework of GCFIM,

(d) summary of the proposed method, its advantages, as well as its disadvantages.

5. Pre-processing step as forensic readiness of OS logs

This section discusses the pre-processing step in OS log forensics. According to Tan (2001), digital forensic readiness (DFR) has two aims. First, DFR is designed to maximize an organization’s ability to acquire credible digital evidence. Second, DFR intends to minimize the cost of investigation when a security incident has happened. In the context of OS log forensics, DFR relates to building a secure log infrastructure, so that the logs will be forensically ready to be analyzed. In addition, Elyas et al. (2015) suggested that DFR is not only related to infrastructure but should also be legal-ready. Therefore, we also discuss OS logs as digital evidence that can be presented in courts.

5.1. OS log security

This phase deals with securing OS logs from modification so that they can be forensically examined. This process is also referred to as ante-mortem forensics, where the system is ready to be investigated in the future when an attack or a security incident happens. We classify security of OS logs into five categories based on the methods used: 1) cryptographic approaches; 2) log centralization; 3) cryptographic log centralization; 4) virtual machines; and 5) secure data structures.

5.1.1. Cryptographic approaches

The standard technique to identify whether or not data has been modified is to use a hash function. It produces a fixed-length string, called a hash value, from arbitrary input data. A cryptographic hash has the benefit that it is infeasible to convert the hash value back to the original input data. Schneier and Kelsey (1999) propose a basic method for securing event logs for forensic purposes based on cryptographic hashes. It provides security and integrity for event logs even on an untrusted device. This work became the foundation for subsequent research on log security. In general, the method creates an authentication key which is assumed to be securely generated and saved on the machine. This key is used to cryptographically hash each log entry. Every single log entry becomes part of a hash chain that authenticates all previous log entries (Schneier and Kelsey, 1999). However, Schneier and Kelsey’s method has a limitation. If an attacker is able to gain access to an insecure machine where the event logs are saved, then the attacker can continue to write log entries as if authorized (Schneier and Kelsey, 1999).

To make the OS logs more secure, there is a method to prevent any edit or delete operation in the log files (Etoh et al., 2010). The authors apply decentralized management of log files across several log servers based on a cryptographic hash function. This means the log files are split into chunks and each chunk is possibly located on a different server. Each chunk is copied to more than one server to provide redundancy for backup purposes. However, if redundancy in these dispersed files is removed by the attacker, the detection of an attack is not always possible (Etoh et al., 2010).

5.1.2. Log centralization

Traditional logging systems such as syslogd (Wettstein et al., 2018), and its extension syslog-ng (One Identity, 2018), can provide backup and security to event logs on a Unix-based OS. These systems can be configured on either a local or a remote machine. Research has found that the centralization of event logs can increase security and also standardizes the logging mechanism for forensic purposes (Sahoo et al., 2012). Sahoo
et al. (2012) present a system for securing Windows event logs against software or hardware failures. The Windows event logs are converted to syslog format because syslog is the common logging standard for various devices. The event logs are collected proactively at the local machine and are then sent to a remote logging server.

The advantage of the centralization method is that it provides an automatic approach to record Windows event logs on a dedicated logging server and also gives a monitoring interface. The proposed method employs a service attached to a native Windows process. This approach leaves the original log files on the local machine and sends the duplicates to the server. Therefore, it is less vulnerable when the machine is compromised, as there are copies of the event logs. The centralization method is more focused on maintaining the confidentiality of event logs. Although this architecture can be extended to other platforms, the method currently only supports Windows event logs. In addition, this approach does not encrypt the log entries when they are sent to the server. The log duplicates on the server are also not encrypted. Therefore, we need cryptographic log centralization to provide more secure event logs, as discussed in the next section.

5.1.3. Cryptographic log centralization

While centralization techniques and cryptographic log entries are separate methods, they provide greater security if combined into a hybrid architecture. A system can defend event logs from malicious attacks by building a network processor and secure operating system, as proposed by Ra and Park (2009). A network processor is hardware similar to a central processing unit (CPU) and is programmable for networking-related applications. For the secure OS, the authors use a hardened version of Linux utilizing a security module based on US Department of Defense requirements. This security module is known as Security-Enhanced Linux (SELinux). The evaluation of this method shows it can successfully detect unauthorized access attempts on a machine. However, there is a disadvantage in Ra and Park’s approach. The proposed method imposes a time overhead for the cryptographic process, which needs to be reduced to make the performance acceptable (Ra and Park, 2009). In addition, this architecture only supports Linux-based environments in the given implementation.

Another work proposes a mechanism to validate and authenticate syslog (Monteiro and Erbacher, 2008). This study adds various fields such as username, application, and system to each syslog entry. It also adds an authentication mechanism before sending the logs to the server. However, a proof of concept with experiments is not available in that work (Monteiro and Erbacher, 2008).

To deal with event logs from multiple sources including OS logs, Lin et al. (2009) provide an automatic forensic analysis by implementing a collection agent. This procedure first aggregates the event logs, then normalizes and analyzes them in a central location. In addition, a combination of hash function, digital signature, and timestamp is used to preserve the integrity and authenticity of event logs (Lin et al., 2009). However, there is a problem with duplication of logs from the proposed log collection agent, and some attacks cannot be reconstructed (Lin et al., 2009).

In addition, one can improve the authenticity of event logs using a distributed log architecture, called BBox, along with event log encryption and tamper detection (Accorsi, 2011). One drawback of the BBox architecture is that adding entries to the log file is rather expensive and becomes a bottleneck. Therefore, the method needs more efficient data structures (Accorsi, 2011).

As centralization and cryptography provide many advantages, we outline a typical model for this approach in Fig. 3. There are two main parts of the architecture, namely the monitored client and the forensic server. The first part contains a log collection agent, which processes OS logs and creates a cryptographic hash. The log entries are sent to the forensic server with encryption enabled. On the server side, the log entries are decrypted and the hash is verified. The server also normalizes OS logs for further analysis, such as insertion into a secure database or a tamper detection operation. The investigator can view and monitor all log entries on the server side.

5.1.4. OS log security using virtual machines

The use of virtual machines can preserve the integrity of OS logs, as demonstrated by Chou et al. (2008). The authors also propose using a kernel module to write OS logs so it can be securely loaded together with other kernel modules. They test two approaches and choose the virtualization technique because it offers a more secure environment, as it is isolated from the host operating system. However, there is no explanation of how the host and
[Figure 3 content]
Monitored client: OS logs; log collection agent (cryptographic hash, encryption)
Forensic server: decryption and hash verification; log normalizer; secure database; tamper detection; log viewer

Figure 3: A typical model for OS log security using centralization and cryptographic approach
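
The client and server roles in Fig. 3 can be illustrated with a minimal sketch. It is not the implementation of any surveyed system. It assumes a pre-shared key (`SHARED_KEY`), a toy `host process: message` entry layout, and hypothetical helper names (`package_entry`, `verify_entry`, `normalize`). The encryption step of the figure is left out and is assumed to be handled by the transport channel, so only the keyed-hash creation, hash verification, and normalization steps are shown.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared secret; a real deployment would provision it securely.
SHARED_KEY = b"example-key-not-for-production"


def package_entry(entry: str, key: bytes = SHARED_KEY) -> bytes:
    """Client side: attach a keyed SHA-256 digest (HMAC) to one log entry."""
    tag = hmac.new(key, entry.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"entry": entry, "tag": tag}).encode("utf-8")


def verify_entry(packet: bytes, key: bytes = SHARED_KEY) -> tuple:
    """Server side: recompute the digest and compare it in constant time."""
    message = json.loads(packet.decode("utf-8"))
    expected = hmac.new(
        key, message["entry"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"]), message["entry"]


def normalize(entry: str) -> dict:
    """Toy normalizer: split 'host process: message' into named fields."""
    header, _, text = entry.partition(": ")
    host, _, process = header.partition(" ")
    return {"host": host, "process": process, "message": text}
```

In this sketch, an entry whose content is altered after leaving the client no longer matches its tag, so `verify_entry` reports it as invalid before it reaches the normalizer or the database.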

guest operating systems can communicate securely (Chou et al., 2008).

In 2012, a study addressed secure methods for logging by employing a separate virtual machine to store the OS logs (Sato and Yamauchi, 2012). The proposed mechanism compares the logs in a monitored host operating system and the virtual machine. Although this method is able to detect log loss and data tampering, it imposes a large overhead in sending the logs to the virtual machine.

5.1.5. Secure data structures

In order to identify the modifications of an event log, a secure data structure can be constructed as an indexing engine. A tree-based data structure can provide incremental and membership proofs for each log entry (Crosby and Wallach, 2009). This mechanism provides evidence that the OS logs are authentic and detects any tampering attempt. The merit of this approach is that it produces only a very small hash chain for very large numbers of log entries. Although this method was applied to syslog, it can be extended to various types of event logs thanks to the generic nature of the tree data structure.

Another approach starts with building a model for the secure data structure based on combinatorial group testing (Goodrich et al., 2005). The data structures included are arrays, linked lists, binary search trees, skip lists, and hash tables. However, this secure data structure does not consider existing log entries that get rewritten, as the system will usually overwrite old log entries. This generic protocol can be applied to secure OS logs.

Besides the research in event log security to support forensic investigation, there are articles that provide a framework or a review in this area. Ahmad and Ruighaver (2003) define the requirements of an audit management infrastructure to improve log security. The framework is based on a top-down approach to fill the gap between organizational security policy and event log configuration. Ayrapetov et al. (2002) present an improvement of the secure OS logging mechanism. The work presents a taxonomy of existing secure logging systems, analyzes the unaddressed issues, and provides possible solutions for future research.

Two similar papers by Accorsi (Accorsi, 2009a,b) review existing secure OS log protocols. This work compares the security requirements fulfilled by the existing secure protocols for OS logs as digital evidence. Accorsi emphasizes that the event log security mechanism needs formal verification to guarantee the correctness of the proposed algorithm. Furthermore, there is a need for a standard format for event logs as digital evidence that will be presented in court (Accorsi, 2009a,b). Finally, we summarize the various methods in event log security in Table 1.

5.2. OS logs as digital evidence

Event logs have been evaluated for their quality as digital evidence in courts, with one of the earliest works that provides a framework of tests to evaluate digital evidence being found in Sommer (1997). A log file can be used as evidence when it has passed several tests, such as the acquisition process test, the chain of custody test, and the quality of forensic presentation test. The acquisition process test has two main aspects, specifically accurate and
Table 1: A summary of key publications in OS log security

Method | Publication | Advantage | Disadvantage
Cryptographic approaches | Schneier and Kelsey (1999) | Able to run in unsecure machine | Invalid log entries are still processed
Cryptographic approaches | Etoh et al. (2010) | Disperse the log files | Attack detection depends on redundancy
Log centralization | Sahoo et al. (2012) | Simplify logging mechanism | Only support Windows
Cryptographic log centralization | Ra and Park (2009) | Double security mechanism | Produce cryptographic time overhead
Cryptographic log centralization | Lin et al. (2009) | Deal with multi source event logs | A few reduplicate logs
Cryptographic log centralization | Accorsi (2011) | Complete package of log security | A little bottleneck when adding log entry
Virtual machines | Chou et al. (2008) | Demonstrate that virtualization is better than kernel module | No explanation how host and guest OS communicate securely
Virtual machines | Sato and Yamauchi (2012) | Prevent both event log loss and tampering | Produce a big overhead when sending logs
Secure data structures | Goodrich et al. (2005) | Provide security in regular data structure | Do not consider data that changes over time
Secure data structures | Crosby and Wallach (2009) | Produce a small hash chain for large logs | This method can be extended to various type of logs
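
The hash-chain idea behind the first row of Table 1 can be shown with a simplified sketch. It is not Schneier and Kelsey's full protocol: it uses an unkeyed SHA-256 chain rather than an authentication key, and the names `GENESIS`, `link`, `build_chain`, and `first_tampered_index` are hypothetical. It only demonstrates how chaining each entry to its predecessor exposes a later modification.

```python
import hashlib

# Hypothetical fixed starting value for the chain.
GENESIS = "0" * 64


def link(previous: str, entry: str) -> str:
    """Hash the previous chain value together with the current entry."""
    return hashlib.sha256((previous + entry).encode("utf-8")).hexdigest()


def build_chain(entries):
    """Return (entry, chain_value) pairs, each depending on all earlier entries."""
    chain, previous = [], GENESIS
    for entry in entries:
        previous = link(previous, entry)
        chain.append((entry, previous))
    return chain


def first_tampered_index(chain):
    """Return the index of the first entry that breaks the chain, or None."""
    previous = GENESIS
    for index, (entry, stored) in enumerate(chain):
        if link(previous, entry) != stored:
            return index
        previous = stored
    return None
```

Because this chain is unkeyed, anyone who can rewrite the whole log can also recompute every chain value. The authentication key in the surveyed method is what prevents this for entries written before a compromise.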

U
complete. First, accurate means free from any contamination when an acquisition is performed. The procedure must be conducted by a certified forensic investigator. Second, the acquired evidence can tell a complete chronology of a particular set of event sequences. In a chain of custody, the log evidence has been acquired on a device, viewed, and investigated. This evidence is possibly copied several times by the police or forensic experts. In this case, the log evidence must be securely preserved and tamper-proof. To test the quality of forensic presentation, the evidence in electronic form needs to be printed and presented in a court. The authorities should offer two presentation forms, specifically a raw and detailed version and an edited one. The latter version provides a narrative to explain what was done and why, so that it is clearly understood by the court.

Another research article assesses the evidential weight of OS logs (Ahmad and Ruighaver, 2004). This work defines three specific criteria, namely accuracy, completeness, and utility, to evaluate event logs as digital evidence. Accuracy and completeness are based on criteria from Sommer (1997). Moreover, Ahmad and Ruighaver (2004) add utility as one additional criterion. The utility is a desirable quality of event logs that makes the evidence more exposes some factors, specifically: 1) a proof of the correct working activities on the investigated systems; 2) an identification of the host system and incidents in detail; and 3) an identification of any information contained in event logs.

For Windows as the most common operating system, Ibrahim et al. (2012) have analyzed the sufficiency of Windows event logs as digital evidence. Assessment of event logs based on admissibility and weight regarding legal evidence is presented. The general standard for admissibility of evidence is to prove that the evidence is relevant, authentic, and reliable. It is also required that the evidence satisfies the legal rules and does not contain hearsay material. In assessing the weight of evidence, a number of features are put into consideration based on requirements from Sommer (1997). The features are authenticity, accuracy, completeness, clear chain of custody, and transparency of forensic procedure. However, this method was only tested on Windows 2003 whereas the paper was written in 2012.

6. Acquisition of OS logs

In the acquisition phase, event logs are acquired from a device, specifically a hard disk. The problem arises when the log files are removed, so we need to recover them. This section discusses the acquisition of OS logs. In some cases, OS logs are deleted by the attacker to wipe the digital evidence. Additionally, the deletion of a log file is a typical malware behavior in the Windows environment (Bayer et al., 2009). The forensic investigator needs to recover this evidence to ascertain further information about the incident. Therefore, some studies focus on the recovery of deleted log files before conducting the main forensic analysis.

File recovery can be performed without any file system metadata available. Exploiting this property, Richard III and Roussev (2005) created the

Scalpel tool. This is a file carving tool that recovers any files by analyzing the header and footer of a chunked segment on the disk, which can then be reconstructed to get the whole file. Subsequently, Craiger (2005) proposed a technique to recover various digital evidence from Linux environments including event logs. This technique uses standard Linux commands such as dd to recover a file based on a particular signature. The benefit of the Scalpel tool is that it can recover the deleted file quickly and supports all major OS such as Windows, Linux, and Mac OS X. Although Craiger’s technique can use standard Linux commands and tools, it does not scale for large quantities of event log data (Craiger, 2005).

In the Windows environment, Murphey (2007a) proposes a technique for automatic recovery and repair of Windows NT5 (XP and 2003) event logs. This method employs the Scalpel tool for recovery. It then repairs the file by scanning for the trailer signature and validates the result using the LogParser tool (Microsoft, 2005). However, this approach has not been extended to Windows NT6 (Vista) and newer versions (Murphey, 2007a).

Furthermore, Schuster (2007) investigates Windows Vista event logs and recovers them by parsing the unique magic strings and block layout of a log file. Every type of log file has these magic strings in the header that differ from one to another, assisting the recovery process. In addition, this work offers a detailed description of Windows Vista logs. However, not all elements in the XML file can be defined due to the unavailability of official documentation from Microsoft (Schuster, 2007). Another technique exists to retrieve the fragmented event log files automatically (Lou et al., 2009). It is able to look for fragmented log files without metadata by using a signature and the entropy difference between adjacent disk clusters. Similar to Murphey (2007a), its main disadvantage is that it only covers Windows NT5 event logs. Another improvement can be made by defining a better fragment boundary of a log file that is saved on the disk. More accurate fragment detection will improve the recovery results.

7. Main analysis of OS log investigation

This phase is the primary process of the forensic investigation. Various types of examination are conducted on the acquired evidence as discussed in Section 6. The main objectives of this phase are to detect a possible modification to OS logs, identify and reconstruct the cause of an incident by event correlation and reconstruction, and detect anomalies in log files. In addition, this section also addresses event log retrieval and event log abstraction.

7.1. OS log retrieval

In the case of digital forensics, retrieval deals with how the investigator can save, search, or perform a query on the event logs. This process can be classified based on the type of storage used: 1) XML-based; 2) database; and 3) live capture.

7.1.1. XML-based log retrieval

Alink et al. (2006) present a mechanism called XIRAF to manage and query digital evidence including event logs from Windows. XIRAF extracts the digital evidence from a forensic image and converts it into XML (eXtensible Markup Language) format. The investigator then performs indexing and querying based on the XML database. Another study parses Windows Vista event logs to facilitate more effective analysis (Huang and Wu, 2009). The authors organize binary XML Windows Vista event logs and convert these to a readable XML-based structure. Another approach, namely XLIVE, builds an XML-based structure to save various event logs including OS logs and classify them (Lee et al., 2010). This method then provides a framework for automatic investigation. XLIVE also supports live capture of digital evidence as discussed in Section 7.1.3.

Although XIRAF (Alink et al., 2006) offers a quite complete query platform, it does not provide support for binary file indexing such as BLOB type to be correlated with other log files. Another XML-based method (Huang and Wu, 2009) still needs further research for the unidentified data types in Windows Vista event logs. In addition, XLIVE (Lee et al., 2010) needs to be extended to non-Windows environments.

A typical model for event log retrieval based on an XML approach is given in Fig. 4. First, the OS logs are preprocessed and fields such as the timestamp, process identifier, and the main message are extracted. The log entries are then written into an XML-based repository or database. The forensic investigator can access the event logs via a query interface to examine a particular event or analyze suspicious events via a log viewer.

[Figure 4 components: OS logs; pre-process and field extraction; XML writer; XML-based repository or database; XML parser; query interface; log viewer; forensic investigator]

Figure 4: A typical model for OS log retrieval using XML-based approach
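
The model in Fig. 4 can be illustrated with a minimal Python sketch. It is not the implementation of XIRAF, XLIVE, or any other surveyed tool. The syslog-like line layout, the sample entries, and the helper names (extract_fields, build_repository, query) are assumptions introduced here for illustration only. The sketch pre-processes raw lines, extracts the timestamp, process identifier, and message, writes them into an in-memory XML repository, and exposes a simple query interface.

```python
import re
import xml.etree.ElementTree as ET

# Assumed (hypothetical) syslog-like layout:
#   "<timestamp> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<process>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def extract_fields(line):
    """Pre-process one raw log line; return a dict of fields or None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict(default="") if match else None


def build_repository(lines):
    """XML writer: store every parsable entry as an <entry> element."""
    root = ET.Element("logs")
    for line in lines:
        fields = extract_fields(line)
        if fields is None:
            continue  # unparsable lines are simply skipped in this sketch
        entry = ET.SubElement(
            root, "entry", pid=fields["pid"], process=fields["process"]
        )
        ET.SubElement(entry, "timestamp").text = fields["timestamp"]
        ET.SubElement(entry, "host").text = fields["host"]
        ET.SubElement(entry, "message").text = fields["message"]
    return root


def query(root, process=None, keyword=None):
    """Query interface: filter by process name and/or message keyword."""
    hits = []
    for entry in root.iter("entry"):
        message = entry.findtext("message", default="")
        if process is not None and entry.get("process") != process:
            continue
        if keyword is not None and keyword.lower() not in message.lower():
            continue
        hits.append((entry.findtext("timestamp"), entry.get("process"), message))
    return hits


# Made-up sample entries, used only to exercise the sketch.
SAMPLE = [
    "Feb 26 10:15:01 host1 sshd[2210]: Failed password for root from 10.0.0.5 port 4242 ssh2",
    "Feb 26 10:15:07 host1 sshd[2210]: Accepted password for alice from 10.0.0.7 port 4250 ssh2",
    "Feb 26 10:16:00 host1 cron[3001]: (root) CMD (run-parts /etc/cron.hourly)",
]

repository = build_repository(SAMPLE)
failed = query(repository, process="sshd", keyword="failed")
```

Here `failed` holds the single failed-login entry from the sample. A real deployment would need a parser per log format and a persistent, indexed store rather than an in-memory tree.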

7.1.2. Log retrieval using database

The performance of searching for a particular log message is important as a part of the analysis. Takahashi and Xiao (2008a,b) analyze the complexity of event log retrieval in the searching process. They compare and calculate the complexity of a number of searching algorithms to retrieve event logs in terms of big O notation. Awawdeh et al. (2013) suggest implementing recording modules for event logs and saving the processed logs into a database. The investigator can then perform a query to the database for further analysis.

Takahashi and Xiao (2008a) compare some search algorithms for event logs. They report that the work can still be improved by evaluating other techniques as there are many such algorithms. The latter method (Awawdeh et al., 2013) only uses a small portion of memory to process large logs. Despite this advantage, the method could still be extended to other environments such as Windows 8, Linux, and Mac OS X.

7.1.3. Live capture of OS logs

Live capture means the analysis is conducted in real time and there is no need to wait until an incident occurs. Choi et al. (2008) introduce live analysis of digital evidence in Linux environments. They provide a framework for real-time analysis, specifically for evidence collection, forensic analysis such as investigating running processes by a particular user, and generating reports. This framework can be improved by generalizing the capability to other Linux distributions.

On the other hand, one can use XML-based structures to perform live capture, as in XLIVE (Lee et al., 2010) discussed in Section 7.1.1. XLIVE can capture both volatile and non-volatile data including event logs. There are four main modules in XLIVE: 1) type analyzer to detect the type of digital evidence, 2) data collection module to organize the evidence found, 3) data parsing and writing to manipulate the digital evidence into XML structure, and 4) database and report module. An agent-based approach (Awawdeh et al., 2013) is another alternative for real-time evidence collection. Its main module to get event logs is Windows Event Watcher, which gathers all event logs generated by Windows such as application logs, hardware event logs, and security logs.

Besides the aforementioned categories, there are specific studies on the Windows operating system for event log retrieval. In Talebi et al. (2015), a deep analysis of Windows 8 event logs is presented for the first time. This work explains a detailed anatomy of Windows 8 event logs such as event types, log format, and log structure. It also provides forensic analysis of an unauthorized access attempt. Furthermore, the investigator can perform analyses in Windows environments using PowerShell. Barakat and Hadi (2016) demonstrate PowerShell’s ability for evidence collection, extraction, and identifying various forensic artifacts including event logs. The use of PowerShell brings many advantages as it is a native tool supported in Windows. The features from Barakat and Hadi (2016) can be extended for newer Windows versions. A summary of event log retrieval publications is shown in Table 2.

7.2. Tamper detection of OS logs

The detection of modification is needed to validate the integrity of event logs. It often becomes a part of the same event log security process described in Section 5.1. The difference with the previous phase is that the detection is performed after an incident occurred. We divide the approach of tamper detection into three main categories: 1) rule-

Table 2: A summary of key publications in OS log retrieval for forensic purposes

Method | Publication | Advantage | Disadvantage
XML-based | Alink et al. (2006) | Provide a complete query platform | Do not provide support for binary file indexing
XML-based | Huang and Wu (2009) | Parse raw file from unallocated space | There are some unidentified data types
XML-based | Lee et al. (2010) | XML-based framework for event log capture | Only support Windows environment
Database | Takahashi and Xiao (2008a) | Review search algorithm for event logs | Needs to check for other search algorithms
Database | Awawdeh et al. (2013) | Small memory allocation to process large logs | Only support Windows XP and Windows 7
Live capture | Choi et al. (2008) | Framework for live capture of evidence | Only support Linux Fedora
Live capture | Lee et al. (2010) | XML-based framework for event log capture | Only support Windows environment

based; 2) cryptographic hashes; and 3) hardware-based.

7.2.1. Rule-based tamper detection

Métayer et al. (2010) and Mazza et al. (2010) propose a framework that defines formal criteria of event log architectures that consider security and can detect any malicious modifications. The formal rules for log correctness and consistency are modeled based on the B-method (Abrial, 1996). However, the proposed architecture cannot handle incorrect log entries that may exist (Mazza et al., 2010). In other words, Mazza et al. (2010) assume that log entries are always correct when appended to a log file for the first time. The modification detection is conducted after all log entries are saved.

Cho (2013) presents a method for investigators to detect timestamp modification specifically in Windows file systems. The paper provides a detailed structure of a Windows journal file that saves the operation sequences on the file system. The proposed method then analyzes several types of timestamp tampering and creates the rules to detect the forgery in various files. This work is the first use of the NTFS journaling system for forensic purposes. The only limitation lies in defining the rules for alteration attempts. If the rules have not been defined, the modification cannot be detected.

7.2.2. Tamper detection using cryptographic hashes

As an integral part of event log security, the cryptographic hash approach can be used to detect the modification of an event log. If a single log entry is removed or tampered with, it will break the hash value chain because each log entry contributes to the generation of the hash code (Schneier and Kelsey, 1999; Etoh et al., 2010). The advantage of this hash-based approach is that it is very fast because it uses an in-memory data structure with a low overhead.

Accorsi (2011) implements this type of hash chaining method in a distributed architecture and uses public key cryptography to ensure only authorized machines can send log entries to the designated server. This method provides a complete package of log security, specifically integrity, tamper detection, and retrieval of the log entries. The disadvantage of the cryptographic hash is that when the attackers can successfully access the machine, the integrity of event logs will not be guaranteed anymore since they will break the log security architecture.

7.2.3. Hardware-based tamper detection

Detection of alteration can also be performed in hardware-based secure storage for event logs (Boeck et al., 2010; Borhan et al., 2012). For instance, the AMD processor provides a feature called Secure Virtual Machine (SVM) Trusted Platform Module (TPM) that can run a special protected code (Boeck et al., 2010). It also offers Secure Loader Block (SLB) to work with Legitimate Logging Client (LLC). This hardware technology protects the integrity of the OS logs and prevents tampering operations.

As outlined in Section 5.1, the software-based technique generates overhead as each log entry is sent to the server. In log tamper detection using software, further research is needed for evaluating the sending mechanism. It is possible to send log entries to the server in periodic batches, rather than one by one, to make sending many log entries faster. On the other hand, the hardware-based approach gives another insight into event log security compared with

those that only use software. However, the use of hardware-supported techniques cannot prevent impersonation attacks as discussed in Boeck et al. (2010) and Borhan et al. (2012).

7.3. Event correlation and reconstruction

Correlation and reconstruction of the events are closely related topics and accordingly are considered together here. Investigators need to correlate two events, possibly from separate log files, in order to reconstruct an attack. The classification of these topics is: 1) rule-based; 2) database; 3) semantic model; 4) tree or graph-based; 5) timestamp-based; 6) finite state machines; 7) virtual machines; and 8) live event reconstruction. Each category is described below and a summary is given in Table 3.

7.3.1. Rule-based correlation and reconstruction

Simple Event Correlator (SEC) provides lightweight and platform-independent event correlation (Vaarandi, 2002a). It implements a rule-based method to identify possible attacks on the system from event logs. However, SEC needs many different rules for other application logs, so the rules must be defined manually. Following SEC, Abbott et al. (2006) provide automatic recognition of events by creating logical matching patterns saved in an XML file. Similar to SEC, this technique depends on the pattern specifications saved in the database.

To deal with massive log files, Herrerías and Gómez (2007) create a formal model for the events. The method uses this formal model in the correlation engine based on pre-condition and post-condition rules. Although the paper offers automated solutions, no experiment is presented in the paper. Furthermore, to reconstruct the events automatically, Herrerías and Gómez (2010) use predefined attack base rules and correlate events based on the log properties. The authors suggest further improvement by adding more rules to the knowledge base.

In addition to the previous approaches, the investigator can run the correlation and filtering of various logs including OS logs (Forte, 2004). In this case, filtering refers to extracting and arranging event logs based on particular fields such as protocol or IP address. We can then use a top-down or bottom-up approach to correlate events. To deal with large log files, this method requires some adaptation to distributed system platforms (Forte, 2004).

7.3.2. Event correlation based on database

Since there can be multiple sources of logs including OS logs, an investigator should unify those sources, build a database, and run queries to correlate the events (Chen et al., 2003). Although it gives flexibility through the query interface, this method requires that the forensic investigator manually configure attack patterns in the database.

By using a database, the investigator can run an event correlation automatically (Marrington et al., 2007). One should normalize the Windows event logs before inserting them into the database. Next, the investigator is able to discover and correlate the events using an SQL query. Despite its automatic behavior, this approach does not consider the timezone in timestamp analysis (Marrington et al., 2007). In some cases, this will lead to inaccurate event correlation. To address this issue, the forensic investigators have to take note of both the time of examination and the timezone of the investigated machine (Boyd and Forster, 2004). These timestamps need to be synchronized later in the report.

7.3.3. Event correlation using semantic model

Schatz et al. (2004) found that the investigator can detect a sequence of events automatically and semantically. This method constructs a semantic domain model for OS logs based on web ontology language. The authors combine the rule-based method with semantic representation that can provide contextual events for correlation. Amato et al. (2017) support this idea by extracting the digital evidence and representing it as a semantic data model. The analysis utilizes reasoning and queries based on a semantic methodology. The semantic approach enables a more expressive way of representing events in terms of subject-predicate-object form and search-ability (Amato et al., 2017). Although these approaches offer automated correlation, they do not include standardized ontology components (Schatz et al., 2004). Moreover, Amato et al. (2017) do not provide any experimental results.

7.3.4. Tree or graph-based event correlation and reconstruction

Wang and Daniels (2005) introduce a reasoning framework to analyze event logs. They build a graph-based structure for event logs, and employ local and global reasoning to correlate events. They also provide a method to reconstruct the attack scenario from various event logs, including OS logs, to

Table 3: A summary of key publications in OS log correlation and reconstruction

Method | Publication | Advantage | Disadvantage
Rule-based | Vaarandi (2002a) | Lightweight and platform-independent | Need many rules for other applications
Rule-based | Abbott et al. (2006) | Automatic recognition of events | Depend on the pattern specification in database
Database | Chen et al. (2003) | Provide flexibility with query | Manual recognition of pattern of attack
Database | Marrington et al. (2007) | Automatic correlation | Not consider time zone in timestamp analysis
Semantic model | Schatz et al. (2004) | Automatic detection of event sequence | Not include standardized ontology components
Semantic model | Amato et al. (2017) | The latest ontology approach | Not supply experimental results
Tree or graph | Wang and Daniels (2006) | Automatic reasoning | Focus on network packet and event logs as secondary
Tree or graph | Arasteh et al. (2007) | Tree-based formal model | Need more efficient model checking for proofing part
Timestamp-based | Gómez et al. (2005) | Handle multiple logs from different devices | It is assumed that device and logs are not modified
Timestamp-based | Schatz et al. (2006) | Manage various logs | Assumed that time always synchronized
Finite state machines | Gladyshev and Patel (2004) | Automatic reconstruction | Extend to general purposes event reconstruction
Virtual machines | Årnes et al. (2006) | Snapshot of VM can be saved for later analysis | In some aspects, it is slower than actual machine
Live reconstruction | Olajide et al. (2009) | Find the root cause of an incident in real time | Can be extended to other operating systems

support network forensic investigation (Wang and Daniels, 2006). The model of event logs is graph spectral and then the procedure extracts the attack scenario based on the suspicious graph structure. In addition, it also supports large scale event logs. A deeper investigation of event logs is needed in Wang and Daniels’ method since the OS logs act as a secondary object and the work primarily focuses on the network traffic.

To accommodate multiple sources of log files including OS logs, Arasteh et al. (2007) propose a tree-based data structure and analyze correlation using algebraic terms. Another formal and unified verification model for event logs is presented by Saleh et al. (2007). The event logs are modeled based on a logic for electronic commerce protocols called ADM logic, and a tree data structure is used to query the properties. The authors chose ADM logic because it accommodates a temporal, dynamic, modal and linear logic. Saleh et al. (2007) also model event logs as a tree, analyze them using algebraic logic, and demonstrate the model implementation on Windows event logs. A tableau-based system (Cleaveland, 1990) is used for event log verification. However, both methods (Arasteh et al., 2007; Saleh et al., 2007) require efficient checking for the proofing part of the mathematical model to speed up the analysis.

7.3.5. Timestamp-based event reconstruction

A timestamp can be extracted from a log entry and can be an important factor for reconstructing events. In the case of consolidating event logs from various sources, Gómez et al. (2005) use Lamport’s logical clock to model events from timestamps found in different log sources. Meanwhile, Schatz et al. (2006) run an empirical study to observe temporal behavior from Windows-based domain controller logs and other sources like user’s browser logs. Both methods (Gómez et al., 2005; Schatz et al., 2006) are able to handle multiple logs from different devices. However, it is assumed that the device and logs are not modified (Gómez et al., 2005) and the time is always synchronized (Schatz et al., 2006).

Moreover, timestamps as a computer history model can be extracted from various event logs to assist forensic analysis in event reconstruction (Carrier and Spafford, 2004). A unique approach was presented by Koen and Olivier (2008). When log files are deleted by the attacker, the forensic investigator can extract file timestamps to create relationships between events. Despite being automatic, this technique assumes that the file timestamp is not modified by the application. This assumption becomes a limitation because the attacker may modify the timestamp in a particular file. Zhu et al. (2009) show that the reconstruction of an event can be performed based on the state of the operating system as the state is saved based on a particular timestamp. For instance, in Windows system state, a timeline is built by extracting and comparing the sequence of events in the saved system state. The disadvantage of this approach is that it requires at least one snapshot for comparison. In addition, an event may be removed between two snapshots so it cannot be investigated after an incident occurred.

A high-level event is one that a human can understand such as “Connection from a USB stick”. In event log files, such high-level events can be made up of many low-level events recorded as log entries. To obtain a high-level event reconstruction, the investigator can run an automatic analysis as described in Hargreaves and Patterson (2012). The proposed procedure extracts a low-level timeline from various sources including Windows event logs and then reconstructs high-level events using matching rules. Despite its automated fashion, this procedure only provides a limited timestamp extractor for some types of event logs such as browser logs and Windows XP logs. Additionally, there is a time overhead when processing each log entry. To reconstruct an event in falsified logs, Tang and Fidge (2010) propose two algorithms, specifically A* search and a heuristic method. These algorithms quantify the steps to convert the predicted attacker’s events to actual events found in the falsified OS logs. They report a drawback that the algorithm does not support large log files.

To cope with cross-drive devices, Patterson and Hargreaves (2012) propose an automatic method for timeline reconstruction. Similar to Hargreaves and Patterson (2012), this method reconstructs high-level events from various devices. However, it requires a more complex example in the experiment to prove the robustness of the method (Patterson and Hargreaves, 2012).

As timestamp-based methods are a dense research area, we provide a typical model for this approach in Fig. 5. First, the timestamps and event messages are extracted from various event logs. Second, these logs can be correlated based on a specific method or from an attack rules repository. The event logs can be from various sources, especially OS logs (Gómez et al., 2005; Schatz et al., 2006), multiple computers (Hargreaves and Patterson, 2012; Patterson and Hargreaves, 2012), or Windows states (Zhu et al., 2009). The investigator can check a reconstructed event and analyze some events of interest found via the timeline viewer.

7.3.6. Event reconstruction based on finite state machines

A proposal for reconstructing the event automatically is presented in Gladyshev and Patel (2004). The authors use a finite state machine (FSM) to model and reconstruct events by backtracking transitions in the FSM. FSM-based reconstruction is an automatic procedure and it can be extended to assist general purpose event reconstruction.

This work is improved in James et al. (2009). The authors improve the FSM for event reconstruction by converting the FSM model into a deterministic finite automaton (DFA). This improvement solves the limitation of the FSM’s high computational load when backtracking each state of the incident scenario. Although this method only supports simple cases such as an investigation of network printer logs as described in Gladyshev and Patel (2004), this protocol can be implemented in OS logs.

7.3.7. Event reconstruction using virtual machines

Another approach is to use the benefits of virtual machines for event reconstruction. For example, we can use a virtual testbed to reconstruct an attack where the main source is Linux logs (Årnes et al., 2006). First, we need to build a virtualization architecture, replay the attack, clone the images, and analyze the reconstruction. An improvement of this method is presented in Årnes et al. (2007) that involves building a testbed in a virtual machine and then reconstructing the event based on the replayed attacks.

The benefit of virtual machines is that the snapshot of the virtual machine can be saved for later analysis (Årnes et al., 2006). In Årnes et al. (2007), the virtual testbed supplements the event reconstruction hypothesis to simulate different attacks. However, the virtual machine approach is slower than an actual machine in some aspects such as booting, rebooting, and the creation of a forensic image. Additionally, some attacks cannot be replayed in the virtual environment as they have a feature to detect a virtual machine.

7.3.8. Live event reconstruction

Almost all of the aforementioned methods deal with event logs after the attacks have occurred. This approach is called post-mortem analysis. To reconstruct an event in real time, Olajide et al. (2009) introduce automatic live event reconstruction to support forensic investigation. This method correlates a live memory analysis with various machine

[Figure 5 components: OS logs; extract timestamps and events; correlate and reconstruct timestamps from various applications, multiple computers, and several Windows states; attack rules repository; timeline viewer; events of interest or malicious events]

Figure 5: A typical model for event correlation and reconstruction with timestamp-based approach

logs such as shell history and Linux logs. Furthermore, live reconstruction enables us to find the root cause of an incident in real time. Although the experiment was only performed in a Linux environment, it could be extended to other popular platforms such as Windows.

7.4. Anomaly detection

Anomaly detection is about identifying irregularities or suspicious activity in event logs. These irregularities can be further examined by the investigator. We group this topic into several categories: 1) user profiling and machine learning; 2) timestamp-based; and 3) event log clustering. A summary of key publications in anomaly detection for event log forensic investigation is given in Table 4.

7.4.1. Anomaly detection based on user profiling and machine learning

A normal activity profile of a particular user can be used to discover anomalies. Abraham and De Vel (2002) identify irregularities based on user profiling from OS logs. The proposed method applies association rules to detect an unusual event. The deviations will be found if the profiling from raw event logs is different from the standard profile generated previously. In another study, Abraham et al. (2002) use attribute-oriented induction to generate a profile and then separate the outliers. The attribute-oriented induction represents event logs as a hierarchy and generalizes the attributes in a database by processing the hierarchy tree.

In a Windows environment, the investigator can generate user profiles using time windows extracted from event logs and create event abstraction (Corney et al., 2011). Event abstraction involves grouping of similar log entries. The anomaly is defined as an event that differs from the trained normal user profile. A possible improvement of this approach is to get better grouping of application processes to increase the accuracy of user profiling (Corney et al., 2011).

Schindler (2017) uses two Support Vector Machines (SVM) to detect an anomaly in log data. The first SVM is for separating multiple predefined classes from each other. The feature vectors are based on Windows logon, logoff, and firewall logs. The second SVM is a one-class SVM that classifies previously discovered classes into normal and anomalous events. Moreover, Hu et al. (2017) proposed anomaly detection based on user activity. The method first tackles various formats from multi-source logs including OS logs by creating a metadata extraction to normalize the log entries. After that, the method constructs user-specific models to issue alerts for users whose event patterns are not similar to their patterns in the training phase.

To give an illustration of profiling for anomaly detection, we provide a typical model for this approach in Fig. 6. Event logs are first preprocessed to make them ready for analysis. A rule mining technique, such as association rules (Abraham and De Vel, 2002) or attribute-oriented induction (Abraham et al., 2002), is used to get a base user profile. Addi-
16
ACCEPTED MANUSCRIPT

Table 4: A summary of key publications in anomaly detection in event log


Method Publication Advantage Disadvantage
User profiling & Abraham and De Vel Build foundation for log Cannot handle multiple event logs
machine learning (2002) anomaly detection for forensic
purposes
Corney et al. (2011) Framework for anomaly Performance can be increased by
detection improving the grouping steps
Timestamp-based Marrington et al. (2009, Automatic analysis Detect inconsistencies without

PT
2011) providing correction
Thorpe and Ray (2012) Automatic detection Cannot detect user sessions in virtual
machine
Event log Vaarandi (2003) A lightweight clustering Many parameters required to be set
clustering method

RI
Studiawan et al. (2017) Automatic clustering and Do not consider semantic relationship
anomaly detection between events

SC
tionally, a machine learning approach such as SVM Each word in a log entry corresponds to an item-
1180 (Schindler, 2017) is run in this step to generate 1215 set and SLCT then analyzes using frequent itemset
base profiles for existing users. The investigation mining to find the most common appearing words.
includes filtering profile, intra-profile analysis, and SLCT also employs density-based clustering, which

U
comparison of input log entries to the discovered identifies a dense region or group of similar words.
base profiles. This process will generate a report Afterward, SLCT defines an anomaly as the log en-
containing anomalies within the specific time pe- tries that do not fit well to any of the clusters found.
AN
1185 1220

riod. Although SLCT is not directly designed for foren-


sic purposes, it builds a foundation of cluster-based
7.4.2. Timestamp-based anomaly detection anomaly detection for event logs.
M

In terms of temporal inconsistency, Marrington Yen et al. (2013) created Beehive, a method that
et al. (2009, 2011) examine this property in event 1225 mining and extracting knowledge from log files au-
1190 logs. The timestamp in the Windows logs is mod- tomatically. Additionally, event logs considered are
eled using the Lamport relation and the method de- Windows logs and security application logs. Bee-
D

tects both out of sequence and missing events. One hive extracts 15 features from log files and catego-
significance advantage of this technique is it can run rizes them into four groups, specifically destination-
TE

in an automated fashion. However, it detects in- 1230 based, host-based, policy-based, and traffic-based.
1195 consistencies without providing a recommendation Beehive detects suspicious host behaviors using a
of correction in the event logs. This work can be ex- custom K-means clustering without specifying the
panded to a non-Windows environment as it shows number of clusters in advance. An anomaly is de-
EP

robustness in anomaly detection. fined as a cluster that deviates significantly from


The timestamp-based approach is also deployed 1235 others.
1200 in a virtual machine environment (Thorpe and Ray, A recent technique detects anomalies automati-
2012). With a similar Lamport model, the times- cally to support forensic investigations by cluster-
C

tamp is used to detect out of sequence or missing ing (Studiawan et al., 2017). The event log is rep-
events in virtual OS logs. Unfortunately, this ap- resented as a graph. The method then builds a
AC

proach cannot detect user sessions in a virtual ma- 1240 parameter-free graph-based cluster and develops a
1205 chine. As discussed above, we can also generalize statistical score to detect anomalies automatically.
to different Windows platforms and analyze event In spite of offering an automatic approach, this
logs from various installed applications. method does not include the semantic relationship
between event messages as it only considers word
7.4.3. Anomaly detection based on event log clus- 1245 frequency. Furthermore, the proposed method will
tering run slowly for large event logs.
1210 Clustering-based event log analysis is a com- To get the real-time analysis, there is an approach
monly used approach. Vaarandi (2003) initiated that uses incremental clustering to detect anomalies
this research domain by proposing a technique, in Linux Ubuntu logs and bug-tracking application
namely the Simple Log Clustering Tool (SLCT). 1250 logs (Wurzenberger et al., 2017). This method of-
17
ACCEPTED MANUSCRIPT

OS logs Event log  Rule mining or Base profiles


pre­processing machine learning

­ Filtering
New log entries Anomaly 
­ Intra­profile analysis
report

PT
­ Compare input logs
  with base profiles

Figure 6: A typical model for OS log anomaly with profiling and machine learning approach

RI
fers fast computation as it separates the training 8. Visualization of OS logs
and testing processes. Therefore, there is no need

SC
to recalculate the cluster every time a new log entry The next step after the main investigation is to
is created. One limitation found in this real-time present the results of an analysis. The most com-
1255 approach is the transformation of log entry to the mon technique to support a presentation of event
Euclidean space can be sped up by changing the 1290 logs is using visualization. There are four common

U
granularity from character-based to word-based. types of visualization for event logs data: 1) foren-
sic timeline; 2) tree-based; and 3) graph-based.
AN
8.1. Forensic timeline
7.5. Event log abstraction Since almost all OS logs contain a timestamp,
1295 we can create a visualization based on this infor-
mation to view a timeline of the event. The pro-
M

Abstraction or signature of event logs is a tem- posed tool by Olsson and Boldt (2009), named Cy-
1260 plate containing words and wild-cards represent- berForensics TimeLab, provides a scanner to read
ing all members in a group of event log entries the source data to produce a chronological order of
(Makanju et al., 2012). The event log signature
D

1300 events. The main advantage of a timeline visualiza-


provides an abstraction so that the forensic investi- tion is that it uses an automatic approach, so there
gator will get a general insight or summary of event is no user intervention. A caching mechanism is
TE

1265 logs. used to handle large data to increase performance.


The early works in event log abstraction are not The only limitation is that the pattern search is yet
explicitly intended to be applied to forensic inves- 1305 to be automated.
tigation (Vaarandi, 2003). However, there are two Son and Lee (2011) recommended building foren-
EP

papers discussing deep learning for event log ab- sic timeline based on user behavior. The user ac-
1270 straction in forensic analysis. Thaler et al. (2017a) tivities and respective time information are gener-
find a log signature using a supervised method with ated from various sources such as Windows event
the combination of a feed forward neural network logs, SetupAPI.log (containing USB device logs),
C

1310
and long short-term memory (LSTM). Afterward, browser logs, and instant messenger logs. The vi-
a subsequent work by the same authors upgrade sualization displays input event logs based on the
AC

1275 the method to unsupervised approach using autoen- timeline. The proposed tool uses an intuitive dis-
coders with LSTM cells. play as date and time are shown in x-axis and y-
There is a disadvantage in Thaler et al. (2017a) 1315 axis, respectively. In addition, each extracted event
where the experiments were conducted on relatively has unique color for identification. Using this ap-
small datasets and it needs large event logs to proach, it is possible to examine user behavior and
1280 demonstrate the generalization of the method. An- when it has occurred.
other limitation found in Thaler et al. (2017b) is
that the proposed method requires a longer run- 8.2. Tree-based log visualization
time to train the model compared to the non-deep 1320 Buchholz and Falk (2005) create a timeline edi-
learning methods (Vaarandi, 2003; Makanju et al., tor from various event logs. Unlike the timeline ap-
1285 2012). proach (Olsson and Boldt, 2009), the visualization
is based on the tree data structure to accommodate events. The tool is called Zeitline (Falk and Buchholz, 2006) and it imports events from various sources. Zeitline uses tree-based visualization and a query interface to assist the investigator. The main benefit of Zeitline is that it can provide both general and detailed event visualization based on a tree hierarchy. However, Zeitline runs slowly when displaying large event logs due to GUI rendering in Java (Buchholz and Falk, 2005). Moreover, it needs manual intervention from the forensic investigator when adding a particular timestamp.

On the other hand, there is a framework for timeline visualization (Inglot et al., 2012; Inglot and Liu, 2014). The authors review the Zeitline tool (Buchholz and Falk, 2005), identify its disadvantages, and implement improvements in user interface aspects for timeline visualization.

8.3. Graph-based log visualization

Another approach to simplify event log analysis is visualization via graphs (Schmerl et al., 2010). This method builds virtual audit data spaces and constructs an interactive 3D visualization based on quantitative analysis of event interrelations. In our recent work, we also employ graph-based visualization to present anomaly detection in event logs (Studiawan et al., 2017). Graph-based visualization offers a big picture of anomalous and non-anomalous event logs per cluster. However, it is unable to provide visualization in chronological order.

A more recent tool to visualize an event log timeline is Timesketch (Google, 2018). It is a forensic tool from Google that provides timeline analysis for forensic purposes. The main advantage of Timesketch is that it enables forensic investigators to collaborate to manage and examine event logs at the same time. The supported inputs are logs formatted in CSV (comma-separated values) or JSON (JavaScript Object Notation) and logs extracted by the log2timeline tool (Metz, 2018c). Timesketch supports both tabular and graph-based views for analyzed logs. An example of the graph view of Timesketch is shown in Fig. 7. In this visualization, an investigator examines all users who logged in to a Windows operating system when an application named GROOVE.EXE was started.

9. Post-process of OS log investigation

This phase deals with the evaluation of the investigation process. The forensic investigator needs to review the model or framework used in the examination. There are many frameworks available and the investigator can choose the one that is appropriate for the particular case. Do et al. (2014) propose a framework for Windows event log forensics. This framework contains the main steps of forensic investigation specifically for Windows event logs, such as identification, preservation, analysis, and presentation. However, the framework needs to be extended to the newest versions of Windows. Furthermore, there are frameworks for live analysis in a Linux environment as presented in Choi et al. (2008) and Lee et al. (2010). These models offer a more complete approach, from acquiring the evidence to the investigation reports.

Like other digital evidence processes, the procedure for examining event logs follows certain phases. In other words, a general digital forensic investigation framework can be applied to different types of investigation including event logs (Carrier and Spafford, 2006). For instance, one can build a computer history model from various event logs to categorize forensic analysis in event reconstruction (Carrier and Spafford, 2006).

Moreover, another approach is to use a multi-tier investigation framework (Beebe and Clark, 2005). This hierarchical structure enables the investigator to see both the global and detailed processes for each phase of the examination. To get the global structure, the framework can collapse the lower hierarchies and expand them again to get more details. Finally, a recent review of various investigation frameworks is presented in Agarwal and Kothari (2015).

10. Tools for OS log forensics

There are many tools and software packages for event log forensics. We classify these tools into two categories: 1) general tools and 2) libraries. We describe each category below.

10.1. General tools

As event logs are part of a larger body of digital evidence, specifically the hard disk, the acquisition process can be conducted by general forensic tools. The three most popular general-purpose tools are EnCase Forensic (OpenText, 2018), Forensic Toolkit
(FTK) (AccessData, 2018), and Autopsy (Basis Technology, 2018). The first two tools are commercially supported, while the latter is an open source project supported by the digital forensic community.

Figure 7: Graph-based visualization of Windows event logs from Timesketch (Berggren, 2017)

For event log forensics specifically, some researchers share their code with the community. For example, SEC (Simple Event Correlator), as discussed in Section 7.3, is a tool for advanced event processing for event log monitoring, event log forensics, or any other task involving event correlation (Vaarandi, 2002a,b). Another tool by Gladyshev and Patel (2004), EARL (Event Analysis and Reconstruction in Lisp), performs event reconstruction based on a finite state machine. In addition, Timesketch (Google, 2018) and CyberForensics TimeLab (Olsson, 2012) have been discussed in Section 8.1. A complete list of the event log forensics tools reviewed here is shown in Table 5.

PyFlag is a tool to simplify the process of log file analysis and forensic investigations (Collett and Cohen, 2008). PowerForensics is a tool that provides a framework for hard drive forensic analysis including stored event logs (Barakat and Hadi, 2016; Atkinson, 2018). PSRecon acquires data including event logs from a remote Windows machine and delivers this data to the server (Foss, 2017). Kansa also uses Windows PowerShell to execute the agent modules from multiple machines in a network (Hull, 2018). This tool collects data including event logs for further analysis. For event logs residing in forensic memory images, the Volatility tool (The Volatility Foundation, 2018) provides a plugin called evtlogs (Levy, 2013). This plugin was developed by Jamie Levy and it can extract event logs from Windows XP and 2003 memory images.

EVTXtract (Ballenthin, 2018a) and FixEvt (Murphey, 2007a) can recover and repair Windows event logs broken by an unexpected crash. Also, Microsoft has an official event log parser called LogParser (Microsoft, 2005). This tool enables the forensic investigator to access the event logs and the registry. Other advantages of LogParser are that it can correlate events and supports network traffic analysis. LogParser is also able to examine Snort IDS logs and provides an SQL query feature for retrieving particular events.

TZWorks provides a set of tools for Windows event log analysis, namely evtx_view (TZWorks, 2018c), evtwalk (TZWorks, 2018b), and elmo (TZWorks, 2018a). evtwalk is a log parser for all versions of Windows. evtx_view provides a report for a particular category of events such as credential changes or USB connections. The parser
engine for evtwalk and evtx_view is the same. Furthermore, elmo converts the raw event logs so that they can be inserted into an SQLite database. From the database, the forensic investigator can correlate and reconstruct the events. The input for elmo can be a live Windows machine or a forensic image.

Table 5: List of event log forensics tools

Tool | Event log | Language | License
SEC (Vaarandi, 2002a,b) | Custom | Perl | GNU GPL
EARL (Gladyshev and Patel, 2004; Gladyshev, 2006) | Custom | Lisp | GNU GPL
Timesketch (Google, 2018) | Custom | Python and TypeScript | Apache
CyberForensics TimeLab (Olsson, 2012) | Custom | C# and Perl | n/a
PyFlag (Collett and Cohen, 2008) | Custom | Python | GNU GPL
PowerForensics (Barakat and Hadi, 2016; Atkinson, 2018) | Windows log | C# and PowerShell | MIT
PSRecon (Foss, 2017) | Windows log | PowerShell | Apache
Kansa (Hull, 2018) | Windows log | PowerShell | Apache
Volatility evtlogs (Levy, 2013) | Windows log | Python | GNU GPL
EVTXtract (Ballenthin, 2018a) | Windows log | Python | Apache
FixEvt (Murphey, 2007a,b) | Windows log | n/a | n/a
LogParser (Microsoft, 2005) | Windows log | n/a | n/a
evtx_view (TZWorks, 2018c) | Windows log | C++ | Proprietary
evtwalk (TZWorks, 2018b) | Windows log | C++ | Proprietary
elmo (TZWorks, 2018a) | Windows log | C++ | Proprietary
log2timeline (Metz, 2018c) | Custom | Python | Apache
Event2Timeline (Chopitea, 2014) | Windows log | Python and JavaScript | GNU GPL
AuditParser (Kazanciyan, 2013) | Windows log | Python | Apache with commercial support
Splunk (Splunk Inc, 2018) | Custom | Python, Java, C#, Ruby, PHP, and JavaScript | Apache with commercial support
GFI EventsManager (GFI Software, 2018) | Custom | n/a | Proprietary and commercial support
ELM Enterprise Manager (TNT Software, 2018) | Windows log | n/a | Proprietary and commercial support
Assuria Log Manager (Assuria, 2018) | Custom | n/a | Proprietary and commercial support
Event Log Explorer (FSPro Labs, 2018) | Windows log | n/a | Proprietary and commercial support
EventLog Analyzer (ManageEngine, 2018) | Custom | n/a | Proprietary and commercial support
Elasticsearch, Logstash, and Kibana (ELK) (Elasticsearch, 2018) | Custom | Ruby, Java, and JavaScript | Apache with commercial support

log2timeline provides a feature to extract timestamps from the event log files of an operating system and visualize them (Metz, 2018c). Similar to log2timeline, Event2Timeline also provides visualization for Windows event logs (Chopitea, 2014). However, Event2Timeline needs an external parser like Microsoft LogParser to preprocess the data. A cyber security company called Mandiant provides the AuditParser tool (Kazanciyan, 2013). AuditParser is used to convert the XML output file from other Mandiant tools into tab-delimited text files. These files contain many different types of evidence including event logs and other Windows artifacts.

The remaining tools are provided by a particular company with commercial support. Among these tools in Table 5, only the Elasticsearch, Logstash, and Kibana (ELK) stack (Elasticsearch, 2018) provides its source code, while Splunk (Splunk Inc, 2018) opens some parts of its code. Other tools use proprietary licenses and do not disclose the code. These tools include GFI EventsManager (GFI Software, 2018), ELM Enterprise Manager (TNT Software, 2018), Assuria ALM-SIEM (Assuria, 2018), Event Log Explorer (FSPro Labs, 2018), and EventLog Analyzer (ManageEngine, 2018). In some cases, the forensic investigator may choose open source alternatives that can provide comparable performance in event log examination. For example, Logstash has powerful event filtering and conversion capabilities (Vaarandi and Niziński, 2013).

10.2. Libraries

The libraries for event log forensics are mainly for Windows logs. Parse-Evtx (Schuster, 2007, 2011)
and python-evtx (Ballenthin, 2018b) are Windows event log parsers written in Perl and Python, respectively. Furthermore, libraries for the legacy version of the Windows event log format (.evt) and the new version based on an XML structure (.evtx) are provided by libevt (Metz, 2018a) and libevtx (Metz, 2018b), respectively.

The main advantage of these libraries is that the investigator can focus on the content of event logs because the parsing step is performed by the libraries. The choice of library can be based on the programming language preferred by the investigator. One limitation of these libraries is that none of them is dedicated to event logs on Linux-based operating systems.

11. Public datasets for OS log forensics

The research community needs publicly available datasets to support experiments and benchmark results for event log forensics research. There are a few public datasets, namely Digital Corpora, the Digital Forensic Research Workshop (DFRWS) Challenge, Computer Forensic Reference Data Sets (CFReDS) from NIST, the Honeynet Project, and SecRepo. Although these datasets contain various types of digital evidence, such as picture files, documents, and memory dumps, we focus on datasets that have event logs as the main object of investigation. These datasets are summarized in Table 6.

As shown in Table 6, the datasets range in age from 2001 to 2018. The old datasets can be used for educational purposes as forensic cases and techniques evolve year by year. To deal with new environments such as OS logs on the Internet of Things (IoT), researchers can use the new datasets from DFRWS 2018.

11.1. Digital Corpora

Garfinkel et al. (2009) argue that digital forensics needs a standardized dataset or corpus to make research reproducible. The dataset proposed by Garfinkel (2018) is hosted at http://digitalcorpora.org/. It contains several types of data such as cell phone dumps, disk images, files, and network traffic dumps. In addition, it provides some security incident scenarios so that researchers can investigate these case studies using various methods and tools.

The scenarios most closely related to event log forensics are the M57-Jean case (Garfinkel, 2008) and the M57-Patents case (Garfinkel, 2009a). Although these cases are fictional, the attacks are real and need to be investigated properly as if they were real cases. There are solution manuals for these two scenarios for accredited lecturers and researchers.

The M57-Jean case (Garfinkel, 2008) is about file leakage in a startup company, M57.Biz. The file contains the salaries and social security numbers of all staff. The forensic investigator can examine the disk image of one staff member's laptop and explain how the file was leaked. While the main focus in this scenario is a chat log, the investigation can correlate it with Windows event logs.

The next case, M57-Patents (Garfinkel, 2009a), gives the researcher a challenge to solve three types of criminal activities involving illegal files, proprietary research exfiltration, and company eavesdropping, with the last being most relevant to event log forensics. In this scenario, one of the staff installed a keylogger on the CEO's computer. The investigator needs to find out the identity of that person and how they performed the eavesdropping.

There is also a disk image named nps-2009-casper-rw (Garfinkel, 2009b). This image is an ext3 file system dump from a bootable USB. The user of this disk image browses several US Government websites. This case can provide event logs, especially browser logs and various logs from a Linux system, to be analyzed forensically.

11.2. Digital Forensic Research Workshop (DFRWS) Challenge

One of the most notable conferences in the digital forensics research area is the Digital Forensic Research Workshop (DFRWS). This conference not only provides a place to present research results, but also issues an annual forensic challenge. The challenges associated with event log forensics are the DFRWS Forensic Challenge 2008 (Geiger et al., 2008), 2009 (Casey and Richard III, 2009), and 2017-2018 (James, 2018). For each challenge, DFRWS has publicly posted the submitted solutions.

The main case of the DFRWS Forensic Challenge 2008 (Geiger et al., 2008) is about investigating unauthorized access to a company's proprietary information. The investigator needs to deal with a kernel log or browsing log to reconstruct the event timeline. Meanwhile, in the DFRWS Forensic Challenge 2009 (Casey and Richard III, 2009), the researcher has to examine authentication logs to trace the attacker. Other log files included are login and logout
logs in the wtmp file and command logs in a bash history file.

Table 6: OS logs from public forensic case studies and datasets

Source | Case study or dataset | Event log | Year
Digital Corpora | M57-Jean (Garfinkel, 2008) | Windows logs | 2008
Digital Corpora | M57-Patents (Garfinkel, 2009a) | Windows logs | 2009
Digital Corpora | nps-2009-casper-rw (Garfinkel, 2009b) | Linux logs | 2009
Digital Forensic Research Workshop (DFRWS) | DFRWS Forensic Challenge 2008 (Geiger et al., 2008) | Linux logs | 2008
Digital Forensic Research Workshop (DFRWS) | DFRWS Forensic Challenge 2009 (Casey and Richard III, 2009) | Linux logs | 2009
Digital Forensic Research Workshop (DFRWS) | DFRWS Forensic Challenge 2018 (James, 2018) | Linux logs | 2018
Computer Forensic Reference Data Sets (CFReDS) | The Hacking Case (CFReDS, 2007) | Windows logs | 2007
Computer Forensic Reference Data Sets (CFReDS) | Data Leakage Case (CFReDS, 2015) | Windows logs | 2015
The Honeynet Project | The Forensic Challenge 2001 (Dittrich, 2001) | Linux logs | 2001
The Honeynet Project | Scan 34 2005 (Chuvakin, 2005) | Linux and application logs | 2005
The Honeynet Project | Challenge 5 of the Forensic Challenge 2010 (Marty et al., 2010) | Linux logs | 2010
SecRepo (Sconzo, 2018) | List of datasets | Various event logs | -

The DFRWS challenge 2017-2018 case study is about the murder of a woman (James, 2018). The murder was reported by the victim's husband, with whom she lived in an apartment. The investigator searched the apartment and collected various IoT digital evidence such as a Raspberry Pi connected via HDMI to a TV, an Amazon Echo device, and a Google OnHub Wi-Fi router. The log files containing useful information for forensic analysis are located on the Raspberry Pi, which has the OSMC (Open Source Media Center) operating system installed. OSMC is a Debian-based operating system, so the structure of its logs is similar to that of other Linux distributions.

11.3. Computer Forensic Reference Data Sets (CFReDS) Project

NIST created a dataset to support digital forensic research. There are various scenarios that can be investigated by the research community. However, only two scenarios are related to event log forensics. These are the Hacking Case (CFReDS, 2007) and the Data Leakage Case (CFReDS, 2015). There are solution manuals for these two case studies, so researchers can learn from the given scenarios. The Hacking Case (CFReDS, 2007) is about the investigation of an abandoned notebook that has been used by a hacker. The investigator needs to check Windows event logs to determine the last user to log in to the computer. Furthermore, there are many chat logs in the forensic image that reveal information about the attacker's contacts.

The second scenario, the Data Leakage Case (CFReDS, 2015), focuses on the leakage of secret proprietary technology from an international company. This case is more complicated than the first one. Most event logs included in this scenario are Windows event logs. From these log files, the investigator can gather the login and logout information, examine application logs, and reconstruct the event timeline.

11.4. The Honeynet Project

The Honeynet Project is a non-profit organization that focuses on internet security, especially honeypot technology and digital forensics. This project offers a series of forensic challenges. The challenges most related to event log forensics are Challenge 5 of the Forensic Challenge 2010 (Marty et al., 2010), Scan 34 2005 (Chuvakin, 2005), and the Forensic Challenge 2001 (Dittrich, 2001). Solution files for each challenge are available from the Honeynet Project's website.

The Forensic Challenge 2010 (Marty et al., 2010) concerns a compromised Linux system. The directory /var/log/ has been imaged and the investigator needs to analyze the brute-force attack recorded in the event logs, especially authentication logs. There is also an Apache access log that is related to the attack. The challenge also requires that the researchers construct a timeline of the incidents.

Scan 34 2005 (Chuvakin, 2005) provides various log files from a honeypot system such as Apache logs, Linux syslogs, Snort NIDS logs, and iptables firewall logs. A system has been compromised and needs to be analyzed based on those logs. The
investigator has to describe how the attack was launched. This case also demands more examination of time synchronization between log files.

The last challenge from the Honeynet Project is an old case study from 2001 (Dittrich, 2001), but is nonetheless relevant to event log forensics. It involves analyzing RedHat-based Linux syslog and bash history command logs. The researcher investigates the intrusion in a honeypot server and collects information to identify the intruder. This challenge also requires that a timeline be created to describe the time of attack events.

11.5. SecRepo

SecRepo is a website that provides a list of various security-related datasets including event logs (Sconzo, 2018). The list is categorized by platform, such as network, system, and malware. It includes a large number of event logs, namely authentication logs from OS, honeypot logs, web server logs, FTP logs, and DNS logs. All data can be downloaded for free under a specific license such as Creative Commons or the Apache License. Other datasets such as Digital Corpora and the Honeynet Project are included in SecRepo.

This website does not provide any scenario or case study. Therefore, SecRepo does not give any solution materials. Unlike other datasets, SecRepo only provides various event logs and links to other datasets related to computer security. However, SecRepo is very useful for finding lists of datasets, from which the researcher can go on to the official website of each dataset.

12. Open issues and future directions

Finally, we present the major open issues for each phase in the digital forensic investigation of event logs. We also propose the future directions of this research area.

(2010). Cryptography provides the security, centralization offers backup and easy management, and a hardware-based approach will support software-based log security.

12.2. Acquisition of OS logs

The acquisition of data from recent hard disk technologies such as solid state drives (SSD) is needed since existing techniques only deal with the traditional magnetic platter disk. The SSD controller hides disk operations such as compression and garbage collection from the host OS. Therefore, the data is inaccessible and can only be accessed via acquisition of the memory cells (Bonetti et al., 2013). Bonetti et al. (2013) provide a methodology to test the forensic characteristics of SSDs. This poses a challenge for recovery of the data stored in an SSD. In addition, a more accurate method is needed to make sure that event logs can be properly recovered. Recovery tools usually only support a particular platform. Therefore, it would be valuable to have a generic tool that can run in any environment.

Another issue is log rotation. Log rotation is “closing a log file and opening a new log file when the first file is considered to be complete” (Kent and Souppaya, 2006). When an incident occurs, the investigator should first stop the rotation of log files to avoid the loss of potential evidence. If the log files are already missing, then a recovery process is needed.

12.3. Main analysis of OS log investigation

There are automatic methods for event log retrieval and event correlation or reconstruction to assist the forensic investigator. However, there are only a few methods for automatic anomaly detection in event logs. Along these lines, the present authors have proposed a method to detect anomalies in the authentication logs from OS without any user parameters (Studiawan et al., 2017).

As event logs can become very large, big data
12.1. Pre-processing step as forensic readiness of forensics has also become a major challenge in re-
OS logs cent years (Adedayo, 2016). In addition, cloud com-
The forensic readiness of an event log is criti- puting creates a further challenge in certain aspects
1715 cal to support investigation when an incident oc- such as ensuring the chain of custody, cloud log
curs. There is currently a trend towards network 1765 event correlation, and real-time cloud log visualiza-
architectures supported by virtual machines (Sato tion (Khan et al., 2016).
and Yamauchi, 2012). In future, we suggest the The use of machine learning, especially the
combination of cryptography, centralization, and emerging field of deep learning techniques, is also
1720 hardware-supported architectures can improve the a promising field in event log forensics. There are
security of event logs as described in Boeck et al. 1770 only two papers describing the use of deep learning
for event log abstraction to support a forensic examination (Thaler et al., 2017a,b). Moreover, Du et al. (2017) proposed a deep learning technique to represent event logs as a natural language sequence and named the method DeepLog. The neural network model used is Long Short-Term Memory (LSTM). DeepLog trains the model on log patterns of normal activities, and then automatically identifies anomalies when log patterns deviate from the trained model. Although DeepLog is not intended for forensic analysis explicitly, it can operate in real time so that it can handle new log patterns over time. Accordingly, this is likely to be a promising area for future work in event log forensics.

Furthermore, event log forensics is also critical to the security of Internet of Things (IoT) devices (Watson and Dehghantanha, 2016). The existing frameworks, methods, and tools for forensics are not designed for IoT as this is a new area in information technology. Therefore, we may need existing approaches to be adapted to IoT forensics, especially for event logs. Similar to the acquisition and preservation phases, most analysis techniques only support a particular platform. These methods should be designed to handle various environments or be extendable to suit other platforms. Generally, the existing methods are implemented on a particular system such as Windows 7. However, these techniques could be recreated, redesigned, and reimplemented for other operating systems.

12.4. Visualization of OS logs

Presentation of event logs contains two main aspects, specifically visualization and preparation of an investigation report. Event log visualization is usually not intended for forensic purposes only, but is focused on security incident analysis. The digital forensic community should check the relevant research areas in order to improve the features and advance event log visualization techniques. For example, the Zeitline tool (Buchholz and Falk, 2005) is not maintained anymore by its developers, but this work is one of the most highly cited papers in the field.

The only visualization tool for event log forensics that has active developers is log2timeline (Metz, 2018c). Additionally, a new tool named Timesketch (Google, 2018), which enables collaborative timeline analysis, is actively maintained. Therefore, we recommend that the forensic community keep supporting these open source visualization tools to make event log examination easy and accurate.

12.5. Post-process of OS log investigation

There are many frameworks for digital forensic investigation. The investigators can select models based on the needs of each forensic case. The existing tools should accommodate the various models of investigation. At the moment, common forensic tools only support a generic model. These features should be extended to accommodate a specific framework based on the needs of the case, potentially employing a plugin-based architecture.

Another issue in post-process is anti-forensics. As discussed in Garfinkel (2007), the Windows log file will execute regular expressions inserted in log entries. This makes the operating system hang so that it cannot record log entries. This attack was demonstrated by Foster and Liu (2005) in Blackhat Briefings 2005. To overcome this anti-forensics technique, the logging system needs to filter the log entries so that it only runs the legitimate entries. In addition, one of the various event log security methods discussed in Section 5.1 should be implemented, under the assumption that the attacker does not have any physical access to the computer.

A common way to attack log files is to delete them. Brett et al. (2017) have demonstrated how to delete Windows 7 event logs while arousing minimal suspicion among investigators. The technique first stops the logging services, after which there is a time window of 60 seconds before Windows restarts the services. In this time window, the attacker can remove the log file at a binary level for several minutes. Brett et al. (2017) also reported that this technique generates some errors in the system logs. Therefore, if the gap between log entries and some of the error logs is suspicious, then the investigators need to pay more attention to this issue. As anti-forensics techniques continue to develop, we also need to create more sophisticated forensic analysis methods.

12.6. Tools for OS log forensics

There is a range of tools for event log parsing of Windows logs. On the other hand, there are only a few libraries to parse or analyze event logs on Unix-based operating systems. However, these libraries are not integrated into a single library compared to the Windows log parser (Metz, 2018a,b). The parsing techniques mostly use regular expressions or Grok rules. One difficulty of event log parsing is that the configurations of event logs such as syslog vary from one server to another and the system
administrator usually configures logging based on their needs. However, there is a common pattern of these Unix-based event logs.

Therefore, a generic library, especially for Unix-based OS, should be developed. Recent work by Studiawan et al. (2018) proposes the nerlogparser tool, which provides a generic model to parse various semi-structured log files such as OS logs. nerlogparser parses log files based on named entity recognition (NER). NER is a mechanism to recognize named entities from text data. In event logs, nerlogparser defines named entities as words or phrases containing common fields in a log entry such as timestamp, host name, or service name. Recognizing named entities is equivalent to determining each field in a log entry. nerlogparser uses a deep learning technique, namely bidirectional long short-term memory networks, to perform NER. However, nerlogparser still has a possible drawback: when the input comes from a log file with a completely different log entry structure, nerlogparser cannot recognize the log format.

12.7. Public datasets for OS log forensics

Most public datasets for event log forensics are at least three years old. The research community needs an up-to-date dataset to accommodate recent attack models (Grajeda et al., 2017). For instance, there is a tool named EviPlant for creating digital forensics case studies or images (Scanlon et al., 2017). Researchers can use this tool and share the generated data with the digital forensics community.

13. Conclusion

In this paper, we present a comprehensive survey of operating system (OS) log forensics research. This work is structured based on each phase of a digital forensic investigation framework. For each phase, we describe the existing methods in the literature and identify both their advantages and disadvantages. We also provide a list of tools for conducting OS log examination. Furthermore, publicly available datasets are described in detail.

There are several open issues in the context of OS log forensics research. One of the main issues is to encourage the research community to use standard datasets so the performance of the proposed methods can be compared and evaluated against each other. The use of open source tools is also urgently needed to advance the state-of-the-art techniques for OS log investigation.

As time goes by, attackers use more sophisticated and complicated techniques to evade forensic methods and tools. Therefore, more sophisticated methods are needed to identify and analyze such attacks on computer systems.

Acknowledgments

This work is supported by the Indonesia Lecturer Scholarship (BUDI) from Indonesia Endowment Fund for Education (LPDP), Ministry of Finance, Republic of Indonesia.

References

Abbott, J., Bell, J., Clark, A., De Vel, O., Mohay, G., 2006. Automated recognition of event scenarios for digital forensics, in: Proceedings of the ACM Symposium on Applied Computing, pp. 293–300.
Abraham, T., De Vel, O., 2002. Investigative profiling with computer forensic log data and association rules, in: Proceedings of the IEEE International Conference on Data Mining, pp. 11–18.
Abraham, T., Kling, R., De Vel, O., 2002. Investigative profile analysis with computer forensic log data using attribute generalisation, in: Proceedings of the Australasian Data Mining Workshop, pp. 17–27.
Abrial, J.R., 1996. The B-book: Assigning programs to meanings. Cambridge University Press.
AccessData, 2018. Forensic Toolkit (FTK). https://accessdata.com/products-services/forensic-toolkit-ftk.
Accorsi, R., 2009a. Log data as digital evidence: What secure logging protocols have to offer?, in: Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, pp. 398–403.
Accorsi, R., 2009b. Safekeeping digital evidence with secure logging protocols: State of the art and challenges, in: Proceedings of the 5th International Conference on IT Security Incident Management and IT Forensics, pp. 94–110.
Accorsi, R., 2011. BBox: A distributed secure log architecture, in: Proceedings of the 7th European Workshop on Public Key Infrastructures, Services and Applications, pp. 109–124.
Adedayo, O.M., 2016. Big data and digital forensics, in: Proceedings of the IEEE International Conference on Cybercrime and Computer Forensic, pp. 1–7.
Agarwal, R., Kothari, S., 2015. Review of digital forensic investigation frameworks, in: Information Science and Applications, pp. 561–571.
Ahmad, A., Ruighaver, A., 2003. Improved event logging for security and forensics: Developing audit management infrastructure requirements, in: Proceedings of the ISOneWorld.
Ahmad, A., Ruighaver, A., 2004. Towards identifying criteria for the evidential weight of system event logs, in: Proceedings of the Australian Computer, Network and Information Forensics Conference, pp. 40–47.
Alink, W., Bhoedjang, R.A.F., Boncz, P.A., de Vries, A.P., 2006. XIRAF - XML-based indexing and querying for digital forensics. Digital Investigation 3, Supplem, S50–S58.
Almulla, S.A., Iraqi, Y., Jones, A., 2014. A state-of-the-art review of cloud forensics. Journal of Digital Forensics, Security and Law 9, 7–28.
Amato, F., Cozzolino, G., Mazzeo, A., Mazzocca, N., 2017. Correlation of digital evidences in forensic investigation through semantic technologies, in: Proceedings of the 31st IEEE International Conference on Advanced Information Networking and Applications Workshops, pp. 668–673.
Arasteh, A.R., Debbabi, M., Sakha, A., Saleh, M., 2007. Analyzing multiple logs for forensic evidence. Digital Investigation 4, Supplem, 82–91.
Årnes, A., Haas, P., Vigna, G., Kemmerer, R.A., 2006. Digital forensic reconstruction and the virtual security testbed ViSe, in: Proceedings of the Detection of Intrusions and Malware and Vulnerability Assessment, pp. 144–163.
Årnes, A., Haas, P., Vigna, G., Kemmerer, R.A., 2007. Using a virtual security testbed for digital forensic reconstruction. Journal in Computer Virology 2, 275–289.
Assuria, 2018. Assuria ALM-SIEM: SIEM, FIM and Enterprise Log Management. https://assuria.com/products/alm-siem/.
Atkinson, J., 2018. PowerForensics - PowerShell Digital Forensics. https://github.com/Invoke-IR/PowerForensics.
Awawdeh, S.A., Baggili, I., Marrington, A., Iqbal, F., 2013. Towards a unified agent-based approach for real time computer forensic evidence collection, in: Proceedings of the 8th International Workshop on Systematic Approaches to Digital Forensics Engineering, pp. 1–8.
Ayrapetov, D., Ganapathi, A., Leung, L., 2002. Improving the protection of logging systems. Technical Report. UC Berkeley Computer Science, Berkeley, CA USA.
Ballenthin, W., 2018a. EVTXtract: A tool for recovering and reconstructing fragments of EVTX log files. https://github.com/williballenthin/EVTXtract.
Ballenthin, W., 2018b. python-evtx: Pure Python parser for recent Windows event log files. https://github.com/williballenthin/python-evtx.
Barakat, A., Hadi, A., 2016. Windows forensic investigations using PowerForensics tool, in: Proceedings of the Cybersecurity and Cyberforensics Conference, pp. 41–47.
Basis Technology, 2018. Autopsy. https://www.autopsy.com/.
Bayer, U., Habibi, I., Balzarotti, D., Kirda, E., Kruegel, C., 2009. A view on current malware behaviors, in: Proceedings of the 2nd USENIX Conference on Large-scale Exploits and Emergent Threats, pp. 1–8.
Beebe, N.L., Clark, J.G., 2005. A hierarchical, objectives-based framework for the digital investigations process. Digital Investigation 2, 147–167.
Berggren, J., 2017. Thinking in graphs: Exploring with Timesketch. https://medium.com/timesketch/thinking-in-graphs-exploring-with-timesketch-84b79aecd8a6.
Boeck, B., Huemer, D., Tjoa, A.M., 2010. Towards more trustable log files for digital forensics by means of “Trusted Computing”, in: Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications, pp. 1020–1027.
Bonetti, G., Viglione, M., Frossi, A., Maggi, F., Zanero, S., 2013. A comprehensive black-box methodology for testing the forensic characteristics of solid-state drives, in: Proceedings of the 29th Annual Computer Security Applications Conference, New Orleans, Louisiana, USA. pp. 269–278.
Borhan, N., Mahmod, R., Dehghantanha, A., 2012. A framework of TPM, SVM and boot control for securing forensic logs. International Journal of Computer Applications 50, 15–19.
Boyd, C., Forster, P., 2004. Time and date issues in forensic computing—a case study. Digital Investigation 1, 18–23.
Brett, E.S., Raymond, C.K.K., Sameera, M., Helen, A., 2017. Windows 7 antiforensics: A review and a novel approach. Journal of Forensic Sciences 62, 1054–1070.
Buchholz, F., Falk, C., 2005. Design and implementation of Zeitline: A forensic timeline editor, in: Proceedings of the Digital Forensic Research Conference, pp. 1–7.
Carrier, B.D., Spafford, E.H., 2004. Defining event reconstruction of digital crime scenes. J. Forensic Sci. 49, JFS2004127–8.
Carrier, B.D., Spafford, E.H., 2006. Categories of digital investigation analysis techniques based on the computer history model. Digital Investigation 3, Supplem, S121–S130.
Casey, E., 2011. Digital evidence and computer crime. 3rd ed., Academic Press.
Casey, E., Richard III, G.G., 2009. DFRWS Forensic Challenge 2009. http://old.dfrws.org/2009/challenge/index.shtml.
CFReDS, 2007. Hacking Case from Computer Forensic Reference Data Sets (CFReDS). https://www.cfreds.nist.gov/Hacking_Case.html.
CFReDS, 2015. Data Leakage Case from Computer Forensic Reference Data Sets (CFReDS). https://www.cfreds.nist.gov/data_leakage_case/data-leakage-case.html.
Chabot, Y., Bertaux, A., Nicolle, C., Kechadi, T., 2015. Event reconstruction: A state of the art, in: Handbook of Research on Digital Crime, Cyberspace Security, and Information Assurance, pp. 231–245.
Chen, K., Clark, A., De Vel, O., Mohay, G., 2003. ECF - Event correlation for forensics, in: Proceedings of the 1st Australian Computer Network and Information Forensics Conference, pp. 1–10.
Cho, G.S., 2013. A computer forensic method for detecting timestamp forgery in NTFS. Computers & Security 34, 36–46.
Choi, J., Savoldi, A., Gubian, P., Lee, S., Lee, S., 2008. Live forensic analysis of a compromised linux system using LECT (Linux Evidence Collection Tool), in: Proceedings of the 2nd International Conference on Information Security and Assurance, pp. 231–236.
Chopitea, T., 2014. event2timeline: Simple Microsoft Windows sessions event logs visualization. https://github.com/certsocietegenerale/event2timeline.
Chou, B.H., Tatara, K., Sakuraba, T., Hori, Y., Sakurai, K., 2008. A secure virtualized logging scheme for digital forensics in comparison with kernel module approach, in: Proceedings of the 2nd International Conference on Information Security and Assurance, pp. 421–426.
Chuvakin, A., 2005. Scan 34 2005 from The Honeynet Project. http://old.honeynet.org/scans/scan34/.
Cleaveland, R., 1990. Tableau-based model checking in the propositional mu-calculus. Acta Informatica 27, 725–747.
Collett, D., Cohen, M., 2008. PyFlag: Forensic and log analysis GUI. https://sourceforge.net/projects/pyflag/.
Corney, M., Mohay, G., Clark, A., 2011. Detection of anomalies from user profiles generated from system logs, in:
Proceedings of the 9th Australasian Information Security Conference, pp. 23–32.
Craiger, P., 2005. Recovering digital evidence from Linux systems, in: Proceedings of the IFIP International Conference on Digital Forensics, pp. 233–244.
Crosby, S., Wallach, D., 2009. Efficient data structures for tamper-evident logging, in: Proceedings of the 19th USENIX Security Symposium, pp. 1–17.
Dittrich, D., 2001. The Honeynet Project’s Forensic Challenge 2001. http://old.honeynet.org/challenge/index.html.
Do, Q., Martini, B., Looi, J., Wang, Y., Choo, K.k., 2014. Windows event forensic process, in: Advances in Digital Forensics X, pp. 87–100.
Du, M., Li, F., Zheng, G., Srikumar, V., 2017. DeepLog: Anomaly detection and diagnosis from system logs through deep learning, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ACM, Dallas, Texas, USA. pp. 1285–1298.
Elasticsearch, 2018. Elasticsearch, Logstash, and Kibana (ELK): The open source elastic stack. https://www.elastic.co/products.
Elyas, M., Ahmad, A., Maynard, S.B., Lonie, A., 2015. Digital forensic readiness: Expert perspectives on a theoretical framework. Computers & Security 52, 70–89.
Etoh, F., Takahashi, K., Hori, Y., Sakurai, K., 2010. Study of log file dispersion management method, in: Proceedings of the 10th IEEE International Symposium on Applications and the Internet, pp. 371–374.
European Commission, 2010. Standard on logging and monitoring. Technical Report.
Falk, C., Buchholz, F., 2006. Zeitline. https://sourceforge.net/projects/zeitline/.
Farina, J., Scanlon, M., Le-Khac, N.A., Kechadi, M.T., 2015. Overview of the forensic investigation of cloud services, in: Proceedings of the 10th International Conference on Availability, Reliability and Security, pp. 556–565.
Forte, D.V., 2004. The “art” of log correlation: Tools and techniques for correlating events and log files. Computer Fraud and Security 2004, 15–17.
Foss, G., 2017. PSRecon: PowerShell incident response - live forensic data acquisition. https://github.com/gfoss/PSRecon.
Foster, J.C., Liu, V., 2005. Catch me, if you can ... Technical Report. Blackhat Briefings.
FSPro Labs, 2018. Event Log Explorer. https://eventlogxp.com/.
Garfinkel, S., 2007. Anti-forensics: Techniques, detection and countermeasures, in: Proceedings of the 2nd International Conference on i-Warfare and Security, pp. 77–84.
Garfinkel, S., 2008. M57-Jean scenario. http://digitalcorpora.org/corpora/scenarios/m57-jean.
Garfinkel, S., 2009a. M57-Patents scenario. http://digitalcorpora.org/corpora/scenarios/m57-patents-scenario.
Garfinkel, S., 2009b. nps-2009-casper-rw: An ext3 file system from a bootable USB. http://downloads.digitalcorpora.org/corpora/drives/nps-2009-casper-rw/.
Garfinkel, S., 2018. Digital Corpora: Producing the digital body. http://digitalcorpora.org/.
Garfinkel, S., Farrell, P., Roussev, V., Dinolt, G., 2009. Bringing science to digital forensics with standardized forensic corpora. Digital Investigation 6, S2–S11.
Geiger, M., Venema, W., Casey, E., 2008. DFRWS Forensic Challenge 2008. http://old.dfrws.org/2008/challenge/index.shtml.
GFI Software, 2018. GFI EventsManager. https://www.gfi.com/products-and-solutions/network-security-solutions/gfi-eventsmanager.
Gladyshev, P., 2006. EARL (Event Analysis and Reconstruction in Lisp). http://formalforensics.org/smforensics/earl/index.html.
Gladyshev, P., Patel, A., 2004. Finite state machine approach to digital event reconstruction. Digital Investigation 1, 130–149.
Gómez, R., Herrerias, J., Mata, E., 2005. Using Lamport’s logical clocks to consolidate log files from different sources, in: Proceedings of the International Workshop on Innovative Internet Community Systems, pp. 126–133.
Goodrich, M.T., Atallah, M.J., Tamassia, R., 2005. Indexing information for data forensics, in: Proceedings of the 3rd International Conference on Applied Cryptography and Network Security, pp. 206–221.
Google, 2018. Timesketch: Collaborative forensic timeline analysis. https://github.com/google/timesketch.
Grajeda, C., Breitinger, F., Baggili, I., 2017. Availability of datasets for digital forensics – And what is missing. Digital Investigation 22, S94–S105.
Hargreaves, C., Patterson, J., 2012. An automated timeline reconstruction approach for digital forensic investigations. Digital Investigation 9, Supplem, S69–S79.
Herrerías, J., Gomez, R., 2007. A log correlation model to support the evidence search process in a forensic investigation, in: Proceedings of the 2nd International Workshop on Systematic Approaches to Digital Forensic Engineering, pp. 31–39.
Herrerías, J., Gómez, R., 2010. Log analysis towards an automated forensic diagnosis system, in: Proceedings of the 5th International Conference on Availability, Reliability, and Security, pp. 659–664.
Hu, Q., Tang, B., Lin, D., 2017. Anomalous user activity detection in enterprise multi-source logs, in: Proceedings of the 2017 IEEE International Conference on Data Mining Workshops, pp. 797–803.
Huang, X., Wu, S., 2009. Vista event log file parsing based on XML technology, in: Proceedings of the 4th International Conference on Computer Science and Education, pp. 1186–1190.
Hull, D., 2018. Kansa: A Powershell incident response framework. https://github.com/davehull/Kansa.
Ibrahim, N.M., Al-Nemrat, A., Jahankhani, H., Bashroush, R., 2012. Sufficiency of Windows event log as evidence in digital forensics, in: Proceedings of the Global Security, Safety and Sustainability and e-Democracy, pp. 253–262.
Inglot, B., Liu, L., 2014. Enhanced timeline analysis for digital forensic investigations. Information Security Journal: A Global Perspective 23, 32–44.
Inglot, B., Liu, L., Antonopoulos, N., 2012. A framework for enhanced timeline analysis in digital forensics, in: Proceedings of the IEEE International Conference on Green Computing and Communications, pp. 253–256.
James, J., 2018. DFRWS Forensic Challenge 2017-2018. https://jijames.github.io/DFRWS2018Challenge/.
James, J., Gladyshev, P., Abdullah, M.T., Zhu, Y., 2009. Analysis of evidence using formal event reconstruction, in: Proceedings of the 1st International Conference on Digital Forensics and Cyber Crime, pp. 85–98.
Kazanciyan, R., 2013. AuditParser. https://github.com/mandiant/AuditParser.
Kent, K., Souppaya, M., 2006. Guide to computer security log management. Technical Report. National Institute of Standards and Technology.
Khan, S., Gani, A., Wahab, A.W.A., Bagiwa, M.A., Shiraz, M., Khan, S.U., Buyya, R., Zomaya, A.Y., 2016. Cloud log forensics: Foundations, state of the art, and future directions. ACM Computing Surveys 49, 7:1–7:42.
Koen, R., Olivier, M., 2008. The use of file timestamps in digital forensics, in: Proceedings of the Innovative Minds Conference.
Lazzez, A., Slimani, T., 2015. Forensics investigation of web application security attacks. J. Computer Network and Information Security 3, 10–17.
Lee, S., Savoldi, A., Lim, K.S., Park, J.H., Lee, S., 2010. A proposal for automating investigations in live forensics. Computer Standards and Interfaces 32, 246–255.
Levy, J., 2013. evtlogs Volatility plugin. https://github.com/volatilityfoundation/volatility/blob/master/volatility/plugins/evtlogs.py.
Liao, Y.C., Langweg, H., 2014. Resource-based event reconstruction of digital crime scenes, in: Proceedings of the IEEE Joint Intelligence and Security Informatics Conference, pp. 129–136.
Lin, C., Li, Z., Gao, C., 2009. Automated analysis of multi-source logs for network forensics, in: Proceedings of the 1st International Workshop on Education Technology and Computer Science, pp. 660–664.
Lou, Y., Wang, P., Xu, M., Zheng, N., 2009. Automated event log file recovery based on content characters and internal structure, in: Proceedings of the 1st International Conference on Information Science and Engineering, pp. 4778–4781.
Makanju, A., Zincir-Heywood, A.N., Milios, E.E., 2012. A lightweight algorithm for message type extraction in system application logs. IEEE Transactions on Knowledge and Data Engineering 24, 1921–1936.
ManageEngine, 2018. EventLog Analyzer. https://www.manageengine.com/products/eventlog/.
Marrington, A., Baggili, I., Mohay, G., Clark, A., 2011. CAT detect (Computer Activity Timeline detection): A tool for detecting inconsistency in computer activity timelines. Digital Investigation 8, Supplem, S52–S61.
Marrington, A., Mohay, G., Clark, A., Morarji, H., 2007. Event-based computer profiling for the forensic reconstruction of computer activity, in: Proceedings of the AusCERT Asia Pacific Information Technology Security Conference, pp. 71–87.
Marrington, A., Mohay, G., Clark, A., Morarji, H., 2009. Dealing with temporal inconsistency in automated computer forensic profiling. Technical Report. Queensland University of Technology.
Marty, R., Chuvakin, A., Tricaud, S., 2010. Challenge 5 of the Honeynet Project Forensic Challenge 2010 - Log Mysteries. http://honeynet.org/challenges/2010_5_log_mysteries.
Mazza, E., Potet, M.L., Métayer, D.L., 2010. A formal framework for specifying and analyzing liabilities using log as digital evidence, in: Proceedings of the 13th Brazilian Symposium on Formal Methods, pp. 194–209.
Métayer, D.L., Mazza, E., Potet, M.L., 2010. Designing log architectures for legal evidence, in: Proceedings of the 8th IEEE International Conference on Software Engineering and Formal Methods, pp. 156–165.
Metz, J., 2018a. libevt: A library to access the Windows event log (evt) format.
Metz, J., 2018b. libevtx: A library to access the Windows XML event log (evtx) format.
Metz, J., 2018c. log2timeline: Super timeline all the things. https://github.com/log2timeline/plaso.
Microsoft, 2005. LogParser: A powerful, versatile tool that provides universal query access to text-based data. https://www.microsoft.com/en-us/download/details.aspx?id=24659.
Mishra, A.K., Matta, P., Pilli, E.S., Joshi, R.C., 2012. Cloud forensics: State-of-the-art and research challenges, in: Proceedings of the International Symposium on Cloud and Services Computing, pp. 164–170.
Monteiro, S.D.S., Erbacher, R.F., 2008. An authentication and validation mechanism for analyzing syslogs forensically. ACM SIGOPS Operating Systems Review 42, 41.
Murphey, R., 2007a. Automated Windows event log forensics. Digital Investigation 4, Supplem, S92–S100.
Murphey, R., 2007b. FixEvt: A tool for automating the recovery and analysis of Windows NT5 event logs. http://murphey.org/fixevt.html.
Olajide, F., Savage, N., Ndzi, D., Al-Sinani, H., 2009. Forensic live response and event reconstruction methods in Linux systems, in: Proceedings of the Convergence of Telecommunications, Networking and Broadcasting, pp. 141–146.
Olsson, J., 2012. CyberForensics TimeLab. https://github.com/jensolsson/CFTL/.
Olsson, J., Boldt, M., 2009. Computer forensic timeline visualization tool. Digital Investigation 6, Supplem, S78–S87.
One Identity, 2018. syslog-ng. https://www.syslog-ng.com/products/open-source-log-management/.
OpenText, 2018. EnCase Forensic. https://www.guidancesoftware.com/encase-forensic.
Patterson, J., Hargreaves, C., 2012. The potential for cross-drive analysis using automated digital forensic timelines, in: Proceedings of the 6th International Conference on Cybercrime Forensics Education and Training.
Pilli, E.S., Joshi, R.C., Niyogi, R., 2010. Network forensic frameworks: Survey and research challenges. Digital Investigation 7, 14–27.
Ra, I., Park, T.K., 2009. A forensic logging system based on a secure OS. International Journal of Computer Science and Applications 6, 75–91.
Richard III, G.G., Roussev, V., 2005. Scalpel: A frugal, high performance file carver, in: Proceedings of the 2005 Digital Forensics Research Workshop, pp. 1–10.
Sahoo, P.K., Chottray, R.K., Pattnaiak, S., 2012. Research issues on Windows event log. International Journal of Computer Applications 41, 40–48.
Saleh, M., Arasteh, A.R., Sakha, A., Debbabi, M., 2007. Forensic analysis of logs: Modeling and verification. Knowledge-Based Systems 20, 671–682.
Sato, M., Yamauchi, T., 2012. VMM-based log-tampering and loss detection scheme. Journal of Internet Technology 13, 655–666.
Scanlon, M., Du, X., Lillis, D., 2017. EviPlant: An efficient digital forensic challenge creation, manipulation and distribution solution. Digital Investigation 20, S29–S36.
Schatz, B., Mohay, G., Clark, A., 2004. Rich event representation for computer forensics, in: 5th Asia Pacific Industrial Engineering and Management Systems Conference, pp. 2.12.1–2.12.16.
Schatz, B., Mohay, G., Clark, A., 2006. A correlation method for establishing provenance of timestamps in digital evidence. Digital Investigation 3, Supplem, S98–S107.

Schindler, T., 2017. Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats, in: Lecture Notes in Informatics, pp. 2371–2378.
Schmerl, S., Vogel, M., Rietz, R., König, H., 2010. Explorative visualization of log data to support forensic analysis and signature development, in: Proceedings of the 5th International Workshop on Systematic Approaches to Digital Forensic Engineering, pp. 109–118.
Schneier, B., Kelsey, J., 1999. Secure audit logs to support computer forensics. ACM Transactions on Information and System Security 2, 159–176.
Schuster, A., 2007. Introducing the Microsoft Vista event log file format. Digital Investigation 4, Supplem, S65–S72.
Schuster, A., 2011. Parse-Evtx: Windows event log parser library and tools collection for Perl. http://computer.forensikblog.de/en/2011/11/evtx-parser-1-1-1.html.
Sconzo, M., 2018. SecRepo.com: Security data samples repository. http://www.secrepo.com/.
Sommer, P., 1997. Downloads, logs and captures: Evidence from cyberspace. Journal of Financial Crime 5, 138–151.
Son, N., Lee, S., 2011. Forensic investigation method and tool based on the user behaviour, in: Proceedings of the 9th Australian Digital Forensics Conference, pp. 125–133.
Splunk Inc, 2018. Splunk tool. https://www.splunk.com/.
Studiawan, H., Payne, C., Sohel, F., 2017. Graph clustering and anomaly detection of access control log for forensic purposes. Digital Investigation 21, 76–87.
Studiawan, H., Sohel, F., Payne, C., 2018. Automatic log parser to support forensic analysis, in: Proceedings of the 16th Australian Digital Forensics Conference, pp. 1–10.
Takahashi, D., Xiao, Y., 2008a. Complexity analysis of retrieving knowledge from auditing log files for computer and network forensics and accountability, in: Proceedings of the IEEE International Conference on Communications, pp. 1474–1478.
Takahashi, D., Xiao, Y., 2008b. Retrieving knowledge from auditing log-files for computer and network forensics and accountability. Security and Communication Networks 1, 147–160.
Talebi, J., Dehghantanha, A., Mahmoud, R., 2015. Introducing and analysis of the Windows 8 event log for forensic purposes, in: Computational Forensics. Lecture Notes in Computer Science. volume 8915, pp. 145–162.
Tan, J., 2001. Forensic readiness. Technical Report. @stake Inc.
Tang, M., Fidge, C.J., 2010. Reconstruction of falsified computer logs for digital forensics investigations, in: Proceedings of the 8th Australasian Information Security Conference, pp. 12–21.
Thaler, S., Menkovski, V., Petković, M., 2017a. Towards a neural language model for signature extraction from forensic logs, in: Proceedings of the 5th International Symposium on Digital Forensic and Security, pp. 1–6.
Thaler, S., Menkovski, V., Petković, M., 2017b. Towards unsupervised signature extraction of forensic logs, in: Proceedings of the 26th Benelux Conference on Machine Learning, pp. 154–159.
The Volatility Foundation, 2018. Volatility. http://www.volatilityfoundation.org/.
Thorpe, S., Ray, I., 2012. Detecting temporal inconsistency in virtual machine activity timelines. Journal of Information Assurance and Security 7, 24–31.
TNT Software, 2018. ELM Enterprise Manager. https://tntsoftware.com/solutions/event-log-management/.
TZWorks, 2018a. TZWorks event log message tables offline (elmo). https://www.tzworks.net/prototype_page.php?proto_id=35.
TZWorks, 2018b. TZWorks event log parser (evtwalk). https://www.tzworks.net/prototype_page.php?proto_id=25.
TZWorks, 2018c. TZWorks Windows event log viewer (evtx view). https://www.tzworks.net/prototype_page.php?proto_id=4.
Vaarandi, R., 2002a. SEC - A lightweight event correlation tool, in: IEEE Workshop on IP Operations and Management, pp. 111–115.
Vaarandi, R., 2002b. SEC (Simple Event Correlator). http://simple-evcorr.github.io/.
Vaarandi, R., 2003. A data clustering algorithm for mining patterns from event logs, in: Proceedings of the 2003 IEEE Workshop on IP Operations and Management, pp. 119–126.
Vaarandi, R., Niziński, P., 2013. Comparative analysis of open-source log management solutions for security monitoring and network forensics, in: Proceedings of the European Conference on Information Warfare and Security, pp. 278–287.
Van der Aalst, W.M.P., 2013. Business process management: A comprehensive survey. ISRN Software Engineering 2013.
Wang, W., Daniels, T.E., 2005. Building evidence graphs for network forensics analysis, in: Proceedings of the Annual Computer Security Applications Conference, pp. 254–264.
Wang, W., Daniels, T.E., 2006. Diffusion and graph spectral methods for network forensic analysis, in: Proceedings of the Workshop on New Security Paradigms, pp. 99–106.
Watson, S., Dehghantanha, A., 2016. Digital forensics: the missing piece of the Internet of Things promise. Computer Fraud & Security 2016, 5–8.
Wettstein, G., Tweedie, S., Virtanen, J., Alderton, S., Schulze, M., 2018. syslogd. https://linux.die.net/man/8/syslogd.
Wurzenberger, M., Skopik, F., Landauer, M., Greitbauer, P., Fiedler, R., Kastner, W., 2017. Incremental clustering for semi-supervised anomaly detection applied on log data, in: Proceedings of the 12th International Conference on Availability, Reliability and Security, ACM, Reggio Calabria, Italy. pp. 31:1–31:6.
Yen, T.F., Oprea, A., Onarlioglu, K., Leetham, T., Robertson, W., Juels, A., Kirda, E., 2013. Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks, in: Proceedings of the 29th Annual Computer Security Applications Conference, ACM, New Orleans, Louisiana, USA. pp. 199–208.
Yusoff, Y., Ismail, R., Hassan, Z., 2011. Common phases of computer forensics investigation models. International Journal of Computer Science and Information Technology 3, 17–31.
Zafar, F., Khan, A., Suhail, S., Ahmed, I., Hameed, K., Khan, H.M., Jabeen, F., Anjum, A., 2017. Trustworthy data: A survey, taxonomy and future trends of secure provenance schemes. Journal of Network and Computer Applications 94, 50–68.
Zhu, Y., James, J., Gladyshev, P., 2009. A comparative methodology for the reconstruction of digital events using Windows restore points. Digital Investigation 6, 8–15.
