Professional Documents
Culture Documents
3 Journal
3 Journal
3 Journal
9-30-2018
Travis Atkison
University of Alabama, atkison@cs.ua.edu
Recommended Citation
Haque, Shariful and Atkison, Travis (2018) "A Forensic Enabled Data Provenance Model for Public Cloud," Journal of Digital Forensics,
Security and Law: Vol. 13 : No. 3 , Article 7.
DOI: https://doi.org/10.15394/jdfsl.2018.1570
Available at: https://commons.erau.edu/jdfsl/vol13/iss3/7
This Article is brought to you for free and open access by the Journals at
Scholarly Commons. It has been accepted for inclusion in Journal of Digital
Forensics, Security and Law by an authorized administrator of Scholarly
Commons. For more information, please contact commons@erau.edu,
wolfe309@erau.edu. (c)ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
ABSTRACT
Cloud computing is a newly emerging technology where storage, computation and services
are extensively shared among a large number of users through virtualization and distributed
computing. This technology makes the process of detecting the physical location or own-
ership of a particular piece of data even more complicated. As a result, improvements in
data provenance techniques became necessary. Provenance refers to the record describing
the origin and other historical information about a piece of data. An advanced data prove-
nance system will give forensic investigators a transparent idea about the data’s lineage, and
help to resolve disputes over controversial pieces of data by providing digital evidence. In
this paper, the challenges of cloud architecture are identified, how this affects the existing
forensic analysis and provenance techniques is discussed, and a model for efficient provenance
collection and forensic analysis is proposed.
Keywords: data provenance, conceptual model, cloud architecture, digital forensic, digital
evidence, data confidentiality, forensic requirements, provenance challenges
c 2018 ADFSL Page 47
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
pects of provenance records make it an es- cloud deployment models, the public cloud
sential topic to be discussed in cloud archi- architecture will be the focus as it introduces
tecture. more challenges than the other models. The
Clouds basically use the concept of multi- rest of the paper is organized as follows: In
tenancy and virtualization to ensure efficient Section 2, basic concepts of public cloud, dig-
utilization of available resources. It identi- ital forensic and data provenance along with
fies and solves some challenges of large scale their challenges and requirements will be dis-
data processing, such as on-demand resource cussed. In Section 3, previous work on re-
allocation based on computational require- lated areas will be explored. In Section 4,
ments, distribution and coordination of jobs the proposed data provenance model will be
among different machines, automatic recov- discussed along with additional design con-
ery management, dynamic scaling of oper- siderations. Finally, conclusions and future
ations based on workloads, and releasing work will be discussed in Section 5.
the machines when all the jobs are com-
plete (Amazon Web Services, 2008). These
features make cloud computing a popular 2. BACKGROUND
choice for small and medium scale indus-
tries, and research says this market will ex- 2.1 Public Cloud
pand with a 30% CAGR (Compound An- (Mell, Grance, et al., 2011) defined Cloud
nual Growth Rate) to reach 270 billion by computing as a “model for enabling ubiq-
2020 (Market Research Media, 2016). As uitous, convenient, on-demand network ac-
cloud computing grows in popularity, so do cess to a shared pool of configurable comput-
concerns about security, compliance, privacy ing resources (e.g., networks, servers, stor-
and legal matters (Chung & Hermans, 2010). age, applications, and services) that can be
Storing data in a location with an unknown rapidly provisioned and released with min-
owner’s record with thinner boundaries, and imal management effort or service provider
backing up data by an untrusted third party interaction.” With lots of services offered
are a couple of the notable security con- through advanced technologies, cloud com-
cerns with cloud architecture (Hashizume, puting is redefining computing technology.
Rosado, Fernández-Medina, & Fernandez, Its “pay-as-you-go” and “provision-as-you-
2013). These issues can eventually affect the go” make it the most suitable choice for
trustworthiness of the data, as well as the storing personal information, maintaining
metadata associated with it, making it dif- shared documents with a large group of
ficult to find accurate evidence to apply in users, and hosting large applications that
forensics. Though there are several digital require higher computational power. As
forensic tools to apply in general IT scenar- cloud computing offers wide ranges of ser-
ios, most of them have failed to prove their vices based on demand from a single user to
success over cloud architecture. Thus, new a large organization, different service and de-
digital forensic methods must be developed ployment models have been designed. One of
to meet the challenges of a cloud architec- the widely used deployment models of cloud
ture. service is public cloud, which (Jansen &
In this paper, a provenance model to in- Grance, 2011) describe as an “infrastructure
crease the capability of digital forensics in a and computational resource” that is “owned
cloud environment will be examined. While and operated by an outside party that de-
this model can be experimented on other livers to the general public via multi-tenant
Page 48
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
platform”. Figure-1 depicts a general ar- the resources. An attacker might be able to
chitecture of the public cloud model where overcome this logical separation using config-
cloud consumers are accessing the infrastruc- uration errors or software bugs to gain illegit-
ture of a cloud provider. imate access. Accessing services over the in-
ternet also increases vulnerabilities through
Figure 1. Public Cloud Model (Bohn et al., various network threats (Bohn et al., 2011).
2011) When an organization manages its own re-
sources and records in its secured computing
111111
.
I
- -mpi -
I
- llf9
- - -=--
I
c 2018 ADFSL Page 49
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
digital forensic process flow. The identifica- (Alqahtany, Clarke, Furnell, & Reich, 2015)
tion process consists of two steps where the identified the challenges based on the gen-
incident and most relevant evidence are iden- eral process model of forensics – identifica-
tified along with their possible correlation tion, collection, organization and presenta-
with other incidents in the system. In the tion. Forensic challenges in cloud computing
collection step, all digital evidence related in this paper will be described referencing
to an incident is accumulated from different both of these models.
sources, preserving their integrity. Later, the Any digital investigation must begin with
records are properly organized by extracting the collection of the digital device to preserve
and investigating the data characteristics. In the artifact related to an occurrence. In
the final stage, a well formulated document cloud computing, volatile memory of a vir-
is prepared by the investigator to present to tual machine instance and the unavailabil-
the court. ity of the virtual images demands creation of
advanced technologies to preserve evidence.
Figure 2. Process Flow of Digital Forensic Logs, the valuable container of evidence, are
(Zawoad et al., 2015) maintained by the cloud providers, but in-
tegrity of this data might be affected by un-
lawful actions. A cloud environment also
imposes restrictions over the control of data
and relational information (e.g., location of
data) which reduces the opportunity of ef-
fective data preservation.
2.2.1 Challenges Once the necessary evidence is preserved
Applying the digital forensic process in a and ready for initial analysis to build a sat-
simple computing architecture is complex by isfactory theory, the incident identification
nature in and of itself; however, when a process can begin. (O’shaughnessy & Keane,
cloud architecture is taken into considera- 2013) described the effect of cloud service
tion, the process becomes even more chal- models in this phase. In both Software as
lenging. The distributed nature of cloud a Service (SaaS) and Platform as a Service
infrastructure and some of its features af- (PaaS), the consumer has limited access to
fect the credibility of the collected artifacts the cloud infrastructure, which leaves the
and eventually increase the complexity of the investigator with a limited option to iden-
forensic process in each of its steps. tify any occurrence directly in the server in-
(O’shaughnessy & Keane, 2013) illus- stance. Infrastructure as a Service (IaaS)
trated some of the issues relevant to the provides better access than the other two
cloud forensic process while describing the models as it allows the client to configure
difference from the general approach of dig- their server with necessary logs. The dif-
ital forensics based on the “Integrated Digi- ferent level of access in the different ser-
tal Investigation Process Model” of (Carrier, vice models eventually hinders the process
Spafford, et al., 2003). Carrier illustrated of building a unified model for incident iden-
the relevance between digital and physical tification and documentation.
investigation and described the digital in- Data collection involves accumulating
vestigation model with five phases – preser- data from different sources like storage de-
vation, survey, search and collection, re- vices, client’s browser history, client’s com-
construction and presentation. However, munication with the service provider and ac-
Page 50
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
cess information of the cloud and other net- about five major cloud actors with their
work level data. Distributed cloud architec- defined responsibilities and offered services.
tures introduce various challenges for the in- Figure 3 illustrates a cloud architecture fo-
vestigator in this phase. They require ex- cusing on the cloud actors, based on the
tensive knowledge of the cloud architecture, high-level architecture shown in (Bohn et al.,
data storage techniques, extraction mecha- 2011). Each actor has sole or shared respon-
nisms for preserving data integrity and main- sibilities to execute different business trans-
taining privacy of cloud consumers. Also, actions or operations in a cloud environment.
shared storage for managing the logging in- In this scenario, cloud providers offer ser-
formation for multiple enterprises raise con- vices which are negotiated by the brokers,
cern for privacy concern if any incident needs connected and transported by the carrier,
thorough investigation and require collect- used by the cloud consumers and evaluated
ing the complete logging information. Data independently for performance and security
collection also might be affected by inte- by auditors.
gration of inefficient provenance techniques,
lack of time synchronization and inappropri- Figure 3. Conceptual Architecture of Cloud
ate maintenance of the chain of custody.
Collected records are further used in a con-
trolled environment to reconstruct an inci-
Cloud
Consumer r-- '
dent to solidify theory building in the re- Cloud Pro";der
Cloud
Broker
c 2018 ADFSL Page 51
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
responsibilities of the cloud provider [23]. preservation of data and evidence integrity.
Data ownership with stewardship helps in
determining ownership of the records and 2.3 Data Provenance
maintaining a chain in custody in the foren- Data provenance has proved its applicabil-
sic investigation. With a proper backup fa- ity and necessity in different application do-
cility and redundant data storage policy, the mains, data processing systems and repre-
provider can ensure successful data reten- sentation models. For e-science, it can be
tion. Strict data disposal operations also used for transformation and easy derivation
need to be integrated to ensure complete re- of data. In warehousing, it can be used to
moval of data from all the sources without analyze and represent data. For Service Ori-
leaving any option to recover them. Backups ented Architectures (SOA), it can be used
are the only possible means of reconstructing to execute complex computations. In doc-
any event or incident. ument management and software develop-
Cloud providers also need to ensure ment tools, it can be used to identify the
there are controlled access and authoriza- lifecycle of a document. As the nature of the
tion checks for any entry to the facility and large range of applications differs, so does
during data relocation to a different facil- their technique of managing the provenance
ity. In digital forensics, identifying events records.
in proper order helps in incident reconstruc- Data provenance was classified in several
tion. Cloud providers need to ensure that all categories based on the techniques used to
the processes are running under a synchro- trace the records and the way these are pre-
nized environment. Integrating different log- served. The techniques for data tracing are
ging functionality to record every single op- classified under two approaches: “lazy” and
eration is another important responsibility “eager” (W. C. Tan, 2004). In the “lazy”
of cloud providers, which undoubtedly plays approach, provenance records are computed
the most vital role in digital forensic. only when they are required, while prove-
(Ruan & Carthy, 2012) also listed some nance records are carried with the data in
shared responsibilities of providers and con- the “eager” approach. Sequence-of-Delta
sumers based on the (Cloud Security Al- and Timestamping are two alternative ap-
liance, 2011). Defining meaningful taxon- proaches of archiving provenance records
omy, depending on data type, origin, legal or (W. C. Tan, 2004). The Sequence-of-Delta
contractual constraints and sensitivity, helps approach stores the reference version and a
in forensic investigations. All the impor- sequence of forward or backward deltas be-
tant assets need to be recorded with owner- tween versions. On the other hand, times-
ship information, and different data-related tamping uses versions and times to identify
operations need to be logged with common data at various steps of its lifetime.
forensic related terms. For the purpose of fa- Provenance can also be discussed from
cilitating regulatory, statutory and contrac- the points of the affect of the source
tual requirements for compliance mapping, data on the existence of data records, or
elements might be assigned with legislative the locations from which those records are
domains and jurisdiction. Other shared re- fetched (Buneman, Khanna, & Wang-Chiew,
sponsibilities of the provider and consumer 2001). These concepts are termed as why-
include authorization, multi-factor authen- provenance and where-provenance, respec-
tication, establishment of policy and proce- tively. From the aspect of data process-
dure as a part of incident management, and ing architecture, another classification was
Page 52
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
c 2018 ADFSL Page 53
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
Page 54
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
c 2018 ADFSL Page 55
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
et al., 2011). It collects provenance infor- ical location of the data. They adopted a
mation in the application layer service like provenance technique in their proposed sys-
PaaS, SaaS and IaaS, rather than just vir- tem where transport layer protocol informa-
tual and physical machine information. It tion was recorded to identify the location
also captures other cloud related essential of data. They also considered the appli-
provenance information like client identifi- cation layer data to answer the additional
cation record and transfer of information queries like who, when, what and how infor-
between different virtual and physical ma- mation which are required for a successful
chines placed in different locations. Addi- forensic investigation. Provenance record is
tionally, DataPROVE also provides efficient proposed to be stored in a centralized log-
data API for indexing and query operations. ging server which eventually can overcome
S2Logger is another approach of maintain- the problem of strong coupling between data
ing secured data provenance through logging and its meta data and can also ensure data
at the block-level and file-level. It is imple- integrity. They proposed another model for
mented based on Flogger (Suen, Ko, Tan, provenance with features like digital object
Jagadpramana, & Lee, 2013). In this mech- tracking and location identification at any
anism, (Suen et al., 2013) addressed features point of the lifetime of that particular digi-
like capture mapping between virtual and tal object (P. Trenwith & Venter, 2015). In
physical resources, proper maintenance of this model, they introduced the concept of a
the provenance chain and data integrity and wrapper object which wrapped a file along
confidentiality. S2Logger also introduced an with its provenance records embedded to-
end-to-end data tracing mechanism that is gether and the location of the wrapper ob-
capable of capturing detail provenance infor- ject would be maintained in a separate cen-
mation from all nodes connected in the cloud tralized server.
environment.
(Lu et al., 2010) have designed a prove-
nance mechanism enabling the forensic fea-
4.
FORENSIC
tures in the cloud environment. Provenance ENABLED
was designed based on a bilinear pairing
technique considering some critical aspects PROVENANCE MODEL
of cloud like data confidentiality, anony- 4.1 Additional Design
mous access of the user, and dispute reso-
lution. This mechanism introduced compu- Consideration
tation and communication overhead at the Designing an efficient and forensic enabled
data owner’s end. This issue was later iden- provenance mechanism in a cloud environ-
tified by (Li et al., 2014), and they have ment requires that some other areas be con-
introduced a broadcast encryption method sidered which vastly contribute to the suc-
to overcome this. They have also ensured cess of the process. Identifying the possi-
anonymity of the user by implementing an ble sources of provenance records, deciding
attribute-based signature (ABS) scheme for on a unified model to store those records,
access control and privacy by designing an data storage with required features for ef-
anonymous key-issuing protocol. ficient data management, and ensuring se-
(P. M. Trenwith & Venter, 2014) discussed curity and integrity of the data for forensic
another source of log information which may operations are important factors that must
allow the investigator to answer to the phys- be considered for a provenance mechanism.
Page 56
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
General cloud architecture can be divided The format and properties for logging
into several layers – physical resource layer, provenance data collected from different
resource abstraction layer and service layer sources is important in a provenance collec-
as shown in Figure 4. (Ruan & Carthy, tion system. (Marty, 2011) has identified
2012) identified the sources of forensic ar- some important properties of provenance
tifacts from each of the layers. Among data like timestamps, application, user, ses-
these, it is necessary to identify the actual sion id, category, and reason and severity
sources which provide necessary provenance of the event. Applications running in the
related information and help in forensic anal- service layer mostly contribute to this type
ysis. Hard disks, network logs and access of information which can be collected from
records are important sources from the phys- different application and event logs. A dis-
ical layer. Event logs and virtual resources tributed file system, like Google File System
are necessary from the abstraction layer, and (GFS), maintains a separate set of informa-
access logs, events logs, rigid information tion vital for provenance and forensics. The
about the virtual operating system, and ap- GFS master usually keeps information file
plication logs are vital sources from the ser- and chunk (part of a file) namespace, map-
vice layer. (Imran & Hlavacs, 2013) identi- ping from file to chunks, and location of the
fied provenance information in different sub- replica of the chunks (Ghemawat, Gobioff, &
layers of the service layer and developed an Leung, 2003). The Hadoop Distributed File
integrated provenance data collection pro- System (HDFS) also maintains a transaction
cess. In our conceptual model approach of log, called EditLog, in its NameNode which
collecting provenance data, we will also con- contains every modification recorded in file
sider the featured criteria used in these pro- system metadata (Borthakur et al., 2008).
posed systems which can be helpful for the Google cloud maintains region and zone re-
forensic process. lated information while creating a virtual
machine instance recording which has prove-
nance data along with other information
Figure 4. Cloud System Architecture that may guide an investigator in forensic
([(Bohn et al., 2011)]) analysis. The heterogeneous characteristics
of these records make the process of choos-
ing a particular format very hard. Marty has
Service Layer
suggested a syntax of logging the informa-
Saas tion as “key-value” pair, which is considered
as an ideal syntax of logging information in
PaaS
our design approach. We also include a uni-
IaaS
fied naming approach of the “key” to ensure
synchronicity of the same properties coming
from different from sources.
Resource Abstraction & Control Layer
Provenance data records must be eas-
ily searchable; therefore, a data repository
Physical Resource Lay er
that can provide the best performance and
Hardware
an efficient query interface must be chosen.
Facility (Muniswamy-Reddy, 2006) explored differ-
ent database architectures to choose the best
data repository. They considered aspects
c 2018 ADFSL Page 57
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
like the format of maintaining dependency tion will provide a description of the pro-
of the files and different approaches of stor- posed model and surveys its requirements
ing the annotation metadata. They com- of provenance, forensic analysis, and neces-
pared Berkley DB as a schemaless database, sary design considerations. The concept of
MySQL as a RDBMS database, XMLDB as the general actors of cloud operations and
XML database and OpenLDAP as a rep- forensic analysis have been utilized in this
resentative of an LDAP architecture and model; however, the contribution of each of
found that Berkley DB outperformed other the cloud actors and access mechanisms to
database architectures in terms of stan- the provenance records for the investigator
dard database related operations in this sce- and court authority have been redefined. It
nario. While implementing PASS in cloud, is also necessary to introduce a unified log-
(Muniswamy-Reddy et al., 2010) also used ging mechanism in different layers of a cloud
SimpleDB and Amazon S3 to store prove- environment to leverage data manipulation
nance objects. while recording provenance information in a
Provenance records logged in different lay- persistent database.
ers are basically maintained by different ac-
tors of the cloud. Cloud consumers are usu- Figure 5. Forensic Enabled Cloud Prove-
ally responsible for the records maintenance nance Model
service layer while providers manage the log-
ging activities of the abstraction and physi-
cal layers. Provenance records in these lay-
ers may often be affected by the collusion
between the consumer and provider which
eventually can mislead the forensic investi-
gator. The investigator can also play the role
of an accomplice with either of these parties,
which ultimately diminishes the whole ap-
proach of data provenance and forensic anal-
ysis. To ensure integrity of the provenance In a cloud model, consumers and providers
data and trustworthiness of the collection are the two main contributors to the cloud
system, it is necessary to include a verifica- operation, while a cloud auditor basically
tion process which can be used to validate performs analysis on the cloud services and
the provenance information. (Zawoad et al., provides feedback. In the proposed model,
2015) proposed a Proof Publisher Method an additional responsibility for the cloud au-
(PPM) in their Open Cloud Forensic (OCF) ditors is included. Usually cloud services like
which can be used by the court authority to SaaS and PaaS are designed to provide the
verify the forensic reports provided by the consumer with less control over the system
investigators. They also proposed a reliable while IaaS allows configuring the necessary
way of collecting the evidence through se- applications. In these scenarios, a cloud au-
cured read-only APIs accessible only by the ditor’s additional responsibility will be moni-
investigator and court authority. toring whether the services used by the con-
sumers are capable of logging every opera-
4.2 Proposed Model tion or whether a provider enables the log-
Figure 5 shows the proposed Forensic En- ging features for each application where the
abled Cloud Provenance Model. This sec- consumer has limited access. The providers
Page 58
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
additional responsibility will be introducing time to capture the sequence of events prop-
policies in SLA which will clearly instruct erly. To maintain proper chronology of the
the clients about installing necessary audit- data from different systems, each transaction
ing features. The provider should also re- will be saved with timestamp information for
design the access policies so that the client both the source and data storage system.
can have enough options for customization. This will eventually help in identifying the
The provider should also be capable of mon- unsynchronized nodes or sources.
itoring every operation performed by the Trustworthy data storage policy is another
client. In an ideal scenario, the provider important concern. It is necessary to en-
should install firewalls and other applica- sure integrity of the data and employ tech-
tions to detect intrusions or any form of ma- niques to easily identify any forged data and
licious activity. possible miscreants. To ensure this, the
Database architecture is an important as- database should be designed for capturing
pect of designing an ideal provenance model. necessary audit information. Audit informa-
It is required not only to accommodate a tion will include the application or user iden-
variety of information or manage large vol- tity which eventually stores or update any
umes of data, but also to ensure availabil- information. For all of the data that will be
ity and scalability. Considering these facts, saved an additional attribute will be added
the proposed system is a schema-less dis- as provenance. This attribute will contain
tributed database system to store the prove- the hash value of the data item. This hashed
nance record coming in heterogeneous for- data will be used later for verification of the
mat from different sources. The distributed information.
nature of the database architecture will en- If any issue is reported by any of the other
sure the availability and scalability features. three actors of cloud services, the process of
Data will be stored in a key-value format forensic analysis starts. In this case, an in-
where the key name will maintain a con- vestigator will conduct the preliminary in-
sistent naming convention to facilitate the vestigation by collecting provenance records
query operation maintained through a uni- from the provenance storage and finally pre-
fied data modeling policy. Each transaction pare a report eligible for submission to the
will be logged separately with the reference court authority. The court authority will
to the dependent object. Keeping a single verify the report and provide the verdict. To
instance of the provenance record requires facilitate this process, query and verification
updating the instance very frequently and interfaces for the investigator and court au-
hence reduces the expected performance of thority have been introduced in the proposed
the system. In the case of recording prove- system. These services will be provided by
nance for a file, it is recommended that a the service providers.
copy of the content that will be logged in
the repository be maintained. Provenance 4.3 Discussion
record of a data should also maintain rela- This section briefly demonstrates how the
tionship information with the process or en- proposed model of provenance collection
vironment variables that effected it at any overcomes the challenges already discussed
point of its lifetime. in Section 2 and how efficiently this model
A timestamp with every transaction is a can facilitate the process of forensic analy-
vital field. Loggers in different layers will re- sis. In table-1, we also present a compara-
quire maintaining strict synchronization of tive analysis with the existing approaches in
c 2018 ADFSL Page 59
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
terms of the challenges of provenance man- ity make the provenance collection process
agement for forensic analysis in public cloud a highly challenging task. To ensure effi-
addressed by those models. cient forensic analysis, a provenance collec-
One of the primary concerns about cloud tion technique can compromise none of these
computing is the shift of control of data or concerns. Hence, it is recommended to en-
files to a different authority. Data might be sure a better data management policy. The
affected by malicious attacks, software bugs proposed model suggests a unified naming
or system failure. The distributed nature of convention for “keys” coming from different
cloud architecture can ensure availability of sources. A well-chosen naming conversion
data in these scenarios. Any unlawful access helps to identify the exact information in
from a provider’s end can be easily identified search and can also identify relations in log
through the provenance information, as ev- information coming from different sources.
ery access to any file or data is recorded by Regeneration of data or an incident from
the provenance system. the lineage information is another focus
Allowing multi-tenant facility through vir- of a data provenance system. Provenance
tualization and identifying the location is the records maintained with the actual content
most notable challenge in a public cloud en- can ease this process as an investigator can
vironment. Collecting snapshots of virtual easily extract any verified content from the
machine logs with additional information to provenance storage and apply the opera-
identify the virtual machine instance can tions which occur in the other version of the
solve this problem. File system metadata data to check if these operations generate
contains details about the file status and lo- the same result. This forensic enabled data
cation. Integrating necessary measures al- provenance model also considers the storing
ready discussed in 4.1 can lead to an efficient of the data along with its metadata in the
solution to these issues. Maintaining prove- provenance log.
nance records with location information will This proposed provenance model collects
eventually help investigators identifying the information from different layers of a cloud
location of an object and collect necessary architecture. It also considers necessary
artifacts accordingly. measures to include location information
Forensic investigation is often hampered and other attributes required to identify the
because of the lack of synchronization be- related entities or actions of data. This large
tween stored records and variations of volume of information opens enough oppor-
records coming from different sources. In the tunity for proper analysis.
proposed model, a unified model to identify A loose coupling between data and prove-
attributes of the entity and relevant oper- nance records is maintained as prove-
ations is introduced. This will facilitate the nance information is stored separately in
process of analysis as investigators can easily a database. Separation between these two
identify and differentiate between entities of records ensure the existence of provenance
a particular piece of information and their information even if the actual data is miss-
effect on the provenance record or forensic ing. By nature, provenance information is
analysis. immutable and persistent. If any provenance
Managing the large collection of prove- is deleted from the database intentionally, a
nance data is a difficult job. The large database audit can provide necessary hints
volume of information, variety in data for- to identify the miscreants.
mats and increase in volume in high veloc- The proposed model also prioritizes the
Page 60
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
Proposed Approaches
(Muniswamy- (Zhang (Li et (P. M. Tren- (Lu Our
Reddy et al., et al., al., with & et al., Ap-
2010) 2011) 2014) Venter, 2010) proach
2014)
Virtualization 3 3
Public
Security & 3 3 3 3 3
Cloud
Privacy
Data Loca- 3 3
tion
Data Preser- 3 3 3 3
vation
Digital
Identification 3 3 3 3 3
Forensic
Collection 3 3 3 3
Scenario Re- 3 3 3 3 3
construction
Presentation 3
Data Cou- 3 3 3
pling
Cross-Host 3
Provenance Tracking
Application 3 3
Independence
Multilayer 3 3 3
Granularity
Data In- 3 3 3 3 3
tegrity
Confidentiality 3 3 3 3
Query-able 3 3 3
Persistence 3 3
Causal Or- 3 3 3
dering
Table 1. Comparative analysis of the proposed approaches
c 2018 ADFSL Page 61
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
thority to detect the possible symptoms of Borthakur, D., et al. (2008). Hdfs
collusion between the consumer and cloud architecture guide. Hadoop Apache
service provider, consumer and investigator, Project, 53 .
or provider and investigator. Buneman, P., Khanna, S., & Wang-Chiew,
T. (2001). Why and where: A
characterization of data provenance.
5. CONCLUSION In International conference on
database theory (pp. 316–330).
Maintaining the provenance information of
Carrier, B., Spafford, E. H., et al. (2003).
data is as important as the data itself. In
Getting physical with the digital
modern day computing technology, it is an
investigation process. International
integral part of data to ensure its authen-
Journal of digital evidence, 2 (2),
ticity and integrity. In this paper, we have
1–20.
identified the challenges of cloud architec-
Chung, M., & Hermans, J. (2010). From
ture, discussed how this affects the existing
hype to future: Kpmg’s 2010 cloud
forensic analysis and provenance techniques,
computing survey. KPMG, 8–28.
and discussed a model for efficient prove-
Cloud Security Alliance. (2011, September).
nance collection and forensic analysis. We
Cloud controls matrix v1.2 - cloud
have also briefly highlighted the necessity of
controls matrix working group.
provenance data visualization and explained
(https://cloudsecurityalliance.org/
why or how it can be helpful for modern day
download/
forensic analysis.
cloud-controls-matrix-v1-2/)
Ghemawat, S., Gobioff, H., & Leung, S.-T.
REFERENCES (2003). The google file system
(Vol. 37) (No. 5). ACM.
Abbadi, I. M., Lyle, J., et al. (2011). Hashizume, K., Rosado, D. G.,
Challenges for provenance in cloud Fernández-Medina, E., & Fernandez,
computing. In Tapp. E. B. (2013). An analysis of security
Alqahtany, S., Clarke, N., Furnell, S., & issues for cloud computing. Journal of
Reich, C. (2015). Cloud forensics: a internet services and applications,
review of challenges, solutions and 4 (1), 5.
open problems. In Cloud computing Imran, M., & Hlavacs, H. (2013). Layering
(iccc), 2015 international conference of the provenance data for cloud
on (pp. 1–9). computing. In International
Amazon Web Services. (2008, September). conference on grid and pervasive
Building greptheweb in the cloud, part computing (pp. 48–58).
1: Cloud architectures. Jansen, W., & Grance, T. (2011). Sp
(http://developer.amazonwebservices.com/ 800-144. guidelines on security and
connect/ entry.jspa?externalID=1632 privacy in public cloud computing.
[September 30 2016]) Kandukuri, B. R., Rakshit, A., et al.
Bohn, R. B., Messina, J., Liu, F., Tong, J., (2009). Cloud security issues. In
& Mao, J. (2011). Nist cloud Services computing, 2009. scc’09. ieee
computing reference architecture. In international conference on (pp.
Services (services), 2011 ieee world 517–520).
congress on (pp. 594–596). Katilu, V. M., Franqueira, V. N., &
Page 62
c 2018 ADFSL
A Forensic Enabled Data Provenance Model for Public Cloud JDFSL V13N3
c 2018 ADFSL Page 63
JDFSL V13N3 A Forensic Enabled Data Provenance Model for Public Cloud
Page 64
c 2018 ADFSL
© 2018. This work is published under
https://creativecommons.org/licenses/by-nc/4.0/(the “License”). Notwithstanding
the ProQuest Terms and Conditions, you may use this content in accordance
with the terms of the License.