Digital Investigation 11 (2014) 30–42


Cloud Data Imager: A unified answer to remote acquisition of cloud storage areas
Corrado Federici*
CIRSFID, University of Bologna, via Galliera 3, 40121 Bologna, Italy

A R T I C L E   I N F O

Article history:
Received 6 November 2013
Received in revised form 5 January 2014
Accepted 1 February 2014

Keywords: Cloud storage; Remote forensic acquisition; Dropbox; Microsoft Skydrive; Google Drive; Cloud computing; Computer forensics; Interoperability; ISO/IEC 27037; Virtual volume

A B S T R A C T

The pervasive availability of cheap cloud computing services for data storage, either as a persistence layer to applications or as a mere object store dedicated to final users, is remarkably increasing the chance that cloud platforms potentially host evidence of criminal activity. Once presented with a proper court order, cloud providers would be in the best position to extract relevant data from their platforms in the most reliable and complete way. However, this kind of service is not so widespread to date and, therefore, the need to adopt a structured and forensically sound approach calls for innovative weaponry which leverages the data harvesting capabilities offered by the low level program interfaces exposed by providers. This paper describes the concepts and internals of the Cloud Data Imager Library, a mediation layer that offers read only access to files and metadata of selected remote folders and currently supports access to Dropbox, Google Drive and Microsoft Skydrive storage facilities. A demo application has been built on top of the library which allows directory browsing, file content view and imaging of folder trees with export to widespread forensic formats.

© 2014 Elsevier Ltd. All rights reserved.

* Tel.: +39 051 2098771. E-mail addresses: corrado.federici@unibo.it, corrado.federici@carabinieri.it.
http://dx.doi.org/10.1016/j.diin.2014.02.002
1742-2876/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Cloud Computing has made true the long held dream of computers as affordable utilities (Parkhill, 1966) which are charged according to their usage. In this respect, a key role has been played by distributed file systems and object stores, which allow virtually infinite storage capacity to be reached by summing the individual contributions of the disks placed inside commodity servers. Well known solutions exist, either proprietary or open source, that ensure high availability and geographic distribution of data. A side effect of a reliable and cheap storage area is the remarkably increasing chance that it can be used for harboring crime related data, such as credit card numbers, stolen identities or violated credentials. Unfortunately for the digital investigator, distributed architectures may entail difficulties when it comes to rebuilding a global picture, as files get partitioned in several chunks of configurable size and are scattered among a potentially vast population of participating nodes (Quick and Choo, 2013a). This most probably prevents forensic teams from utilizing write blockers and bit stream copiers, because it is hard to detect which of the plethora of nodes hold relevant data without digging into file system internals. But this is regrettably just a part of the story: proprietary technologies, unavailability of the provider to deliver a console with root privileges to third parties or simply lack of jurisdiction help explain why an on field approach may simply be totally unfeasible. So the natural conclusion should be serving a warrant to cloud providers as, in principle, they would be in the best position to extract relevant data from their platforms. While this approach seems straightforward and rid of troubles, relying on a party that does not natively offer a professional forensic service requires that a good deal of trust be placed
on procedures and tools used at the provider's premises (Dykstra and Sherman, 2012). Data should be delivered to forensic investigators in a well known format, as complete as possible, integrity protected and non repudiable. Consider however the following scenarios, where data acquired as a result of a warrant could potentially be deemed unacceptable before a court for lack of reliability or sufficiency:

- a system administrator without a specific forensic background uses an ordinary maintenance script to restore the requested data from a backup. As a result, content gets extracted, but some file metadata are overwritten;
- deleted files are not recovered, even if this was technically possible;
- once packaged, the blob gets delivered without integrity protection codes or it is impossible to uniquely associate it to the provider because of flaws in the chain of custody;
- in case of proprietary templates, raw data is not exported in a well known format and browsing is only possible by means of a viewer program.

Resorting to the scrutiny of a third party appointed as needed to audit and certify the operation might result in additional costs and possibly further delays. Agreeing beforehand on an acceptable strategy for acquisition of data between law enforcement (LE) and provider could translate into delays as well and might need to be redesigned when the counterpart changes. When a provider assisted Forensic as a Service (Dykstra and Sherman, 2012) is not available, a third way may be considered that is secure, officially supported and reduces the points of contact with the cloud provider, so to possibly shorten times and lower costs.

Given the self service nature of cloud platforms, object storing is also exposed via entry points that usually reproduce all the features available from a web console. A low level interface based on Simple Object Access Protocol (SOAP) or Representational State Transfer (REST) web services enables user created applications to remotely execute operations on folders and files such as download and list. Higher level Software Development Kits (SDK) are often available that wrap Hypertext Transfer Protocol (HTTP) calls and allow a programmer to rely on languages like Java. Reasonable scopes of application include, but are not limited to, technical activities performed during pre-trial hearings with or without the consent of the defendant. In the first case the defendant willingly gives his credentials, as he may have an interest in taking a trusted snapshot of his cloud stored files without any modifications. In the latter scenario, by performing a forensic analysis of a seized computer, law enforcement could have recovered the user name and password of a storage account (Quick and Choo, 2013a,b) or directly an access token string (AT) so to bypass user authentication, as it might be possible for Dropbox (see Sections 6.1.1 and 6.1.2). While the approach of remote acquisitions seems promising, there are some aspects that need to be deepened before blueprinting strategies and tools able to image data in a forensically sound way. First and foremost, forensic best practices, where possible, suggest avoiding alteration of digital evidence (DE) during acquisition. Therefore a read only access to cloud storage areas which mimics the write blocking mechanism applied in traditional bit stream copies of physical mass memories would be beneficial. Indeed, Application Program Interfaces (API) do allow write access: upload, deletion and copy of objects are possible by design. Furthermore, while REST web services seem somehow the 'lingua franca' for interacting programmatically with remote storage, the parameters that need to be specified in the calls may vary greatly from one platform to another and so does the format of returned data. An extra layer which harmonizes the syntactic differences is therefore needed. Not less important is the requirement of protecting the integrity of all the retrieved data and reporting all operations in a detailed log. With this foreword, the paper describes the concepts and internals of the Cloud Data Imager Library (CDI Lib), a mediation layer we developed to offer read only access to files and metadata of selected remote folders, while presenting a unified front end which masks out the syntactic and functional differences of cloud technologies. We built a desktop application on top of the library which, once instrumented with the necessary credentials, provides functionalities like folder listing with view of present, deleted and shared content, browsing of file revisions, extensive logging and imaging of folder trees with export to widespread forensic formats. CDI Lib currently supports access to three popular storage facilities: Dropbox, Google Drive and Microsoft Skydrive.

The rest of the paper is organized as follows: the next paragraph reviews previous and related work. Section 3 gives some background information and Section 4 lists the requirements applicable to cloud forensic software. Section 5 shows the limitations of currently available tools, whereas Section 6 deals with CDI's architecture and functional tests. We draw the conclusions in paragraph 7.

2. Related work

Plenty of work has been developed about discovering traces left on client devices by the interaction with cloud storage platforms. For instance, Chung et al. (2012) have devised a procedure to collect remnants from computers and smartphones accessing, among others, Amazon S3 and Google Docs and found that many artifacts can be recovered by digging into logs, cache files and databases present in a user profile. In two consecutive papers, Quick and Choo (2013a, 2013b) accomplished a comprehensive analysis concerning traces recoverable in memory and persistent storage of a Windows PC and Apple iPhone after Dropbox and Microsoft Skydrive services were accessed via browser or client applications. A similar research was accomplished for Amazon Cloud Drive (Hale, 2013). Conversely, procedures and tools for server side acquisition of file content and metadata from a cloud object store appear to deserve a far larger degree of deepening. Quick and Choo (2013c) have explored the possibility of collecting files from a user account of Dropbox, Google Drive and Microsoft Skydrive. As a preliminary consideration, the authors observe that their investigation lacked a suitable
forensic software for the collection of data. As a consequence, their findings are somehow limited by the need of using an internet browser or the official client application. Indeed these are general purpose tools which were not designed with forensic principles in mind and, as the authors themselves observe, may not leave traces in client devices of precious information such as historical versions of files. Considering then the circumstance that one of the ends of the communication is under the control of the researchers, much more could be put in place than capturing SSL encrypted network traffic. However, one of the outcomes of their research is an important starting point of the present work, as they determined that there were no changes in contents after having downloaded a file, while only some of the timestamps were preserved. The Cloud Data Imager project just aims to fill the gap outlined by the work of Quick and Choo. A dedicated forensic software could log the full conversation with the cloud platform at application level and in clear text, having if anything the issue of protecting user credentials, access and refresh tokens (RT). Furthermore, cloud APIs have provisions for retrieving the metadata of all items in a folder and this is crucial to set creation and modification times of downloaded files equal to the ones hosted in the cloud storage area.
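As an illustration of this last point, the following minimal C# sketch shows how the timestamps of a downloaded file could be aligned with the values reported by the cloud platform; the helper name and the assumption that the metadata values are expressed in UTC are ours, only the standard System.IO calls are used.

using System;
using System.IO;

static class TimestampAligner
{
    // Applies the timestamps reported by the cloud metadata (assumed to be UTC)
    // to the local copy of a downloaded file.
    public static void Apply(string localPath, DateTime createdUtc, DateTime modifiedUtc)
    {
        File.SetCreationTimeUtc(localPath, createdUtc);
        File.SetLastWriteTimeUtc(localPath, modifiedUtc);
    }
}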
3. Background

Object stores are very popular facilities these days. They allow reliable persistence of user content like documents or pictures thanks to a sparse architecture able to massively scale and tolerate component failures (Openstack, 2013). Their native interface is based on web services that allow interaction with objects in their entirety: for instance, it is not possible to modify an object by writing a defined amount of bytes at a certain position, as allowed by traditional Portable Operating System Interface (POSIX) semantics. As a result, a modification of a document requires uploading a new version of it, and hence derives the property of immutability (Google, 2013b). For ease of use, using a provider distributed application, storage areas can also be replicated with a two way synchronization on the file system of the user's device and mounted like a regular local folder. An important aspect of a forensic investigation is the possibility for an examiner to find remnants of past activity which are not immediately manifest, not only in the final user's computer or tablet, but also in the cloud infrastructure itself. Data can be consumed from a variety of client devices which might be unavailable for an inspection and may change, be updated or erased: with its outstanding durability, the cloud might be the only anchor of a case. In this respect, object stores usually feature trash containers in which items are put after deletion (Google, 2014b). Depending on the quality of service subscribed, these can enforce temporary or long term persistence, until users decide for a permanent erasure. Trash is a system folder to which items are transparently moved awaiting their fate. Equally remarkable for the forensic examiner is the possibility of the cloud platform to keep track of past versions of objects after they are updated by the user. The programming interface which exposes a storage area may have functionalities that go far beyond the retrieval of manifest content and may prove very relevant for computer forensics. For instance, trash and past revisions of a file may be available via remote query, so their accounting may be crucial to an investigation. Finally, not less important is the possibility to retrieve object metadata: some 'data about data' may be extremely valuable because they are less under the user's control. Consider the case in which an examiner could be led astray by a clever suspect that conveniently tweaks file timestamps on his laptop to support his claims. Conversely, uploading a new version of a document to a cloud store, for example as a consequence of an automatic folder backup, may retain its modification time, but updates the creation time to the current remote platform system time, so to possibly generate contradictions between the two timestamps. The latter is likely synchronized with a time server and is thus expected to be a much more reliable landmark than a notebook clock. For completeness sake, it has to be added though that policies on object timestamping may vary from one cloud provider to another and therefore they need to be evaluated case by case.

4. Requirements

A forensic software, no matter how innovative, needs to comply with concepts and procedures set forth by relevant regulations and best practices. In this respect, this work relies on the guidance offered by ISO/IEC 27037 (International Organization for Standardization, 2012). This international standard contains guidelines directed to all the professionals who need to confront potential DE, from identification to evaluation in a tribunal, and who need to grant that this material complies with the generally accepted principles of relevance, reliability and sufficiency. The standard deals with four phases: identification (DE search and recognition), collection (removal of DE from its location), acquisition (creation of a copy) and preservation (safeguards aimed to avoid that DE is tampered with, damaged or dispersed). Even if these guidelines do not explicitly deal with cloud storage services yet, we can consider them as delivered by non interruptible mission critical systems which can be reached only remotely and cannot be acquired in their entirety because of their size. Under these conditions, clauses 5.4.4, 7.1.3.3 and 7.1.3.4 of ISO/IEC 27037 state that a logical partial acquisition which targets specific files and directories is admissible. Focus will be on the two last phases listed by the standard: digital evidence copy and preservation. Identification is considered accomplished a priori as an outcome of an investigative activity leading to pinpoint the relevant cloud accounts. Collection is not applicable: there is no digital evidence to remove from its location, as everything is accomplished via a network. The following paragraphs list the requirements that should be fulfilled by a cloud storage forensic application (FA).

4.1. Logical acquisition

Forensic software should leverage a provider delivered programming interface that allows unabridged retrieval of file content and metadata irrespective of the cloud platform file
system technology. An extra effort may be necessary to request by other means data which are not available remotely, as in the case of access logs, deleted items or historical versions of documents. Should the provider expose both clear text and encrypted endpoints, the FA should rely on the latter to ensure confidentiality of communications. In this respect, Secure Socket Layer (SSL) is a ubiquitous protocol.

4.2. Performed functions

The module responsible for the communication with the cloud platform should implement a minimum set of functions which allow the following operations:

- user authentication and authorization;
- retrieval of user information like name and ID;
- retrieval of folder metadata with its existing subfolders and files. Deleted items should be obtained as well, if possible. Metadata should include at minimum: name, size, creation and modification date;
- listing of all available revisions of a file, if available;
- file content download;
- retrieval of directories and files that someone else shared with the user, if this information is provided.

As method invocation syntax varies from one provider to another, this module should provide a harmonization layer which exposes a unified set of calls, so to mask out possible differences like Uniform Resource Locator (URL) composition, input parameter list or formatting of returned data.

4.3. Low level interface

For completeness sake, developers should exercise care to select the API that allows retrieval of the maximum possible amount of information. This may mean accessing the platform at the lowest possible level and may require a larger degree of development effort, but at the same time it involves remarkable paybacks:

- augmented control, which translates into the possibility of retrieving more potentially interesting information from the cloud platform;
- the possibility to develop an application using languages for which no SDK exists.

For example, most providers publish SDKs with high level classes that wrap REST web service calls and greatly ease the life of programmers by reducing the amount of code necessary to perform operations on the data store. Consider the case of Dropbox: the latest core Java API to date is version 1.7.3. Invoking a method called getMetadataWithChildren from class DbxClient, which accepts the path of a folder as input, a list of entries of type DbxEntry.File or DbxEntry.Folder is returned, that represents all the files and subfolders contained within. Each entry is a data structure which does not contain a flag to inform if the folder or file has been deleted. The same happens with the latest Java API for Android, version 1.5.4, where a folder listing can be obtained by calling a method named metadata belonging to class DropboxAPI. This time the returned list of structures, namely DropboxAPI.Entry, would include a field named is_deleted which however is not assigned, because metadata has no input parameter to request the inclusion of deleted items in the returned list. Getting those items from Dropbox could be possible, as they live for 30 days for unpaid accounts and forever in case of paid subscriptions. It is therefore necessary to leverage the REST web services interface, which is the foundation of every higher level SDK: invoking a GET method with an include_deleted parameter equal to true returns a JavaScript Object Notation (JSON) formatted list which includes deleted folders and files. It was the only way we were able to achieve this result.
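As a rough illustration of the kind of low level call discussed above, the following C# sketch issues a metadata request with include_deleted=true over HTTPS. The endpoint reflects the Dropbox Core REST API v1 style of the period, but the exact URL, root value and token handling are assumptions to be verified against the provider's current reference before any use.

using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static class DropboxMetadata
{
    // Lists a folder, asking the server to include deleted entries in the answer.
    // Assumes a valid OAuth 2.0 access token; the path is supplied by the caller.
    public static async Task<string> ListWithDeletedAsync(string accessToken, string path)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);

            // Core API v1 style endpoint (illustrative; verify before use).
            string url = "https://api.dropbox.com/1/metadata/auto" + path +
                         "?list=true&include_deleted=true";

            // HTTP GET only: no write operation is ever issued towards the store.
            HttpResponseMessage response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync(); // JSON folder listing
        }
    }
}

The same pattern applies to the other REST calls: only the URL and the query parameters change from one operation to another.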
4.4. Read only access

Conforming to the principle of reliability of digital evidence, cloud stored content and metadata should be secured against any accidental change. This translates into the requirement for the FA to access remote content in a read only manner. Indeed, similarly to traditional bit stream copies of digital evidence, best practices advise, if possible, to implement a write blocking mechanism to avert the risk of invalidating an acquisition. This cannot always be guaranteed in case of usage of internet browsers and provider delivered client applications, because they could possibly cause accidental modifications to storage areas. Other avenues for alteration of the remote content which could be performed by third parties must be discussed with the provider and eliminated, if possible. Example solutions could include:

- a new account released to LE with exclusive access to the suspect's storage area;
- if the suspect's recovered own credentials are used, exclusive login could be granted only to LE's forensic workstation or write permissions could be removed by the provider from the account.

4.5. Officially supported interface

The forensic application must mandatorily be based on stable and officially supported APIs. Under no circumstance can programming interfaces offered by third parties be leveraged, if they did not receive a prior endorsement by the cloud provider. This strongly excludes, for example, function calls which were obtained by reverse engineering of code or via protocol inspectors.

4.6. On demand folder browsing

Conforming to the principle of sufficiency of digital evidence, the FA should offer the possibility to browse an account in order to possibly exclude from imaging those folders that appear clearly irrelevant. To avoid unnecessary network traffic, instead of walking the whole directory tree beforehand, metadata can be retrieved and cached only
when the examiner's navigation requests it. The chance of performing a prior selection is also important for triaging data in case of very large stores that cannot be wholly acquired in the allowed timeframe. File content preview should be possible also for deleted and previous versions, if available. The on demand nature of folder browsing excludes a blind synchronization of the data store with a local folder when the FA starts.

4.7. Native logging

Clause 5.3.2 of ISO/IEC 27037 states the importance of documentation to allow an independent assessor to evaluate all actions performed. To meet this requirement, the forensic software should therefore support a logging facility of configurable verbosity to create an audit trail for all actions. All relevant events stemming from the interaction with the cloud platform, user actions or error conditions should be accounted for. The times should indicate the shift from Coordinated Universal Time (UTC), if any, for disambiguation purposes. If possible, at the end of operations, the log file should be hashed and timestamped via certification services delivered by a legal authority in order to locate it in a defined point in time. Due care should be observed in protecting sensitive information like user credentials or restricted configuration parameters. For instance, when a request of listing the content of a folder is issued, the access tokens used for authorization get recorded. Protection could be enforced by creating an unabridged master copy of the log file, which is stored securely, and a working copy to be delivered to trial parties where sensitive data are masked off. This way original information can be accessed in a controlled manner should the Court deem this necessary. An efficient logging methodology could avoid the usage of extra recording facilities like screenshots, video footage or secure HTTP protocol decoders such as Telerik's Fiddler web proxy (http://fiddler2.com/), which is able to decrypt protected traffic with a man-in-the-middle approach.
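A minimal sketch of the masking step described above is given below: it produces a working copy of a log in which the value following an access token field is replaced by asterisks. The regular expression, field name and file handling are illustrative assumptions, not the paper's actual implementation.

using System.IO;
using System.Text.RegularExpressions;

static class LogSanitizer
{
    // Creates a masked working copy of the master log, leaving the original untouched.
    // The pattern is a simplistic example: it blanks the value of "access_token" fields.
    public static void CreateMaskedCopy(string masterLogPath, string workingCopyPath)
    {
        string text = File.ReadAllText(masterLogPath);
        string masked = Regex.Replace(
            text,
            "(\"access_token\"\\s*:\\s*\")[^\"]+(\")",
            m => m.Groups[1].Value + "********" + m.Groups[2].Value);
        File.WriteAllText(workingCopyPath, masked);
    }
}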
4.8. Folder imaging

Once the examiner has selected the relevant folder, the FA must faithfully traverse the complete tree so to copy remote data into logical evidence files, and should compute integrity protection codes to avert the possibility that the evidence is tampered with after it has been acquired. For instance, a cumulative cryptographic hash of all data retrieved from the server could be calculated, and an accompanying file hash list could be a valuable addition. The output format of the image may vary: it could be for instance a database or a local folder that reproduces the structure of the cloud area. In the latter case, metadata information can be put in a text file inside its corresponding folder: these are the trusted origin of information about files and folders, in particular when copied items do not preserve some fields such as creation time. The crucial aspect of imaging operations is that file contents and metadata of the target folder and subfolders be copied in their entirety as received from the server, including trashed files and revisions. The latter may constitute an added value over analyzing physical copies of disks: for instance, in a New Technology File System (NTFS) formatted volume of a Windows 7 box previous versions of a file are recoverable only if a user made a backup or if System Protection is enabled to allow shadow copies to be created. Even in this case, only the copy which was present just before the restore point creation will be available.
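The cumulative hash mentioned above can be computed incrementally while the downloaded streams are written to disk. The sketch below uses only the standard .NET HashAlgorithm interface; it is an assumption about how such inner hashes might be accumulated, not the library's actual code.

using System;
using System.Security.Cryptography;

// Feeds every downloaded buffer into running MD5 and SHA1 digests, so that a
// single pair of cumulative "inner hashes" covers all data received from the server.
class CumulativeDigest
{
    private readonly MD5 md5 = MD5.Create();
    private readonly SHA1 sha1 = SHA1.Create();

    public void Append(byte[] buffer, int count)
    {
        md5.TransformBlock(buffer, 0, count, null, 0);
        sha1.TransformBlock(buffer, 0, count, null, 0);
    }

    public void Finish(out string md5Hex, out string sha1Hex)
    {
        md5.TransformFinalBlock(new byte[0], 0, 0);
        sha1.TransformFinalBlock(new byte[0], 0, 0);
        md5Hex = BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        sha1Hex = BitConverter.ToString(sha1.Hash).Replace("-", "").ToLowerInvariant();
    }
}

The same kind of object can hash each individual file in turn, which is enough to build the accompanying per file hash list.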
5. Evaluation of available tools

After a literature perusal and due technology scouting, it appears that the arsenal for remote data retrieval of cloud storage areas is not very populated. General purpose tools such as internet browsers or provider delivered client applications can be useful allies in an investigation and can certainly be used if they produce the expected results, but we need to stress an incomplete compliance with some of the above requirements, for instance read only access or native logging. They are not forensic applications indeed, but were designed to allow a convenient read-write access to users, so missing functions need to be provided externally by other software. Provided that he is able to justify the reason for his actions, the investigator is not bound to specific tools, but it could be beneficial to the quality of technical assessments to rely on instruments able to increase the level of auditability and justifiability. The following table reports a compliance test against the requirements of Section 4 in case of access by means of browsers (Microsoft IE10 and Mozilla Firefox 25.0.1) and desktop applications for Dropbox (version 2.0.22), Google Drive (release 1.12.5329.1887) and Microsoft Skydrive (build 17.0.2015.0811). Although coarse grained, this test brings some food for thought.

Table 1 shows that desktop clients, despite allowing a convenient navigation on a local copy of data, do not show objects that were revised or deleted, do not preserve item creation time and perform a blind synchronization, possibly including irrelevant material. An internet browser has more to offer from a forensic point of view, as it can download selected folders as compressed files or show deleted items and revisions (in Skydrive the latter are available only for Office documents). However, both tools allow modification of target items and do not provide dedicated logging or integrity protection via hash codes. As a part of its forensic products offer, F-Response (https://www.f-response.com/) delivers a Cloud Connector which enables one to mount a cloud storage platform as a local logical volume or network share. Unfortunately, a copy of the software was not available for tests, but it is anyhow worth noting that, according to the publicly available manual rel. 5.0.1, a write protection mechanism is guaranteed. This is compliant with clause 4.4 and is very important from a forensic standpoint. However, the performed functions seem not to include the possibility to retrieve deleted items, past versions of files or shared folders. These features, according to contacts with the company, will be provided in future releases. Differently from the provisions of clause 4.8, Cloud Connector does not directly perform folder imaging, but rather prepares the ground for a third party product.
Table 1. Compliance test to requirements of Section 4 for browsers and client applications.

- Logical acquisition. Client: Pass. Browser: Pass. Notes: both tools allow to create local copies of remote files and folders.
- Performed functions. Client: Fail. Browser: Pass. Notes: browser access is compliant with all requested functions; conversely, no deleted items and previous versions of files are available with all desktop clients. No shared contents available for Google Drive and Microsoft Skydrive clients.
- Low level access. Client: Fail. Browser: Pass. Notes: considering the rationale of retrieving the maximum amount possible of information, desktop clients fail as they do not show versioned or deleted files, which is instead possible with browser access.
- Read only access. Client: Fail. Browser: Fail. Notes: write access is granted for both tools.
- Official interface. Client: Pass. Browser: Pass. Notes: desktop clients are delivered by cloud providers themselves and browsers access provider web sites.
- Folder browsing. Client: Fail. Browser: Pass. Notes: browsers allow on demand folder navigation. Client applications imply prior blind synchronization of the remote data store, which may include irrelevant data.
- Native logging. Client: Fail. Browser: Fail. Notes: neither tool logs communications with remote servers with the needed detail. Extra recording tools can be put in place, such as screen captures and video recording of operations. For browsers, tools like Telerik's Fiddler web debugging proxy can be leveraged, which decodes secure HTTP traffic.
- Folder imaging. Client: Fail. Browser: Fail. Notes: both can acquire a whole directory tree, but some folder metadata is not preserved. No hash functions are used to protect integrity for either tool.

6. Architecture and functions

Cloud Data Imager is a novel forensic tool for the remote collection of data from cloud storage accounts which fulfills all the requirements listed in Section 4. Fig. 1 shows the global architecture and functions.

Fig. 1. CDI architecture and functions.

The two main features are directory browsing with visualization of file content, and logical copy of a selected folder tree, not necessarily the root, to a local repository. Access to the Dropbox, Google Drive and Microsoft Skydrive platforms is currently implemented, but development plans include support for other 'Storage as a Service' facilities, either public such as Amazon S3 (http://aws.amazon.com/s3) or private like Openstack's Swift (http://www.openstack.org). The tool features a library which mediates between an overlying application and the provider exposed programming interface.

6.1. CDI Library

The APIs exposed by all the currently implemented platforms have some important commonalities, such as an interface based on secure HTTP requests and the usage of Open Authorization (OAUTH) version 2.0 as a protocol for authorization (Hardt, 2012). However, there are important differences as well: the URLs to which requests are directed, the methods and syntax for parameter passing, and the data structures returned in server answers. After being initialized with a provider ID, the CDI library hides this lack of homogeneity and publishes a unified set of calls irrespective of the underlying cloud technology. This enforces interoperability among cloud platforms and greatly simplifies the development of an application built on top of the library, as a distinction among providers is made only once, in the part of code that handles initialization. The available functions are listed in Table 2.

Table 2. List of calls exported by CDI library.

- getAuthorizeUrlV2 (Authentication and Authorization): OAUTH 2.0. Retrieves the URL to be addressed by a browser to let the user authorize access to the cloud account. If the user authorizes, the page returns an authentication code.
- getAccessTokenV2 (Authentication and Authorization): OAUTH 2.0. Exchanges the authentication code in the authorization web page for an access token string to be used in subsequent service requests.
- getAccountInfo (Information): Retrieves user name and ID of the account holder.
- listFolder (Browsing and Imaging): Retrieves metadata of a folder including its files and subfolders.
- listFileRevisions (Browsing and Imaging): Retrieves metadata of all previous revisions of a file.
- getFileContent (Browsing and Imaging): Gets the raw content of a file.
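As a rough sketch of how such a mediation layer could be exposed to an overlying application, the following C# interface mirrors the call names of Table 2; the signatures, parameter names and helper types are illustrative assumptions and not the actual CDI Lib API.

using System.Collections.Generic;
using System.IO;

// Unified, read only view over a remote storage account, irrespective of the provider.
public interface ICloudDataImager
{
    string getAuthorizeUrlV2();                          // OAuth 2.0 authorization URL
    string getAccessTokenV2(string authorizationCode);   // exchanges the code for an access token
    AccountInfo getAccountInfo();                        // user name and ID of the account holder
    IList<EntryMetadata> listFolder(string path);        // files and subfolders, deleted items where supported
    IList<EntryMetadata> listFileRevisions(string path); // previous revisions of a file
    Stream getFileContent(string path, string revisionId);
}

public class AccountInfo { public string UserName; public string UserId; }

public class EntryMetadata
{
    public string Name;
    public long Size;
    public System.DateTime CreatedUtc;
    public System.DateTime ModifiedUtc;
    public bool IsFolder;
    public bool IsDeleted;
}

An application would then obtain one concrete implementation at initialization time, according to the selected provider ID, and issue the same calls regardless of the underlying platform.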

It can be seen that requirement 4.2 is satisfied along with 4.3, because CDI leverages the REST web services interface, which is at the lowest possible level. Read only access is then guaranteed as all methods which handle user data are based on HTTP GETs. The library was developed after browsing the official literature published by Microsoft (REST reference Live Connect, 2013), Google (Google, 2013a) and Dropbox (Dropbox, 2013a). Finally, operations get logged in a text file with a configurable verbosity level which defaults to DEBUG, the most complete. This means that all the requests and responses of the functions listed in Table 2, save getFileContent to avoid excessive space consumption, are recorded verbatim as issued to or received from the cloud platform. The CDI project is written in C# for .NET Framework 4 or higher and requires Windows 7 or later. It leverages the Json.NET package (Newton-King, 2013) to parse JSON formatted server answers, along with Log4net (The Apache Software Foundation, 2013) as a logging facility. Dropbox, Google Drive and Microsoft Skydrive require that a developer register to get an app key and secret. These credentials are leveraged by the CDI library authentication and authorization functions.

6.1.1. OAUTH

The OAUTH authorization framework version 2 is the de facto standard for authorizing access via web services to a restricted resource on behalf of a third party. Differently from traditional client-server scenarios, a client application is not aware of the credentials of the owner: it is issued a temporary token to access the resource after the owner has logged on an authorizing server and explicitly accepted the access scope requested by the application. Scopes may include for instance the permission to modify an entire directory tree or just a single folder and the ability to operate when the owner is not logged on the storage platform. Access tokens are character strings of variable length issued by authorization servers which need to be attached to authenticate every request to resource servers. They are usually short lived: 1 h is a typical value. To allow offline operations when the user is not logged in, a refresh token may be released to applications along with the access token after a user has given his consent. The RT's goal is to be presented to authorization servers only to acquire a new AT, so to extend admittance without requiring user intervention. In this respect, the CDI library keeps track of the AT lifetime and, prior to expiration, leverages RTs to transparently renegotiate the issue of a new one. This offloads the application from handling the renewal of credentials from time to time.
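A minimal sketch of this transparent renewal logic is given below; the expiry margin, field names and the delegate used to call the provider's token endpoint are illustrative assumptions rather than the library's actual internals.

using System;

// Result of a call to the provider's token endpoint (field names are illustrative).
class TokenResponse
{
    public string AccessToken;
    public int ExpiresInSeconds;
}

// Keeps track of the access token lifetime and renegotiates a new one through
// the refresh token shortly before expiration.
class TokenManager
{
    private readonly string refreshToken;
    private readonly Func<string, TokenResponse> callTokenEndpoint;
    private string accessToken;
    private DateTime expiresAtUtc;

    public TokenManager(string initialToken, int lifetimeSeconds,
                        string refreshToken, Func<string, TokenResponse> callTokenEndpoint)
    {
        this.accessToken = initialToken;
        this.expiresAtUtc = DateTime.UtcNow.AddSeconds(lifetimeSeconds);
        this.refreshToken = refreshToken;
        this.callTokenEndpoint = callTokenEndpoint;
    }

    // Returns a valid access token, refreshing it when less than one minute is left.
    public string GetAccessToken()
    {
        if (refreshToken != null && DateTime.UtcNow >= expiresAtUtc.AddMinutes(-1))
        {
            TokenResponse fresh = callTokenEndpoint(refreshToken);
            accessToken = fresh.AccessToken;
            expiresAtUtc = DateTime.UtcNow.AddSeconds(fresh.ExpiresInSeconds);
        }
        return accessToken;
    }
}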


6.1.2. Dropbox

Dropbox's authorization scopes include read/write permissions on either a dedicated folder or the full user's Dropbox. It is a coarse grained permission scheme as, for instance, it is not possible to get read only access to all files and folders, even if content immutability still stands thanks to the HTTP GET usage made by the CDI library. Differently from other providers, access tokens do not have an expiration time: they can be used in the long term until the user repeats the authorization process or explicitly revokes them. This means that, once stored on a durable medium, a poor protection policy of the AT could lead third parties to bypass user authorization and access his data without further ado. For this reason the CDI library keeps ATs only in volatile memory and, as stated in paragraph 4.7, an unabridged master copy of the log file is stored securely, whereas another copy for the parties is created, where asterisks are put in place of AT characters to avoid sensitive information leaks. Listing of folder items and of file revisions has an upper bound: the former defaults to 10,000 items and the latter to 10. An error code will be returned for listings containing a number of files exceeding the limit. Accordingly, listFolder and listFileRevisions will return no more than 25,000 and 1,000 entries respectively, which are the topmost listing limits. Dropbox allows retrieval of deleted and revised items: unlimited deletion recovery and version history is granted to paid accounts, whereas this ability is limited to 30 days for the free ones (Dropbox, 2013b). Concerning the metadata returned by the API for deleted files, it has been verified that their size is zero bytes and client_mtime, the original file modification time which is retained if the file is uploaded with Dropbox's desktop application, is invariably set to Dec 31st 1969, 23:59:59 +0000.
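A short sketch of how such JSON metadata could be inspected with the Json.NET package used by the project is shown below; the field names follow the Dropbox v1 metadata answer discussed above, but the snippet is only an assumed illustration of the parsing step, not the library's code.

using System;
using Newtonsoft.Json.Linq;

static class DropboxEntryParser
{
    // Walks the "contents" array of a Dropbox folder metadata answer and prints
    // path, deletion flag and client_mtime of every entry.
    public static void PrintEntries(string jsonAnswer)
    {
        JObject folder = JObject.Parse(jsonAnswer);
        JArray contents = (JArray)(folder["contents"] ?? new JArray());

        foreach (JObject entry in contents)
        {
            string path = (string)entry["path"];
            bool isDeleted = entry["is_deleted"] != null && (bool)entry["is_deleted"];
            string clientMtime = (string)entry["client_mtime"]; // may be absent for folders

            Console.WriteLine("{0} deleted={1} client_mtime={2}", path, isDeleted, clientMtime);
        }
    }
}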
6.1.3. Google Drive

Google Drive has a more flexible authorization scheme, with a granularity ranging from full read/write permissions on all user files to single per file access. The CDI library leverages the 'https://www.googleapis.com/auth/drive.readonly' scope, which grants read only access to all files and metadata, thus giving further assurance that user data are not modified in any way. Access tokens have a typical lifetime of 1 h and are issued along with refresh tokens, because the CDI library requires an offline access type in order to carry out possibly lengthy operations such as folder imaging. Google Drive keeps track of file versions, but this requires space, so they are deleted after 30 days or if there are more than 100 revisions of a file (Google, 2014a). However, the user can decide to avoid this auto deletion policy on a per file basis. Listing of a folder defaults to 100 items and listFolder will stretch to the upper bound of 1000 items. Conversely, there are no input parameters which limit the number of returned revisions of a file, even if the default number will be 100 as per the deletion policy.
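The read only scope and the offline access type are requested when the user facing authorization URL is built. The sketch below assembles such a URL for illustration; the endpoint and parameter names follow Google's OAuth 2.0 documentation of the period, but the client ID and redirect URI are placeholders and the exact format should be verified against the current reference.

using System;

static class GoogleDriveAuth
{
    // Builds the authorization URL asking for read only Drive access
    // and for a refresh token (access_type=offline).
    public static string BuildAuthorizeUrl(string clientId, string redirectUri)
    {
        return "https://accounts.google.com/o/oauth2/auth" +
               "?response_type=code" +
               "&client_id=" + Uri.EscapeDataString(clientId) +
               "&redirect_uri=" + Uri.EscapeDataString(redirectUri) +
               "&scope=" + Uri.EscapeDataString("https://www.googleapis.com/auth/drive.readonly") +
               "&access_type=offline";
    }
}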
6.1.4. Skydrive

Microsoft Skydrive features an even more comprehensive authorization scheme, which is able to give separate permissions to the user's profile, contacts, calendar, multimedia content or, more generally, to files. The CDI library uses the following scopes:

- 'wl.basic': to get the user's name and ID;
- 'wl.contacts_skydrive': to obtain read only access and retrieve metadata and content of all folders and files belonging to the user or shared by others;
- 'wl.offline_access': to operate also when the user is not signed in, via the refresh token mechanism.

Once the user has authorized, the scopes are cached in the 'App and services' tab of his account and need to be explicitly revoked in case of need. Deleted files are sent to the recycle bin and kept for at most 30 days in case of free accounts. Permanence in the bin depends on its size: if it
reaches 10% of the storage capacity, files are removed earlier, but not before three days after deletion (Shahine, 2012). Version history exists, but revisions are available for Microsoft Office documents only. The most severe limitation from a forensic standpoint is the lack of API functions that expose the recycle bin and previous releases of a file. This somehow weakens the power of remote collection tools, even if the benefit of ensuring a write protection, producing an audit trail and safeguarding integrity via hashes is still rather valuable. The API call for listing folder contents has no input parameter limiting the number of returned children.

6.2. CDI Application

The dashboard is divided into three functional areas, as depicted in Fig. 2: a central area with a tree and list view for navigation purposes, a left panel for provider selection and an upper zone for information. The tree shows the selected folder only, whereas the columns in the list show item Name, Size, Modification and Upload date. The upper right zone details diagnostic information as logged by the application, starting from the selection of a provider: every action such as folder listing or visualization of a file is recorded in a session log, whether successful or not. In the latter case the error code returned by the cloud platform is written as well.

Fig. 2. Particular of CDI dashboard.

There are a few configuration parameters, some belonging to a common part and some differentiated. In particular, there are two lines in which application key and secret are recorded. This is because Cloud Data Imager is in beta version to date and has not yet been endorsed by cloud providers, which may enforce limits on access for applications not yet ready for production. For instance, Dropbox allows at most one hundred concurrent accesses for applications still under development, so the author cannot embed his credentials for widespread diffusion yet. Therefore, CDI users will have to register a fictitious application for every provider for which no official approval exists and get application credentials. These need to be input only once per provider in the edit boxes just above the blue 'Authorize' button. Once written in the configuration file, their confidentiality is protected leveraging the Effortless.NET encryption library (Effortless.Net Encryption, 2012) with a 256 bit key generated from a user chosen passphrase.
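The general idea of protecting stored credentials with a key derived from a passphrase can be sketched as follows. This example uses only the standard .NET cryptography classes, not the Effortless.NET library itself, and the salt handling and parameters are illustrative assumptions.

using System.IO;
using System.Security.Cryptography;
using System.Text;

static class CredentialProtector
{
    // Encrypts the application key/secret with AES-256 using a key derived from the
    // user's passphrase (PBKDF2). Salt and IV are stored in clear at the start of the output.
    public static byte[] Encrypt(string plainText, string passphrase)
    {
        byte[] salt = new byte[16];
        using (var rng = new RNGCryptoServiceProvider())
            rng.GetBytes(salt);

        using (var kdf = new Rfc2898DeriveBytes(passphrase, salt, 10000))
        using (var aes = new AesManaged { KeySize = 256 })
        {
            aes.Key = kdf.GetBytes(32);
            aes.GenerateIV();

            using (var output = new MemoryStream())
            {
                output.Write(salt, 0, salt.Length);
                output.Write(aes.IV, 0, aes.IV.Length);
                using (var crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
                {
                    byte[] data = Encoding.UTF8.GetBytes(plainText);
                    crypto.Write(data, 0, data.Length);
                }
                return output.ToArray();
            }
        }
    }
}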
After a cloud provider has been selected, a new work session can begin and a check is performed against the presence of a valid AT in the configuration file. Recalling the introduction, this could be the case when a technical activity is carried out bypassing user consent because investigators have to enforce a court order. In this case, an AT can be directly obtained from the provider, but it must not have a limited lifetime because no out of band refresh token is expected in the configuration file. If an AT is unavailable, the whole OAUTH 2.0 process is started: a browser session is initiated which requests authentication to the selected cloud platform. Once the user has successfully logged in, he is presented an authorization page which states the access scopes requested by the application. Once the user has accepted, his ID is retrieved along with the content of the root folder, which is named after the user ('seminario tenerife' in Fig. 2), and the content of the items which have been shared with the user ('shared items' in Fig. 2). The latter is not shown for Dropbox because shared folders appear as root children. From now on the usual explorer like navigation can begin. As stated in paragraph 4.6, the metadata of files and subfolders are retrieved only on demand and then cached.

A file can be visualized by double clicking on it. This action has the file downloaded, saved in a configurable working directory and opened with the associated viewer. Deleted files, if supported, are marked with a red X and are viewable as well. File versions, if available, are also viewable with a right click on a selected file, and their icon has the left side filled with red. The list is displayed in a separate window, with the newest release on top, which shows a revision ID (Fig. 3).

Fig. 3. An example of Dropbox file revisions list.

6.2.1. Imaging a directory tree

With a "right click and confirm" on a folder in the tree pane the user can unleash the imaging process which, in accordance with paragraph 4.8, entails a logical copy of
metadata and content of every subfolder and file. This is a three stage run:

1. Creation, format and mount of a virtual hard disk (VHD) which will host the logical image. VHD is a specification made public by Microsoft under its Open Specification Promise (Microsoft, 2013) for encapsulating a volume in a file. By leveraging a hidden instance of the Diskpart Windows utility (a minimal sketch of this step is given after this list), CDI creates a fixed virtual hard disk, whose room requirements are calculated in a preliminary phase according to the cloud storage size plus a 20% margin. In any case, the minimum volume size is 512 MB, to keep low the percentage of sectors requested by file system service structures compared to the space available for data. The virtual volume will be NTFS formatted to overcome the FAT32 4 GB file size limitation and it is possible to decide whether to perform a quick or a full format. At last the volume will be mounted and assigned the first available drive letter from H to Z;
2. At this point the whole remote directory structure is recreated on the mounted drive. For each folder, a text file named "$cdi$_metadata.txt" is created containing the unabridged server response to the listFolder call. For each file owning at least one historical version, a text file named after it is created, to which the suffix "_$cdi$_rev_metadata.txt" is appended. It again contains the complete answer to the invocation of the listFileRevisions function. These additional files are created by the imaging process and their goal is clear: as they contain the metadata of every folder and file present in the cloud storage, their presence mimics the acquisition of the remote file table structure. Creation and modification timestamps of every recreated item are set equal to the original, but this is just for the reader's ease. In case of doubt, trust should be put only in the content of the xxx_metadata.txt files. For all data received from the server, two cumulative message digests are calculated with the MD5 and SHA1 cryptographic functions and recorded in the log. These will be known as inner hashes and will be checked against the content of all downloaded files plus the xxx_metadata.txt files. An MD5-SHA1 hash list of all retrieved files is also created. To verify the inner hashes, the imaged folder needs to be traversed in the exact order of this list, otherwise there will be no match;
3. At last, the fixed virtual disk is dismounted and the .VHD file is converted to raw format by removing a 512 byte footer. It is then exported in the Guidance Software's Expert Witness Format (EWF), one of the most widespread forensic containers to date, by silently running the ewfacquire tool from the libewf (Metz, 2013) project. EWF is a compressed format, so the outcomes are .Exx files whose sum can be much less in size than the virtual disk. EWF has provisions for editable additional information such as case number or notes. Values of the inner hashes are appended to these notes for the examiner's convenience. Ewfacquire will also include in the container an MD5 and a SHA1 message digest calculated on the whole virtual volume so to preserve its integrity. This includes NTFS service tables and files and therefore will be definitely different from the inner hashes. They will then be called outer hashes. Fig. 4 displays the content of a sample EWF file, created as the result of the image of the root folder in Fig. 2 and opened for instance with the AccessData FTK
Imager utility (http://www.accessdata.com/support/product-downloads), with which outer hashes can be verified.

Fig. 4. FTK imager's view of Fig. 2 root folder.
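Referring back to step 1 of the imaging run, the creation and mounting of the fixed VHD through a hidden Diskpart instance could look roughly like the sketch below. The script commands are standard Diskpart syntax, while the sizes, paths, quick format choice and label are illustrative assumptions and not the exact script generated by CDI.

using System.Diagnostics;
using System.IO;

static class VhdBuilder
{
    // Writes a Diskpart script that creates, attaches, partitions and formats a
    // fixed VHD, then runs diskpart.exe silently against it.
    public static void CreateAndMount(string vhdPath, long sizeMb, char driveLetter)
    {
        string script =
            "create vdisk file=\"" + vhdPath + "\" maximum=" + sizeMb + " type=fixed\r\n" +
            "select vdisk file=\"" + vhdPath + "\"\r\n" +
            "attach vdisk\r\n" +
            "create partition primary\r\n" +
            "format fs=ntfs quick label=CDI\r\n" +
            "assign letter=" + driveLetter + "\r\n";

        string scriptPath = Path.Combine(Path.GetTempPath(), "cdi_diskpart.txt");
        File.WriteAllText(scriptPath, script);

        var psi = new ProcessStartInfo("diskpart.exe", "/s \"" + scriptPath + "\"")
        {
            CreateNoWindow = true,   // hidden instance, as described in the paper
            UseShellExecute = false
        };
        using (Process p = Process.Start(psi))
            p.WaitForExit();
    }
}

Note that Diskpart requires administrative privileges, so an application using this approach has to run elevated.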
Making a comparison to Fig. 2, it can be seen that:

- deleted items are prepended with an underscore;
- file3.txt has one version 'file3_rev_0B_6aHERKHo_rU1BXRGdkZHRXRjdJZWwyck0xdGZETnNCNGtZPQ.txt' named after it with its revision id suffix. 'file3.txt_$cdi$_rev_metadata.txt' contains revision information as received from the cloud storage platform;
- the same applies to documento.txt, save that it was renamed from doc.txt;
- modification dates of all downloaded files and folders are retained. Creation timestamps, that is the date and time an item was uploaded to the cloud, are retained as well, but are not shown by FTK Imager. A search for these dates and times can be made against the content of file '$cdi$_metadata.txt', which is not shown for the sake of brevity;
- the volume is named 'CDI'.

6.2.2. Tests findings

We have devised a test plan organized in one hundred trials, half of which were accomplished to put the application under stress and half to try out all functionalities with small collections. In the former scenario, all runs but two were successfully carried out, verifying operation continuity beyond the hour of imaging activity, when an access token is silently renewed by the CDI library (not applicable for Dropbox). We hashed every sample file before uploading it to a folder in the cloud storage and repeated the same operation for its corresponding copy after each session. We detected no differences in content between original files and copies after imaging multi gigabyte folders with thousands of files of many different kinds, like pictures, videos or documents, some weighing several hundred megabytes. The two failed runs were caused by network issues and just required a fresh restart. Error codes were displayed on the screen and recorded in the session log file. In future releases of the tool we will consider the possibility to resume operations from the point of interruption. Functionality tests were all successful. For instance, in a session a few public documents belonging to the data catalog of the District of Columbia (http://data.dc.gov/) were downloaded and unzipped, notably Crime Incidents from 2011 to 2013 and Purchase Orders from 2008 to 2011. These were selected because they are easily editable, to verify how CDI wields historical versions. We calculated an MD5 and a SHA1 hash for every file using the HashCheck Windows shell extension version 2.1.11 and created with this tool two separate hash lists, namely data.dc.gov.md5 and data.dc.gov.sha1, to simulate the presence of small sized text files. The 9 file collection hosted in a folder named data.dc.gov is listed in Table 3. We carried out all operations with a Windows 7 box on October 14th 2013 UTC +2, which coincides with the creation date of all files (not shown), selecting a data connection ranging on average from 200 to 300 Kbytes/sec to verify CDI response with low speed networks. However, such a connection already suffices for a positive user experience, as navigation usually entails acceptable delays in opening folders.

We uploaded the folder data.dc.gov to Dropbox, Google Drive and Microsoft Skydrive leveraging the associated desktop application. At a later time, we moved the file data.dc.gov.md5 to a newly created subdirectory named '2013' and then edited it to create a revision by removing all asterisks from the content, obtaining the new hash values of 5c87472d77ff352f20469baf4648918a and 296a78be66c2c727338abdbbd57606ce35f3b280 for the MD5 and SHA1 algorithms respectively. We performed these operations also for Skydrive, notwithstanding the mentioned inability of remotely retrieving past versions and trashed items. Findings of the imaging activity of the root folder data.dc.gov, containing 9 files and 1 subfolder, were as follows:

- Dropbox: The imaging process took 5 min and 50 s, inclusive of remote folder size calculation, to create a 9587 KB .E01 file, discovering and downloading twelve files and 2 subfolders for a total amount of 72108148 bytes. There are 3 files and 1 folder more than the Windows local folder (see Figs. 5 and 6). This is because data.dc.gov.md5 is still accounted as a zero sized file in the root and its revision stems from the uploading of the file in Table 3. In subfolder '2013' there is the last edited copy plus its revision generated from the moving of file data.dc.gov.md5. So in all there are two revisions plus a zero sized file in excess.
Table 3. List of the sample collection used for testing.

- crime_incidents_2011_CSV.csv | Last mod: 13/10/2013 8:40 | Size: 5136 KB | MD5: 7e1854bcb6ebe6267650b49e88c43860 | SHA1: 867fba1fb14864aeb4826e4f20a5c5e4fe91704b
- crime_incidents_2012_CSV.csv | Last mod: 13/10/2013 8:35 | Size: 5323 KB | MD5: 688047a9186a8254d3e01c374e47b3d8 | SHA1: 55efdc1097451fb1ff332ab4fd248639760040f2
- crime_incidents_2013_CSV.csv | Last mod: 13/10/2013 8:30 | Size: 4092 KB | MD5: afb6d9a2a2ecbca577586cf41284b775 | SHA1: 75716055e36e1ddc5ba7ba92ce47f4c82f2230c9
- data.dc.gov.md5 | Last mod: 14/10/2013 10:21 | Size: 1 KB | MD5: 590791fb8e75c6b213b4be36c0753cd0 | SHA1: 25096eb28b031e2a9595d0251d716dbc7162518f
- data.dc.gov.sha1 | Last mod: 14/10/2013 10:21 | Size: 1 KB | MD5: 796b74ebc114b0486550f787d5fe93ed | SHA1: 3c2c784649f21af943230450ed70fea68ec6c421
- pass_2008_plain.xml | Last mod: 31/12/2009 3:50 | Size: 15581 KB | MD5: 4666f0b18baf059a5f4acdcef0217e0d | SHA1: 8b0c6cd118665c7ed662ef0d0ae7bb37fcc867f3
- pass_2009_plain.xml | Last mod: 31/12/2010 3:50 | Size: 14531 KB | MD5: 0669a81a8d6b6681dcfb444df3428ab5 | SHA1: aa09ee3ff3708d34559d30db7c65f350e38557dd
- pass_2010_plain.xml | Last mod: 31/12/2011 3:50 | Size: 13659 KB | MD5: fe35c7d30ad6f2ebcfae50b3bbcc2cf8 | SHA1: ad5b29495376da30e52ba10b17294c984d97ebed
- pass_2011_plain.xml | Last mod: 31/12/2012 3:50 | Size: 12096 KB | MD5: 07d3e48f971bd0f4bb40a7e7c32bbfa7 | SHA1: c2778fbaf497938efe730e8efb9a74dfc947a555
Fig. 5. Dropbox's content of data.dc.gov and revisions of file data.dc.gov.md5.
Fig. 6. Dropbox's content of data.dc.gov/2013 and revisions of file data.dc.gov.md5.

An explanation of the sequence may clarify further:

1. At 09:21:57 UTC (server time) data.dc.gov.md5, with a modification time of 08:21:45 UTC, is created in the cloud storage root after client synchronization;
2. At 09:52:23 UTC the file is moved to subfolder '2013' and a zero sized file is created;
3. At 09:53:49 UTC (client time) the content is changed, as asterisks are removed in the local copy. A new synchronization forces the creation of a new remote object at 09:54:12 (server time).

The excess folder is a deleted one named "Nuova cartella" (New folder), which is the original name assigned by Windows before renaming to '2013'. The imaging process created five more files: three to host the metadata of all directories ('root', '2013' and 'Nuova cartella') plus two for the revisions metadata of data.dc.gov.md5 in folders 'root' and '2013'. We exported a hash list of the imaged folder by opening the .E01 file with FTK imager. Fingerprints matched both the files in Table 3 and the hash list produced by CDI (Table 4), where the file data.dc.gov.md5 in folder 'root' is missing because it is zero sized. Browsing with CDI and FTK showed that modification timestamps were retained after desktop client upload.

- Google Drive: Similar considerations can be made for Google Drive. The imaging process took 5 min and 41 s to terminate, producing a 9586 KB .E01 file, discovering and downloading 72107684 bytes organized in ten files and 1 subfolder. Differently from Dropbox, Google Drive just keeps track of the revision of data.dc.gov.md5 in folder '2013', so there is only one file more than in the Windows local folder and no other subfolders were created. The lack of one revision explains why 464 bytes less than in Dropbox's storage were found. The imaging process thus created three more files: two to host metadata for directories ('root' and '2013') plus one for revisions metadata. The hash list produced after opening the .E01 file with FTK imager matched both the files in Table 3 and the list produced by CDI. Browsing with CDI and FTK showed again that modification dates and times are kept after desktop upload.
- Microsoft Skydrive: The process took 6 min and 11 s to produce a 9580 KB .E01 image. 72107220 bytes were discovered in 9 files and 1 subfolder, just like the Windows local directory. The missing 464 byte revision in subfolder '2013' compared to Google Drive accounts for the lesser amount of bytes found. Two more $cdi$_metadata.txt files were produced during imaging, one located in the root directory and the other in the '2013' subfolder. Again a perfect match of all hash lists confirmed that there were no modifications in file contents. This is also true for the modification timestamps, which were retained, provided that an apparently undocumented parameter named "client_updated_time" in the JSON formatted answer of Microsoft servers is used. Otherwise, the previously considered "updated_time"
Table 4
File hash list.txt produced by the imaging process.

Name MD5 SHA1


\$cdi$_metadata.txt 9fa2fcbd0e3e3ab495c865a733b752d0 78f0e59a04581ebbbe4988a46ce8dd54d2bad51f
\2013\$cdi$_metadata.txt 716398474c5d10b2559e1394bd55f56d 7eee567da6ade14bc3cc8c3abff92cefd74521b1
\2013\data.dc.gov.md5 5c87472d77ff352f20469baf4648918a 296a78be66c2c727338abdbbd57606ce35f3b280
\2013\data.dc.gov.md5_$cdi$_rev_metadata.txt bb80f6263af2371cc4570288870ca48c 1a8310d06ae378dc64e84814bedca91ee4fa5943
\2013\data.dc.gov_rev_1e14336529.md5 590791fb8e75c6b213b4be36c0753cd0 25096eb28b031e2a9595d0251d716dbc7162518f
\crime_incidents_2011_CSV.csv 7e1854bcb6ebe6267650b49e88c43860 867fba1fb14864aeb4826e4f20a5c5e4fe91704b
\crime_incidents_2012_CSV.csv 688047a9186a8254d3e01c374e47b3d8 55efdc1097451fb1ff332ab4fd248639760040f2
\crime_incidents_2013_CSV.csv afb6d9a2a2ecbca577586cf41284b775 75716055e36e1ddc5ba7ba92ce47f4c82f2230c9
\data.dc.gov.md5_$cdi$_rev_metadata.txt dee6bd5169c68fe448c37efd67b55794 b0d5f3683a37a5ac082c502e23b266145f2708d2
\data.dc.gov_rev_1114336529.md5 590791fb8e75c6b213b4be36c0753cd0 25096eb28b031e2a9595d0251d716dbc7162518f
\data.dc.gov.sha1 796b74ebc114b0486550f787d5fe93ed 3c2c784649f21af943230450ed70fea68ec6c421
\_nuova cartella\$cdi$_metadata.txt 65d80a2397798bcbc68875f91531b8ee 597712bf2b714f7e29251c74f0ae26ded36a785b
\pass_2008_plain.xml 4666f0b18baf059a5f4acdcef0217e0d 8b0c6cd118665c7ed662ef0d0ae7bb37fcc867f3
\pass_2009_plain.xml 0669a81a8d6b6681dcfb444df3428ab5 aa09ee3ff3708d34559d30db7c65f350e38557dd
\pass_2010_plain.xml fe35c7d30ad6f2ebcfae50b3bbcc2cf8 ad5b29495376da30e52ba10b17294c984d97ebed
\pass_2011_plain.xml 07d3e48f971bd0f4bb40a7e7c32bbfa7 c2778fbaf497938efe730e8efb9a74dfc947a555

7. Conclusions and future directions

This research started from the consideration that, in an investigation concerning crime related information hosted in a cloud storage platform, it may not be possible to follow a traditional approach based on bit stream copying of seized mass memory, or to rely on cloud provider delivered data if a sound ‘Forensics as a service’ is not in place. As previous and related work showed, applications devoted to remote data acquisition with forensically sound architectures are not very widespread to date, and general purpose tools are used which lack fundamental features such as read only access or precise audit trails. This opens a broad avenue for the exploration of the application program interfaces exposed by personal storage facilities. This work assessed that, when these interfaces are accessed at the lowest possible level of web services, they are amenable to building valuable forensic tools because of their ability to retrieve existing and trashed files or their past revisions. In this respect, providers are encouraged to empower the capabilities of their programming endpoints by offering more functionalities, such as the retrieval of access logs with the IP address of the user workstation and login times. We developed a library which handles write protected access to selected remote folders and masks to overlying applications all the differences existing among several cloud technologies. We also built a prototypal application, namely Cloud Data Imager, which leverages the library to safely browse a remote account and perform a logical copy of all retrievable objects and their metadata into a raw NTFS volume exported to an expert witness container. The first evidence from stress and functionality tests confirms that CDI faithfully traverses a selected remote directory, and more test beds will be performed in the future. Some very interesting development directions include:

• from a legal standpoint, the concept of repeatability of technical assessments in a remote acquisition scenario would deserve to be revisited. Indeed, traditional forensic activities may be deemed not repeatable because the very action of powering on a piece of digital evidence may residually entail its permanent damage. However, cloud personal storages are always-on infrastructures with extreme resiliency features, and the hazards of mechanical or electrical shocks do not apply. So it could be interesting to evaluate the implications for the repeatability of remote acquisitions, especially comparing different cloud technologies;

• the reliability of the network connection may have heavy impacts on CDI’s behavior. Just to make an example, if the network fails while a file is being downloaded, the entire process must be restarted. Implementing provisions for resuming a download from the point of interruption would certainly contribute to increasing the robustness of the application (see the sketch after this list);

• based on the concept of class interface, the CDI Library is easily extendable to handle many other storage providers that expose their platforms via HTTP services. Amazon Simple Storage Service and Openstack Swift will be the first to be evaluated.
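Regarding the second point, a resumable transfer can be built on standard HTTP range requests, which download endpoints generally honour for file content; whether a given provider actually does so must be verified case by case. The sketch below illustrates the idea with a placeholder URL and credentials and is not part of CDI.

```python
import os

import requests


def resumable_download(url, local_path, headers=None, chunk_size=1024 * 1024):
    """Download 'url' to 'local_path', resuming from the last byte already on disk."""
    headers = dict(headers or {})
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset:
        headers["Range"] = "bytes=%d-" % offset  # ask only for the missing tail
    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # 206 Partial Content means the server honoured the Range header;
        # a plain 200 means the whole file is being sent again
        mode = "ab" if resp.status_code == 206 else "wb"
        with open(local_path, mode) as f:
            for chunk in resp.iter_content(chunk_size):
                f.write(chunk)


# resumable_download("https://example.invalid/data.dc.gov.md5", "data.dc.gov.md5",
#                    headers={"Authorization": "Bearer <token>"})
```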
Acknowledgments

The author wishes to thank the Italian Arma dei Carabinieri for its valuable support in producing this paper.

