Evidence, Securing Online Evidence

Electronic Evidence Collection &
Preservation for eDiscovery

Best Practices for the Electronic Records Management of Website, Social Media, Mobile Text,
and Enterprise Collaboration Content
CONTENTS
EDISCOVERY IN THE ENTERPRISE
WHY ONLINE RECORDKEEPING IS HARD
THE DEMANDS OF DIGITAL EVIDENCE
THE EDRM AND THE INFORMATION GOVERNANCE REFERENCE MODEL
CREATE
COLLECTION
MONITORING
RETAIN
LEGALIZING
INDEXING
ARCHIVING
MANAGE
ANALYSIS AND REPORTING
EXPORT AND INTEGRATION
DISCOVERY AND HOLD
DISPOSE
RECORDS RETENTION
LONG-TERM PRESERVATION
SOLUTIONS FOR COMPLIANT RECORDKEEPING
LIVE REPLAY
TRACK CHANGES AND DELETIONS
ADVANCED SEARCH
DATA EXPORT
DIGITAL SIGNATURES AND TIMESTAMPS
DATA LOSS PREVENTION AND MONITORING
CASE MANAGEMENT
RETENTION SCHEDULING
LONG-TERM PRESERVATION
LEGAL HOLD
LET’S CONNECT!
EDISCOVERY IN THE MODERN ENTERPRISE
While regulations are a major reason why enterprises need to keep records of online data, it is by no means the
only reason. eDiscovery is another crucial factor that companies need to consider.
Website, social media, and enterprise collaboration content is increasingly being used in litigation related to
employment, intellectual property, contract issues, defamation, insurance fraud, etc. Consequently, companies
need a way to collect and store this unstructured data in a way that makes it easy to search and process it during
eDiscovery and litigation preparation.
These days, around 80% of enterprise data is unstructured, and 70% of enterprises are unsure how to manage
and protect this data. When it comes to the complex processes of data collection and preparation, this inability to
adequately deal with unstructured data can result in extremely high litigation costs. The solution is to collect and
store data in a way that makes it easily digestible by modern eDiscovery platforms.
Electronic Evidence Collection for eDiscovery and Compliance

What Is Unstructured Data?
Unstructured data is information that doesn’t fit a predefined data model and isn’t organized in a
predefined manner. In other words, it’s data that can’t easily be fed into a spreadsheet. As this suggests,
unstructured data tends to be text-heavy, while still containing things like numbers and dates.
Examples can include anything from PDF contracts to Word documents, health records, and even the
unstructured text in the body of an email. Recently, however, online data (including metadata) from
websites, social media channels, enterprise collaboration content, and mobile text messages has
become an important component of unstructured data, as it’s often needed for compliance and litigation
purposes.
THE CHALLENGES OF ELECTRONIC RECORDS COMPLIANCE

Despite the fact that organizations need to keep detailed records of electronic data for litigation and compliance,
many still fail to do this. Why? Well, modern electronic recordkeeping is incredibly complex, and companies are
struggling to understand exactly what’s required.
While keeping records of official emails and discrete electronic documents is one thing, capturing modern web
content is quite another. Enterprises are expected to maintain records of:
• Website Content (including password-secured pages)
• Blogs
• Message Boards
• Enterprise Collaboration Content
• Text Messages
Doing this isn’t easy since content is constantly evolving—every passing minute brings more comments, replies,
likes, and shares—and they all result in new electronic records.
WHY ONLINE RECORDKEEPING IS HARD

Here are some of the main reasons why organizations struggle to keep accurate records of online data.
Mix of Content
Message boards, forums, blogs, enterprise collaboration platforms, and social media platforms don’t necessarily
consist of one simple stream of content—they have timelines, pages, direct messages, images, videos, comments,
etc. This makes online content particularly prone to recordkeeping errors. It’s all too easy for a post to be edited
or a comment deleted before an accurate record is created.
Real-Time Activity
Thousands of comments, likes, and shares can happen in an hour, and with each new interaction a new record is
generated. In other words, a single post with lots of engagement can result in the creation of thousands of records
in a very short space of time. This never-ending real-time activity poses a tremendous challenge, since a record
can be outdated almost the moment that it’s created.
Evolving Platforms
Since a manual process like screenshotting is labor-intensive, can lead to incomplete records, and is unlikely to
result in records that’ll stand up in court, many organizations resort to some form of recordkeeping that collects
social media data automatically. While this is a good approach, it’s worth keeping in mind that social media
platforms are always evolving, so whatever solution an organization opts for, it needs to be able to adapt to
platform changes. Otherwise, every platform change will result in lengthy downtimes and record gaps.
Integration Requirements
In order to ensure that social media content is always collected in real-time, that archives are of evidentiary quality,
and that any changes to a platform will not impact the ability to archive data, it's necessary to leverage platforms'
APIs. Gaining access to these APIs and building the necessary integrations isn't always easy, but it's undoubtedly
the best way to ensure accurate records.
THE DEMANDS OF DIGITAL EVIDENCE

Along with the complications of online data collection and archiving mentioned above, it’s also important to
discuss what is required in order for digital information to be admissible in court. An organization has to be able
to prove the integrity and authenticity of any record provided, which means showing that the data hasn’t been
tampered with and demonstrating that it was indeed captured at the date, time, and URL stated.
Digital Signatures and Metadata

To prove data authenticity and integrity, an electronic record has to have the following:
• A digital signature that complies with the ESIGN Act of 2000;
• A timestamp that shows the date and time that a record was collected;
• All associated metadata.
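The integrity part of these requirements can be illustrated with a minimal sketch. It assumes a SHA-256 digest is stored together with the capture time and URL; the function names and the example URL are hypothetical. A digest on its own is not a digital signature: a production system would also sign the digest with a private key and obtain the timestamp from a trusted authority rather than the local clock.

```python
import hashlib
from datetime import datetime, timezone


def seal_record(content: bytes, url: str) -> dict:
    """Return a capture record with a SHA-256 digest and a UTC capture time."""
    return {
        "url": url,
        # Local clock used for illustration only; not a trusted timestamp.
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def is_unaltered(content: bytes, record: dict) -> bool:
    """Recompute the digest and compare it with the one stored at capture time."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


page = b"<html><body>Original post</body></html>"
record = seal_record(page, "https://example.com/post/1")
print(is_unaltered(page, record))
print(is_unaltered(page + b" edited", record))
```

Any change to the captured bytes, however small, produces a different digest, which is what allows tampering to be detected later.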
What is Metadata?
Metadata is hidden data typically not visible to a user, or only visible in a limited capacity. If you examine
the metadata associated with a social media post, for example, it contains:
• Client Metadata: Browser, operating system, IP, and user.
• Web Server/API Endpoint Metadata: URL, HTTP headers, type, date, and time of the request.
• Account Data: The account owner, bio, description, and location.
• Message Data: Author, message type, post date and time, versions, links, likes, comments, etc.
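As a rough sketch, the four metadata groups listed above could be modeled as one structured record per captured post. The class and field names below are illustrative assumptions, not a schema used by any particular platform or vendor.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ClientMetadata:
    browser: str
    operating_system: str
    ip: str
    user: str


@dataclass
class RequestMetadata:
    url: str
    http_headers: dict
    request_type: str
    requested_at: str  # date and time of the request (ISO 8601 assumed)


@dataclass
class AccountData:
    owner: str
    bio: str
    description: str
    location: str


@dataclass
class MessageData:
    author: str
    message_type: str
    posted_at: str
    versions: list = field(default_factory=list)
    links: list = field(default_factory=list)
    likes: int = 0
    comments: list = field(default_factory=list)


@dataclass
class CapturedPost:
    client: ClientMetadata
    request: RequestMetadata
    account: AccountData
    message: MessageData


post = CapturedPost(
    client=ClientMetadata("Firefox", "Linux", "203.0.113.7", "archiver"),
    request=RequestMetadata("https://example.com/post/1",
                            {"Content-Type": "text/html"}, "GET",
                            "2019-05-01T12:00:00Z"),
    account=AccountData("Example Corp", "Official account", "Company news", "Vancouver"),
    message=MessageData("Example Corp", "post", "2019-05-01T11:58:00Z", likes=3),
)
flat = asdict(post)  # nested dict, ready to serialize alongside the content
print(sorted(flat))
```

Keeping the metadata in a typed structure like this makes it straightforward to store, search, and export it together with the record it describes.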
THE EDRM AND THE INFORMATION GOVERNANCE
REFERENCE MODEL
In order to help organizations better understand and manage the eDiscovery process, the Electronic Discovery
Reference Model (EDRM) was created in 2006.
The model shows the various steps typically involved in eDiscovery:
• Identify
• Preserve
• Collect
• Process
• Review
• Analyze
• Produce
• Present
But it does not only consider the steps of eDiscovery. On the left, the EDRM also attempts to address what’s
needed in order to properly manage electronically stored information (ESI) for eDiscovery through the Information
Governance Reference Model (IGRM).
What Is Information Governance?

Information Governance (IG), as the name suggests, concerns the management of information within an
organization, specifically the balancing of access with security. An IG framework typically consists of
policies, structures, and tools that help companies to effectively manage ESI.
Although this model can be immensely useful in managing ESI, there are very specific information governance
considerations when it comes to online data like website and social media content. With this in mind, Pagefreezer
has expanded on the IGRM to provide enterprises with a comprehensive step-by-step guide to managing online
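records. This model breaks the four stages of the IGRM (Create, Retain, Manage, and Dispose) down into 10
distinct steps:
• Create: Collection, Monitoring
• Retain: Legalizing, Indexing, Archiving
• Manage: Analysis and Reporting, Export and Integration, Discovery and Hold
• Dispose: Records Retention, Long-Term Preservation
To understand how an information governance framework like the IGRM can be adapted and applied specifically
to online data, let’s zoom into each of the four stages.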
CREATE
COLLECTION
Electronic recordkeeping starts with the collection of data from sources such as websites, blogs, social media
networks, and enterprise collaboration platforms. As mentioned, the collection of online content is complicated by
the inherent nature of the data — the mix of content, constantly-evolving platforms, and real-time activity.
Social Media and Enterprise Collaboration

In order to address these challenges, organizations should be leveraging a solution that has API integrations
with platforms like Facebook, Workplace, and Twitter. This ensures that data is collected in real-time, and that
all changes, deletions, and linked content are collected. Without an API integration that allows for real-time
collection, there’s a high likelihood that crucial changes and communications would be missed, and that
archives will consequently be incomplete. With API integration, there’s also the added benefit of being able to
archive content retroactively — as long as the data is still available on the original platform, it can be collected and
placed in an archive.
Websites and Blogs

When dealing with websites, data should be crawled on a regular basis to capture all additions, edits, and
deletions across a site. Depending on how often website content is updated, it would typically be crawled once
per day or once per week. Importantly, any solution that’s put in place should be capable of dealing with the latest
complex sites. It should be able to capture web pages generated client-side by JavaScript/Ajax frameworks,
including Ajax-loaded content. It should also be capable of collecting multiple steps in web form flows, and of
capturing webpage content that is displayed after a user event (for example, when a section on a webpage loads
additional content using Ajax after a user clicks).
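To illustrate how regular crawls can surface additions, edits, and deletions, here is a minimal sketch that compares two crawl snapshots by content hash. It assumes each snapshot is a simple mapping of URL to rendered HTML; the function names and sample URLs are hypothetical, and a real crawler would also need to render JavaScript and follow form flows as described above.

```python
import hashlib


def fingerprint(pages: dict) -> dict:
    """Map each URL to the SHA-256 digest of its HTML."""
    return {url: hashlib.sha256(html.encode("utf-8")).hexdigest()
            for url, html in pages.items()}


def diff_crawls(previous: dict, current: dict) -> dict:
    """Classify URLs as added, deleted, or edited between two crawls."""
    old, new = fingerprint(previous), fingerprint(current)
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "edited": sorted(u for u in old.keys() & new.keys() if old[u] != new[u]),
    }


monday = {"/": "<h1>Home</h1>", "/pricing": "<p>$10</p>", "/old": "<p>Bye</p>"}
tuesday = {"/": "<h1>Home</h1>", "/pricing": "<p>$12</p>", "/news": "<p>Hi</p>"}
changes = diff_crawls(monday, tuesday)
print(changes)
```

Comparing digests rather than full page bodies keeps the check cheap, so only pages that actually changed need a new archived version.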
MONITORING
The second component of the Create stage is monitoring. Due to the real-time nature of social media networks
and enterprise collaboration platforms especially, it’s important for organizations to reduce risk by monitoring
content in real time. This should be done for two reasons: (i) preventing data loss and (ii) ensuring compliant,
appropriate use of these platforms.
Data Loss Prevention

There’s always the risk that an employee (or a member of the public) will share sensitive, private information on
a social media channel or collaboration platform. To prevent this, organizations should have a system in place
that notifies administrators when this kind of information is posted. If, for example, a home address is posted on
Facebook, or a social security number is shared on Workplace, an alert should be sent to administrators to notify
them of the situation and allow them to take quick action.
What is Data Loss Prevention (DLP)?

Data Loss Prevention refers to tools and processes that aim to prevent sensitive information from being
leaked or accessed without proper authorization. Through a DLP process/strategy, information is classified
according to its level of sensitivity, and based on this, policies are then put in place to prevent improper
use and sharing of this confidential information. For instance, alerts might be sent out when this data
(a password, home address, social security number, etc.) is shared in an email or on a corporate chat
platform. In some cases, software can even prevent information from being typed into a social media or
enterprise collaboration platform entirely.
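A pattern-based alert of the kind described above might look like the following minimal sketch. The two patterns (a U.S. social security number format and a phrase that appears to disclose a password) are simplified assumptions for illustration; real DLP rules are considerably more thorough and are tuned to reduce false positives.

```python
import re

# Hypothetical, deliberately simple detection rules.
SENSITIVE_PATTERNS = {
    "social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "password_disclosure": re.compile(r"\bpassword\s*(is|:)\s*\S+", re.IGNORECASE),
}


def scan_message(text: str) -> list:
    """Return the names of all sensitive patterns found in a message."""
    return sorted(name for name, pattern in SENSITIVE_PATTERNS.items()
                  if pattern.search(text))


for message in ["My SSN is 123-45-6789", "the password is hunter2", "Lunch at noon?"]:
    hits = scan_message(message)
    if hits:
        # In practice this would notify an administrator instead of printing.
        print(f"ALERT {hits}: {message!r}")
```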
Policy Compliance
For both external social media channels like Facebook and Twitter and internal chat platforms like Workplace,
organizations should have a detailed policy in place that governs their use. Combined with this should be some
form of monitoring solution that allows the organization to be alerted when something is posted that does not
comply with the policy—if, for instance, someone makes a threat of physical violence or uses profanity on a
Facebook page.

Do You Have a Social Media Policy in Place?
If you don’t currently have a social media policy, you can download our free guide and template by
clicking the link below:
Download Social Media Template
RETAIN
The second stage of the Pagefreezer framework is Retain. Crucial to this stage within the realm of online
content is the legalization, indexing, and archiving of data.
LEGALIZING
This process relates to the capturing of data in a way that will make it defensible in a court of law. As explained
earlier in this document, this means gathering associated metadata of all electronic records and furnishing them
with a timestamp and digital signature that proves data integrity and authenticity.
While collecting and storing online data is important, it needs to be done in a way that results in records that
would be admissible in a court of law. Simple screenshots, for example, would not be adequate, since they lack
the metadata and digital signatures needed for litigation.
INDEXING
What differentiates an archive of electronic records from a basic back-up of data is the fact that properly archived
records are indexed, meaning that the content is compiled in a way that makes it easy to search. So when a
specific record needs to be found, all that’s required is a simple search and not a labor-intensive trawl through
thousands of files. Properly indexed data also maintains relationships between data and users (allowing for the
posts and comments of a specific user to easily be identified), and even allows metadata to be searched.
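The difference between an indexed archive and a plain back-up can be sketched with a tiny inverted index. This is an illustrative assumption about how such an index might be organized, not a description of any vendor's search engine; the class and method names are hypothetical.

```python
import re
from collections import defaultdict


class ArchiveIndex:
    """Toy inverted index: maps words and authors to record IDs."""

    def __init__(self):
        self._terms = defaultdict(set)    # word -> record IDs
        self._authors = defaultdict(set)  # author -> record IDs

    def add(self, record_id: str, author: str, text: str) -> None:
        self._authors[author].add(record_id)
        for word in re.findall(r"\w+", text.lower()):
            self._terms[word].add(record_id)

    def search(self, query: str) -> set:
        """Return IDs of records containing every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._terms.get(words[0], set()))
        for word in words[1:]:
            result &= self._terms.get(word, set())
        return result

    def by_author(self, author: str) -> set:
        """Return IDs of all records posted by one user."""
        return set(self._authors.get(author, set()))


index = ArchiveIndex()
index.add("post-1", "alice", "Quarterly results are out")
index.add("comment-1", "bob", "Great results, congratulations")
index.add("post-2", "alice", "Office closed on Friday")
print(index.search("results"))
print(index.by_author("alice"))
```

A lookup touches only the entries for the queried words, which is why a search over an indexed archive does not require reading through every stored file.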

Table: Archive vs. Back-up
Features compared: Full-text Search; Digital Signatures; Easy access to archives; Live Replay; Metadata;
Compliant data storage
Accessible: Archive: Instant, 24x7; Back-up: Takes hours
Solution for: Archive: Compliance, Legal; Back-up: IT
ARCHIVING
Once information has been captured, part of the Retain stage is placing that data in an archive. As stated
above, this isn’t simply a back-up of online data, but is instead a database that is indexed and fully searchable.
Of course, while an archive is not merely a back-up of data, it is important to create back-ups of the archive
itself. The data should ideally be replicated three times, saved to WORM (Write Once, Read Many) storage, and
backed up remotely in the event of a disaster.
Another crucial component to consider when it comes to the archiving of data is security. In order to show
compliance and successfully use data during litigation, the accuracy and integrity of the information should be
beyond question. This will only be the case if the data is being archived in a secure way. Enterprises should aim
to make use of an archiving vendor that is ISO 27001 and SOC 2 certified.
MANAGE

ANALYSIS AND REPORTING
Once online data has been archived, an opportunity exists to analyze that information and gain valuable
insights. From looking at the number of average daily interactions a social media account has to understanding
what posts and website campaigns perform best, a large archive of data makes it easier to take a big-picture view
of online activity. While analysis is not crucial to thorough electronic recordkeeping, not leveraging archived data
for useful insights is a missed opportunity.
EXPORT AND INTEGRATION

The last thing an organization should want when archiving data is to have it locked into proprietary software
that doesn’t allow for the easy export of information. PDF is one popular form of export that should be available,
but data should additionally be exportable in WARC format. It is also worth looking at the integrations offered
by any electronic records management solution. Being able to export data to an eDiscovery platform can be
immensely useful in streamlining workflows.
What is WARC?
Web ARChive (WARC) is a file format for the long-term preservation of digital data. It stores web pages and
other digital resources, including images and meta information, in their original source code.
WARC has been accepted as an ISO standard (28500:2017), and since then, WARC has also been adopted by many
software vendors, libraries, and agencies across the globe as the new standard for digital records archiving,
specifically for web pages and full websites.
The U.S. Government has also embraced this standard. The National Archives and the Library of Congress adopted
WARC as the only acceptable file format for the long-term preservation of website and social media records
according to Bulletin 2014-04, “Format Guidance for the Transfer of Permanent Electronic Records.”
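To give a feel for the format, the sketch below assembles a single, simplified WARC "resource" record using only the Python standard library. It is an illustration under stated assumptions: the function name and example URL are hypothetical, only a handful of header fields are written, and a production archive would normally rely on a dedicated WARC library and would also store request and response records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str) -> bytes:
    """Build one simplified WARC 'resource' record as raw bytes."""
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates header and payload,
    # and two CRLFs close the record.
    header = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return header + payload + b"\r\n\r\n"


html = b"<html><body>Archived page</body></html>"
warc_record = build_warc_resource_record("https://example.com/", html, "text/html")
print(warc_record.decode("utf-8"))
```

Because the original bytes are stored untouched alongside descriptive headers, the same record can later be replayed, searched, or handed to another system.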

DISCOVERY AND HOLD
Speaking of eDiscovery exports and integrations, it’s important that online data like website and social media
content be easily searchable, exportable, and processable for legal purposes.
The ability to place a legal hold is another important consideration. Data doesn’t stay in an archive forever.
Organizations can be expected to retain official records for anything from three to 10 years, and once that
retention period is reached, information is typically deleted. However, if the data is needed for legal purposes,
this should be overridden to ensure that evidence isn’t lost. An archive solution should, therefore, enable the
organization to easily place a post or comment on legal hold to preserve it for litigation.
DISPOSE
RECORDS RETENTION
As touched on in the previous section, data doesn’t remain in the archive permanently. All archived content has
a disposition status, and unless something is on legal hold, that status is usually temporary. So as soon as it
falls outside the period during which an organization is obligated to keep the information, the data may safely
be deleted. Ideally, this process should be automated to ensure that data is never being kept if it’s not needed,
while also reducing the workload that would come with manually deleting content on a daily basis. Lastly, an
organization should make sure that any archiving vendor being considered offers a grace period with regards to
deleted content, just in case deleted data needs to be recovered.
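The interaction between retention periods, legal holds, and a recovery grace period can be sketched as a small decision function. The seven-year retention period and 30-day grace period used here are assumed example values, and the names are hypothetical; actual schedules depend on the regulations and policies that apply to the organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedRecord:
    record_id: str
    captured_on: date
    legal_hold: bool = False


def disposition(record: ArchivedRecord, today: date,
                retention_years: int = 7, grace_days: int = 30) -> str:
    """Decide what should happen to a record on a given day."""
    if record.legal_hold:
        return "retain (legal hold)"  # a hold overrides the schedule
    # Approximates a year as 365 days for simplicity.
    expires_on = record.captured_on + timedelta(days=365 * retention_years)
    if today < expires_on:
        return "retain"
    if today < expires_on + timedelta(days=grace_days):
        return "deleted (recoverable)"
    return "purged"


post = ArchivedRecord("post-1", date(2015, 1, 1))
held = ArchivedRecord("post-2", date(2015, 1, 1), legal_hold=True)
print(disposition(post, date(2020, 1, 1)))
print(disposition(post, date(2022, 1, 10)))
print(disposition(post, date(2023, 1, 1)))
print(disposition(held, date(2023, 1, 1)))
```

Running a check like this on a schedule is one way to automate disposal while making sure that anything on legal hold is never touched.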
LONG-TERM PRESERVATION
It is common for information in large enterprises to be preserved long-term—timelines that can stretch to 100
years, or even beyond that. This means that once the data is removed from an organization’s archive, it is moved
to a central repository where it can be preserved long-term. When the information is transferred in this way, it
needs to be done in WARC format. So once again, it’s important that archive data be exportable in WARC.

SOLUTIONS FOR COMPLIANT RECORDKEEPING
To assist enterprises in collecting data for compliance and eDiscovery, Pagefreezer offers a suite of products
that simplify and automate the creation, retention, management, and disposal of online data. Below are some
enterprise solution highlights.
LIVE REPLAY
Archived content is presented in the original look and feel. Next to each social media message, for instance,
the interface displays the metadata for that message and the history of all changes. Pagefreezer displays
all message types, images, comments, and replies to comments in the same way as they appeared on the
original social media platform.
TRACK CHANGES AND DELETIONS
As pages, posts, and messages can be changed and have multiple versions over time, Pagefreezer has a
user-friendly way to access different versions. Every message or comment that has multiple versions is
indicated with a blue icon showing the number of versions. Deleted content is highlighted in red, with
deletion date and time clearly shown. Changes/additions are shown in green.
ADVANCED SEARCH
Pagefreezer comes with a powerful full-text search engine that allows users to easily find specific
archived pages, messages, and social media posts. This makes eDiscovery and general content
collection much easier, ultimately saving time and money. Users can search by keywords, phrases,
boolean operators, social media networks, accounts, and date ranges.
DATA EXPORT
Archived content can be exported in PDF or WARC through the Pagefreezer dashboard. Specific social
media accounts, selections of messages, open records cases, or even a complete account archive
can be exported. The exports include all selected messages and conversation threads, as well as
associated metadata.
DIGITAL SIGNATURES AND TIMESTAMPS
For digital records to be accepted in court, you must be able to prove their authenticity and integrity.
Pagefreezer meets the standards for digital evidence and facilitates the legal hold process by stamping
each archived page with a timestamp from an RFC 3161 compliant Time Stamp Authority and a SHA-256
digital signature.

DATA LOSS PREVENTION AND MONITORING
To ensure that activity on social media accounts complies with the organization’s social media policies,
Pagefreezer lets you actively monitor conversations on your social media channels or enterprise
channels based on a customized list of keywords, pre-defined text and number patterns, profanity, or
custom text patterns you want to keep an eye on.
CASE MANAGEMENT
In the Pagefreezer dashboard, users can create ‘cases’ in which they can collect relevant records.
While reviewing archive records or searching, individual posts and messages can be added to a
case. Once all records have been selected, the case can be printed or exported to a file that includes
relevant messages, conversation threads, and associated metadata. Data can also be ingested
by eDiscovery platforms for further processing and preparation.
RETENTION SCHEDULING
Pagefreezer offers retention scheduling to automate the disposal of data and simplify alignment with
your organization’s record retention policies. Should it become necessary, removed records can
also be recovered within 30 days. To ensure that organizations have complete oversight of all user
management activity, data viewed, exported and disposed of, Pagefreezer audit logs provide detailed
information of all activities related to archives, including destruction activities.
LONG-TERM PRESERVATION
Comprehensive archiving of your websites is of vital importance in case of investigation or litigation. We
understand that you are trusting Pagefreezer to handle your archives responsibly. That’s why we store
your archives in a fault-tolerant Type II, SOC 1 and SOC 2 certified data center in Seattle, Washington
with multiple secure data nodes. Pagefreezer also provides customers with the option for long-term
immutable storage (WORM format) of archived data.
LEGAL HOLD
Any post, comment, or reply can be flagged and placed on legal hold, overriding the retention
schedule to ensure records remain available. To support your team with legal holds, users can flag
social media records that are relevant and add them to a case. Cases can then be exported with the same
look and feel as the original social media network, simplifying use during legal proceedings.

LET’S CONNECT!
We really enjoy speaking with companies about their use-cases and how we can improve our solutions to better
suit their needs. Many of our features are the results of customer requests. We’re looking forward not only to
working with you as a customer, but also to hearing your ideas on how we can make our products better.
Why Choose Pagefreezer?
• We’re proven and trusted by over 1,700 customers in a wide range of industries including finance, legal,
telecom, retail, utilities, government, and post-secondary education.
• We’re results-focused: your success is our success. It’s our job to make your life easier. You can be up and
running in minutes, with a Customer Success team supporting you every step of the journey.
• We offer a comprehensive solution — we provide solutions for all your archiving needs: website, social
media, corporate chat, and SMS/text messages.
• We’re affordable — we are reasonably priced and there are no hidden fees.

Disclaimer
This document was created to provide information
about a specific issue. This document does not
take a position on any specific course of action
or proposal, nor is it intended to endorse any
particular vendor or product. Every effort has been
made to present accurate and reliable information;
however, Pagefreezer assumes no responsibility
for consequences resulting from the use of the
information herein.
Copyright © Pagefreezer Software Inc., 2019.

This document, and any portion thereof, may not
be quoted, reproduced, copied, disseminated or
otherwise distributed without the express written
permission of Pagefreezer Software Inc.
Pagefreezer Software, Inc.

500 - 311 Water Street Vancouver
BC V6B 1B8 Canada
Phone: +1 888 916 3999

www.Pagefreezer.com
info@Pagefreezer.com
