
CPIS 622

Cloud Computing Security

1
Chapter 4:
Cloud Data Security

2
Outline
• Cloud Data Lifecycle
• Cloud Storage Architectures
• Volume Storage: File-Based Storage and Block Storage
• Object-Based Storage
• Databases
• Content Delivery Network (CDN)
• Cloud Data Security Foundational Strategies
• Encryption
• Key Management
• Masking, Obfuscation, Anonymization, and Tokenization
• Security Information and Event Management
• Egress Monitoring (DLP)

3
Cloud Data Lifecycle
• Data in the cloud have the same needs and properties as data in the legacy
environment.
• In the cloud environment:
• Data will still be created (Create phase)—both in the cloud itself and by remote
users.
• Data will be stored, in both the short term (Store phase) and long term (Archive
phase)
• Data will be manipulated and modified (Use phase) in the production
environment
• It will be transmitted to other users and made available for collaboration (Share
phase)
• Data will need to be removed from the production environment, and the media
sanitized afterward (Destroy phase).
• In the cloud, each phase of the data lifecycle will require particular
protections.
4
Create
• Data will most often be created by users accessing the cloud remotely.
• Data Created Remotely
• Data created by the user should be encrypted before being uploaded to the cloud
(see the encryption sketch below)
=> to protect against obvious vulnerabilities, including man-in-the-middle
attacks and insider threats at the cloud data center.
• Cryptosystem used for this purpose should:
• have a high work factor and
• be listed on the FIPS 140-2 list of approved crypto solutions.
• The organization should also implement good key management practices.
• The connection used to upload the data should also be secure, preferably with an IPsec or
TLS (1.2 or higher) VPN solution.
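The sketch below illustrates the client-side encryption step: a minimal Python example, assuming the third-party cryptography package; the bucket name and the upload_to_cloud() call are hypothetical placeholders for whatever SDK the provider exposes.

```python
# Minimal sketch: encrypt a file on the client before it is uploaded to cloud storage.
# Assumes the third-party "cryptography" package; upload_to_cloud() and the bucket name
# are hypothetical placeholders for the provider's SDK call.
from cryptography.fernet import Fernet

def encrypt_before_upload(path: str, key: bytes) -> bytes:
    """Read a local file and return it encrypted (Fernet: AES-CBC + HMAC)."""
    with open(path, "rb") as f:
        plaintext = f.read()
    return Fernet(key).encrypt(plaintext)

key = Fernet.generate_key()  # manage this key per the organization's key-management policy
ciphertext = encrypt_before_upload("report.xlsx", key)
# upload_to_cloud("customer-bucket", "report.xlsx.enc", ciphertext)  # hypothetical upload call
```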

5
Cont.
• Data Created within the Cloud
• Data created within the cloud via remote manipulation should be encrypted
upon creation
• to obviate unnecessary access or viewing by data center personnel.
• Key management should be performed according to industry best practices

6
Store
• The Store phase takes place right after the Create phase and before the Use
and Share phases
• This means Store usually refers to near-term storage
• Activity in the Store phase tends to occur almost concurrently with the Create
phase
• That is, storing happens as the data is created.
• Security
• Encryption at rest for mitigating exposure to threats within the cloud service
provider and
• encryption in transit for mitigating exposure to threats while being moved to
the cloud data center.
7
Use
• Operations in the cloud environment are performed by remote access, so
those connections must be secured, e.g., encrypted tunnel.
• The platforms with which users connect to the cloud also have to be secured
• BYOD environment: we can never be sure just what devices the users have
• Users must be trained to understand the new risks that go along with
cloud computing and be expected to use technologies such as VPN and IRM
• Data owners should be careful to restrict permissions for modifying and
processing their data
• Limit the user to those functions that they absolutely require to perform
assigned tasks.
8
Share
• Cloud customers should consider implementing some form of egress
monitoring in the Share phase
• Sharing is valuable (e.g., for global collaboration), but it comes with risk
• Security controls implemented in previous phases are useful here:
• encrypted files and communications, IRM (Information Rights Management )
solutions, and so forth.
• Sharing restrictions may be based on jurisdiction;
• E.g., limit or prevent data being sent to certain locations
• These restrictions can take the form of export controls or import controls

9
Archive
• It is the phase for long-term storage
• Security controls for the data should be planned
• Cryptography is an essential consideration for archive
• Key management is of utmost importance:
• mismanaged keys can lead to exposure or to total loss of the data.
• If the keys are improperly stored (e.g., if stored with data), there is an
increased risk of loss
• If keys are stored away from the data but not managed properly and are lost,
there will be no efficient means to recover the data.
• Physical security of data in long-term storage is also important.
• In choosing a storage location, we need to weigh risks and benefits for
physical security:
10
Cont.
• Location
• Where is the data being stored?
• What environmental factors will pose risks in that location (natural disasters, climate,
etc.)?
• What jurisdictional aspects might bear consideration (local and national laws)?
• How distant is the archive location?
• Will it be feasible to access the data during contingency operations?
• Format
• Is the data being stored on some physical medium such as tape backup or magnetic
storage?
• Is the media highly portable and in need of additional security controls against theft?
• Will that medium be affected by environmental factors?
• How long do we expect to retain this data?
• Will it be in a format still accessible by production hardware when we need it?
11
Cont.
• Staff
• Are personnel at the storage location employed by our organization?
• If not, does the contractor implement a personnel control suite sufficient for
our purposes?
• Procedure
• How is data recovered when needed?
• How is it ported to the archive on a regular basis?
• How often are we doing full backups (and how frequent are incremental or
differential backups)?

12
Archive on Cloud
• Activities in the cloud will largely be driven by
• whether we are doing backups in the cloud,
• whether we are using the same cloud provider for backups and our
production environment, or
• whether we are using a different cloud provider
• Consider all the same factors we would use in the traditional
environment
• also determine whether we could impose those same decisions in the cloud
environment, on the cloud provider, via contractual means

13
Destroy

• Destruction options for the traditional and cloud environments were discussed in
Chapter 3.


• Cryptographic erasure (cryptoshredding) is the only feasible means for this
purpose in the cloud environment.
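A minimal sketch of the idea behind cryptoshredding, assuming the Python cryptography package: data is only ever stored in encrypted form, so securely discarding every copy of the key renders the remaining ciphertext unrecoverable. The in-memory key store stands in for an HSM or external key manager.

```python
# Minimal sketch of cryptographic erasure (cryptoshredding): data is only ever stored in
# encrypted form, so securely discarding every copy of the key makes it unrecoverable.
# Assumes the third-party "cryptography" package; the dict stands in for a key manager/HSM.
from cryptography.fernet import Fernet

key_store = {"dataset-42": Fernet.generate_key()}
ciphertext = Fernet(key_store["dataset-42"]).encrypt(b"sensitive records")

# Destroy phase: delete every copy of the key.
del key_store["dataset-42"]

# The ciphertext may still sit on the provider's media, but without the key it can no
# longer be decrypted, which is the practical equivalent of destroying the data.
```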

14
Cloud Storage Architectures
• There are various ways to store data in the cloud, each with attendant
benefits and costs.
• These ways apply both
• to larger organizational needs and
• to personal cloud storage of a single user’s data.
• These ways are
1. Volume Storage: File-Based Storage and Block Storage
2. Object-Based Storage
3. Databases
4. Content Delivery Network (CDN)

15
1. Volume Storage: File-Based Storage
and Block Storage
• The customer is allocated a storage space within the cloud;
• this storage space is represented as a drive attached to the user’s VM;
actual locations and memory addresses are transparent to the user
• it performs very much in the same manner as a physical drive
• There are various means of implementing data protection solutions in the cloud
• Storage architectures for this purpose include bit splitting and erasure coding
• Volume storage is often associated with infrastructure as a service
(IaaS).

16
1. Volume Storage: File-Based Storage
and Block Storage
File Storage
Data is stored and displayed just as with a file structure in the traditional environment,
as files and folders
This architecture has become popular with big data analytical tools and processes.

17
Cont.
• Block Storage
• File storage has a hierarchy of folders and files, but block storage is a blank
volume that the customer or user can put anything into.
• Block storage might allow more flexibility and higher performance
• it requires a greater amount of administration
• might entail installation of an OS
• Block storage is better suited to data of multiple types and kinds, such as enterprise
backup services.

18
Block Storage

19
2. Object-Based Storage
• Data is stored as objects, not as files or blocks.
• Objects include
• actual production content,
• metadata describing the content and object
• a unique address identifier for locating that specific object across an entire storage
space.
• This architecture allows for a significant level of description, including
• marking,
• labels,
• classification, and
• categorization.
• Object storage is usually associated with IaaS

20
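A minimal sketch of storing an object together with descriptive metadata, assuming the boto3 SDK against an S3-compatible object store with credentials already configured; the bucket, key, and metadata values are hypothetical.

```python
# Minimal sketch: storing an object plus descriptive metadata in an S3-compatible
# object store. Assumes the boto3 SDK and configured credentials; the bucket, key,
# and metadata values are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="corp-records",                  # hypothetical bucket
    Key="hr/2024/payroll-summary.csv",      # unique identifier within the storage space
    Body=b"employee_id,net_pay\n1001,5400\n",
    Metadata={                              # user-defined metadata describing the object
        "classification": "confidential",
        "data-owner": "hr-department",
    },
)
```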
2. Object-Based Storage

21
3. Databases
• Databases in the cloud provide some sort of structure for stored data.
• Data will be arranged according to characteristics and elements in the
data itself
• In the cloud, the database is usually backend storage in the data
center, accessed by users utilizing online apps or APIs through a
browser.
• Databases can be implemented in any cloud service model, but they are most
often configured to work with PaaS and SaaS

22
4. Content Delivery Network (CDN)
• It is a form of data caching, usually near geophysical locations of high
use/demand, for copies of data commonly requested by users.
• For example, online multimedia streaming services:
• instead of dragging data from a data center to users at variable distances
• the streaming service provider can place copies of the most requested media
near where those requests are likely to be made
• Thus improving bandwidth and delivery quality

23
4. Content Delivery Network
(CDN)

24
Cloud Data Security Foundational
Strategies
• Certain technologies and practices make data security possible in the
cloud
1. Encryption
2. Masking, Obfuscation, Anonymization, and Tokenization
3. Security Information and Event Management
4. Egress Monitoring (DLP)

25
1. Encryption
• It is the practice of converting data into a form that is not readable by unintended entities
• Cloud computing has a massive dependency on encryption
• Encryption will be used to protect data at rest, in transit, and in use.
• Encryption will be used
• on the remote user endpoint to create the secure communication connection,
• within the cloud customer’s enterprise environment to protect their own data,
• within the data center by the cloud provider to ensure various cloud customers
don’t accidentally access each other’s data

• Without encryption it would be impossible to use the cloud in any secure fashion
• Two particular topics of encryption in the cloud: key management and an experimental
encryption implementation
26
Cont.
• Key Management
• How and where encryption keys are stored can affect the overall risk of the
data significantly
• Consider following points regarding key management for cloud computing:
• Encryption keys are the mathematical/numeric strings that allow
encryption and decryption to occur
• Level of Protection
• Encryption keys must be secured at the same level of control, or higher,
as the data they protect
• Sensitivity of the data (organization’s data security policies) dictates this
level of protection
• strength of the cryptosystem is only valid if keys are not disclosed (except
for public keys)

27
Cont.
• E.g., a hardware security module (HSM) is a device that
• can safely store and manage encryption keys and can be used for servers, data
transmission, and log files.
• If implemented properly, it is far stronger than saving and storing keys in
software.
• Key Recovery
• For anyone other than a specific user, accessing that user’s key should be
difficult
• The organization may need to acquire a user’s key without the user’s
cooperation,
• e.g., if the user has been fired from the organization or has died.

28
Cont.
• Key Distribution
• Issuing keys for a cryptosystem can be difficult and involves risk.
• If the key management process requires a secure connection to initiate the key
creation procedure, how do we establish that secure session without a key?
• Passing keys out of band is a preferable, yet cumbersome and expensive,
solution.
• Keys should never be passed in the clear.
• Key Revocation
• The organization needs a process for suspending the key or that user’s
ability to use it:
• when a user should no longer have access to sensitive material, or
• when a key has been disclosed
29
Cont.
• Key Escrow
• It is the process in which copies of keys are held by a trusted third party in
a secure environment
• Outsourcing Key Management
• Keys should not be stored with the data they’re protecting
• In cloud computing, it is preferable to have the keys stored somewhere
other than the cloud provider’s data center. There are two solutions
• The cloud customer can retain the keys, but that requires an expensive and
complicated set of infrastructure and skilled personnel.
• Use a cloud access security broker (CASB): a third party that handles
IAM and key management services for cloud customers
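A minimal sketch of keeping keys away from the data they protect (envelope-style encryption), assuming the Python cryptography package; in practice the key-encryption key would live in an HSM, a customer-side key manager, or a CASB rather than in application memory.

```python
# Minimal sketch of keeping keys separate from the data they protect (envelope encryption).
# The data key encrypts the payload stored with the cloud provider; the key-encryption key
# (KEK) stays with the customer or a CASB, never with the provider.
# Assumes the third-party "cryptography" package.
from cryptography.fernet import Fernet

kek = Fernet.generate_key()        # held in the customer's key manager / HSM, not the cloud
data_key = Fernet.generate_key()   # per-object data key

stored_ciphertext = Fernet(data_key).encrypt(b"customer records")  # lives at the provider
wrapped_data_key = Fernet(kek).encrypt(data_key)                    # useless without the KEK

# Reading the data later: unwrap the data key with the KEK, then decrypt the payload.
plaintext = Fernet(Fernet(kek).decrypt(wrapped_data_key)).decrypt(stored_ciphertext)
assert plaintext == b"customer records"
```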
30
2. Masking, Obfuscation, Anonymization,
and Tokenization
• In the cloud it is sometimes necessary to obscure actual data and instead use a
representation of that data; the terms masking, obfuscation, anonymization, and
tokenization refer to methods for accomplishing this.
• Examples
• Test Environments:
• New software should be tested in sandboxed environments before being
deployed to the production environment.
• In this type of testing, actual production data should never be used within the
sandbox.
• But to test the functionality of the system, we use data that closely approximates
the traits and characteristics of the production data.
31
Cont.
• Enforcing Least Privilege
• least privilege means limiting users to permissions and access necessary to
perform their duties.
• It also means allowing the user access to elements of a data set without
revealing its entirety.
• E.g., a customer service representative might need to access a customer’s
account information, but that data might be a shortened version of the customer’s
total information (e.g., without the credit card number)
• Secure Remote Access
• When a customer logs onto a web service, the customer’s account might have
some data shortened in similar fashion to the least privilege example.
• You might not want to display certain elements of the customer’s account data,
such as payment or personal information,
• to avoid risks such as hijacked sessions, stolen credentials, or shoulder surfing.
32
Techniques to obscure data in the cloud
context
• Randomization
• It is replacement of the data (or part of the data) with random characters.
• Besides the actual data, the length of the string and the character set can also be obscured
• Hashing
• Using a one-way cryptographic function to create a digest of the original data.
• Using a hash algorithm to obscure the data gives the benefit of ensuring it is
unrecoverable, and it can also be used as an integrity check.
• But because hashing converts variable-length messages into fixed-length digests, you
lose many of the properties of the original data.
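A minimal sketch of the two techniques above, randomization and (salted) hashing, using only the Python standard library; real tools are more sophisticated (for example, they may also vary the output length rather than preserving it).

```python
# Minimal sketch of two obscuring techniques using only the standard library:
# randomization (swap each character for a random one of the same class) and salted
# hashing (one-way digest: unrecoverable, usable as an integrity check).
import hashlib
import os
import random
import string

def randomize(value: str) -> str:
    """Replace digits and letters with random ones of the same class; length is preserved."""
    classes = {True: string.digits, False: string.ascii_letters}
    return "".join(
        random.choice(classes[ch.isdigit()]) if ch.isalnum() else ch for ch in value
    )

def hash_value(value: str, salt=None) -> str:
    """Salted SHA-256 digest; fixed length regardless of the original data."""
    salt = salt or os.urandom(16)
    return hashlib.sha256(salt + value.encode()).hexdigest()

print(randomize("123-45-6789"))    # e.g. "948-01-2375"
print(hash_value("123-45-6789"))   # 64-character hex digest
```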

33
Cont.
• Shuffling
• It is like substitution, but it derives the substitution set from the same data set
that is being masked.
• Using different entries from within the same data set to represent the data.
This has the obvious drawback of using actual production data.
• Masking
• Hiding the data with useless characters; for example, showing only the last
four digits of a Social Security number: XXX-XX-1234.
• This can be used where the customer service representative or the customer
gets authorized access to the account but you want to obscure a portion of
the data for additional security.
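A minimal sketch of masking, showing only the last four digits of a Social Security number; the formatting rule is illustrative.

```python
# Minimal sketch of masking: show only the last four digits of a Social Security number,
# e.g. for display to a customer-service representative.
def mask_ssn(ssn: str) -> str:
    digits = [c for c in ssn if c.isdigit()]
    return "XXX-XX-" + "".join(digits[-4:])

print(mask_ssn("123-45-6789"))  # XXX-XX-6789
```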

34
Cont.
• Nulls
• Deleting the raw data from the display before it is represented or displaying
null sets.
• Obviously, some of the functionality of the data set will be dramatically
reduced with this method.

• Obfuscation refers to the application of any of these techniques to make the data
less meaningful, detailed, or readable, in order to protect the data or the subject
of the data.

35
Obscuring configuration
• Static configuration
• a new (representational) data set is created as a copy from the original data,
and only the obscured copy is used.
• Dynamic configurations
• Data is obscured as it is called,
• Example: the customer service agent or the customer is granted authorized
access, but the data is obscured as it is fed to them.

36
Cont.
• Sometimes adding another layer of abstraction to the data may reduce
the possibility that sensitive information can be gathered
• E.g., even if we obscure a person’s name in a given data set but display
other information, such as age, location, and employer, it may be possible to
determine the name without having direct access to that field.
• Removing the telltale nonspecific identifiers is called anonymization
or sometimes deidentification.
• Anonymization can be difficult, because sensitive data must be recognized
and marked as sensitive when it is created; if the user inputs the data into
open fields (free entry), determining sensitivity might not be simple.
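A minimal sketch of deidentification: the direct identifier is dropped and quasi-identifiers such as age and location are generalized; the record fields and the generalization rules are hypothetical.

```python
# Minimal sketch of anonymization/deidentification: drop the direct identifier (name) and
# generalize quasi-identifiers (age, location) that could otherwise re-identify the person.
# The record fields and generalization rules are hypothetical.
def deidentify(record: dict) -> dict:
    anon = dict(record)
    anon.pop("name", None)                               # remove the direct identifier
    decade = (record["age"] // 10) * 10
    anon["age"] = f"{decade}-{decade + 9}"               # bucket the exact age
    anon["location"] = record["location"].split(",")[-1].strip()  # keep only the country
    return anon

print(deidentify({"name": "A. Smith", "age": 37,
                  "location": "Jeddah, Saudi Arabia", "employer": "Acme"}))
# {'age': '30-39', 'location': 'Saudi Arabia', 'employer': 'Acme'}
```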

37
Tokenization
• It is the practice of having two distinct databases:
• one with the live, actual sensitive data and
• the other with nonrepresentational tokens mapped to each piece of that data.
• In this method,
• user or program calling the data is authenticated by the token server,
• which pulls the appropriate token from the token database,
• then calls the actual data that maps to that token from the real database of
production data, and
• finally presents it to the user or program.
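A minimal sketch of the two-database idea, using in-memory dictionaries in place of the token database and the PII database; in production these would be separate, independently secured systems sitting behind an authenticating token server.

```python
# Minimal sketch of tokenization with two separate stores: the token database holds only
# nonrepresentational tokens, the PII database holds the raw values keyed by those tokens.
# In production these are separate, independently secured systems behind an
# authenticating token server.
import uuid

token_db = {}   # token -> True (what applications are allowed to see and store)
pii_db = {}     # token -> raw sensitive value (lives only in the protected data store)

def tokenize(value: str) -> str:
    token = uuid.uuid4().hex        # reveals nothing about the underlying value
    pii_db[token] = value
    token_db[token] = True
    return token

def detokenize(token: str, caller_is_authenticated: bool) -> str:
    if not caller_is_authenticated or token not in token_db:
        raise PermissionError("caller is not authorized to detokenize")
    return pii_db[token]

t = tokenize("4111 1111 1111 1111")             # applications handle only the token
print(detokenize(t, caller_is_authenticated=True))
```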

38
Cont.
• Tokenization adds significant overhead to the process but
• creates an extra degree of security and
• may relieve the organization’s requirement or dependence on encryption
• For tokenization to function properly, the token server must have strong
authentication protocols.

39
How tokenization works
1. A user creates a piece of data.
2. The data is run through a DLP/discovery tool, to determine whether
the data is sensitive according to the organization’s rules (in this
example, the data is PII).
• If the data is deemed sensitive, the data is pushed to the tokenization
database.
3. The data is tokenized;
• the raw data is sent to the PII server,
• while a token representing the data is stored in the tokenization database.
The token represents the raw data as a kind of logical address.

40
Cont.

4. Another authenticated user requests the data.


5. If the user authenticates correctly, the request is put to the
tokenization database.
6. The tokenization database looks up the token of the requested data,
then presents that token to the PII database. Raw data is not stored
in the tokenization database.
7. The PII database returns the raw data based on the token.
8. The raw data is delivered to the requesting user.

41
Basic tokenization architecture

42
3. Security Information and Event
Management
• These are monitoring tools that tell us how well the systems and security
controls in our IT environment are functioning, detect anomalous
activity, and enforce policy.
• Most monitoring is performed from logs,
• which record activity as it happens.
• To better collect, manage, analyze, and display log data, a set of tools
specifically created for that purpose has become popular.
• Terms for these tools include
• security information management, security event management, and security
information and event management
• SIEM (Security Information and Event Management)

43
3. Security Information and Event
Management

44
Goals of SIEM
• Centralize Collection of Log Data
• Placing the logs collected from various sources (workstations, OSs, servers,
network devices, and so on) in one place (see the forwarding sketch below) for
• additional processing
• simplifying the work of admins and analysts
• All the log data in one location makes that location an attractive target for
attackers
• SIEM implementation will require additional layers of security controls
• Dashboarding
• SIEMs offer graphical output displays that are more intuitive and make it simpler for
managers to quickly grasp situations within the environment,
• because management often doesn’t understand IT functions
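A minimal sketch of centralized log collection: an application forwards its events to a central collector (e.g., the SIEM's syslog listener) rather than leaving them scattered on each host. Standard library only; the collector address is hypothetical.

```python
# Minimal sketch of centralizing log data: forward application events to a central
# collector (e.g., the SIEM's syslog listener) instead of leaving them on each host.
# Standard library only; the collector address is hypothetical.
import logging
import logging.handlers

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(
    logging.handlers.SysLogHandler(address=("siem.example.internal", 514))  # hypothetical
)

logger.info("user=alice action=login result=success src=10.0.4.17")
logger.warning("user=alice action=download object=hr/payroll.csv size=48MB")
```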

45
Cont.
• Enhanced Analysis Capabilities
• Some part of log analysis should be automated
• since log analysis is an everyday, repetitive task that requires a special skillset and
experience and is not well suited to full-time human tasking
• SIEM tools should have this capability
• Most automated tools will not recognize a particular set of attacks—the “low and
slow” style of persistent threats,
• which may develop over weeks or months, don’t have dramatic indicators, and
may go undetected by automated analysis.
• Automated Response
• Some SIEMs include automated alert and response capabilities that can be
programmed to suit your policies and environment.
46
4. Egress Monitoring (DLP)
• Egress monitoring—examining data as it leaves the production
environment.
• DLP can stand for various combinations of the terms data loss, data leak,
prevention, and protection
• The concept behind DLP tools is that data is identified, activity is monitored, and
policies are enforced (see the pattern-matching sketch after the goals below).
• Like SIEM, DLP solutions have the following major goals:
1. Additional Security DLP can be used as another control in the layered
defense strategy, one last mechanism designed for mitigating the possibility
of malicious disclosure.
47
Cont.
2. Policy Enforcement Users can be alerted by the DLP when they are
attempting to perform an action that would violate the
organization’s policy (either accidentally or intentionally).
3. Enhanced Monitoring The DLP tool can be set to provide one more
log stream to the organization’s monitoring suite.
4. Regulatory Compliance Specific types and kinds of data can be
identified by the DLP solution, and dissemination of that data can
be controlled accordingly
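A minimal sketch of the identification step mentioned earlier (data is identified, activity is monitored, policies are enforced): a regex-based check for sensitive patterns in outbound content. The patterns are deliberately simplified compared with real DLP tools.

```python
# Minimal sketch of the DLP identification step: scan outbound content for patterns the
# organization defines as sensitive and block or flag the transfer. The patterns are
# deliberately simplified; real DLP tools use far richer detection and enforcement logic.
import re

SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def egress_allowed(payload: str) -> bool:
    """Return False (and report why) if the outbound payload matches a sensitive pattern."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(payload):
            print(f"DLP: blocked outbound data containing {label}")
            return False
    return True

print(egress_allowed("Quarterly summary attached."))    # True
print(egress_allowed("Customer SSN is 123-45-6789."))   # False (blocked, logged)
```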

48
Amazon Web Services used by Netflix

Service Type Name of Amazon Web Service

Hosting EC2

Storage S3

Content Delivery Network CloudFront

Database (essential prerequisite for data analytics) Relational Database Service (RDS), DynamoDB

Event-driven programming Lambda

49
Summary
• Addressed the data lifecycle within the cloud environment as well as specific
security challenges in each phase.
• Looked at different data storage architectures that might be implemented in the
cloud, and which service model might be best suited for each.
• Discussed cryptography, including the importance of and difficulties with key
management
• Discussed why we might want to obscure raw data and only display selected
portions during operations, and we talked about various methods for performing
this task.
• Reviewed SIEM solutions, how and why they’re implemented, and some risks
associated with their use.
• Addressed the topic of egress monitoring, how DLP tools work, and specific
problems that might be encountered when trying to deploy DLP solutions in the
cloud.

50
