
DATA INTEGRITY ISSUES
Introduction
Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire lifecycle.
It aims to prevent unintentional changes to information.
Types of Integrity
PHYSICAL INTEGRITY
Concerned with correctly storing and retrieving the data itself.

LOGICAL INTEGRITY
Concerned with the correctness or rationality of a piece of data in a given context.
Challenges to Physical Integrity
Electromechanical faults
Design flaws
Material flaws
Power outages
Corrosion
Extreme temperatures
Extreme pressures
Challenges to Logical Integrity

Software bugs
Design flaws
Human errors
Types of Integrity Constraints
Entity integrity: every table must have a primary key, and the column or columns chosen as the primary key must be unique and not null.
Referential integrity: whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Domain integrity: every column in a relational database must take its values from a declared, defined domain.
User-defined constraints: any additional rules the user declares on the data (see the sketch below).
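The following minimal sketch uses SQLite from Python to show how these constraint types are declared and enforced; the table and column names are illustrative, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity integrity: a unique, non-null primary key on every table.
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,      -- unique and NOT NULL by definition
        name    TEXT NOT NULL
    )""")

# Referential integrity: every foreign key must point at an existing parent row.
# Domain / user-defined integrity: each column is restricted by its type and CHECK rule.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id),
        salary  REAL CHECK (salary > 0)   -- domain / user-defined constraint
    )""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 1, 50000.0)")       # OK

try:
    conn.execute("INSERT INTO employee VALUES (11, 99, 50000.0)")  # no department 99
except sqlite3.IntegrityError as exc:
    print("Referential integrity violation:", exc)

try:
    conn.execute("INSERT INTO employee VALUES (12, 1, -5.0)")      # salary outside domain
except sqlite3.IntegrityError as exc:
    print("Domain/check violation:", exc)
```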
Basic algorithm for checking data integrity in cloud services

1. Key Generation
The client is provided with a public and a private key.
The public key is given to a trusted verifier; the secret key stays with the client.
The public key is formed by two integers e and N, where N = p*q for two large primes p and q, and e is chosen such that gcd(e, φ(N)) = 1, with φ(N) = (p-1)(q-1).
The secret key is the tuple (d, N), where d is the modular inverse of e modulo φ(N). A sketch of this step follows below.
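A minimal Python sketch of the key-generation step, using deliberately tiny primes so the arithmetic is easy to follow; a real deployment would use large random primes.

```python
from math import gcd

def generate_keys(p: int = 61, q: int = 53, e: int = 17):
    """Return the public key (e, N) and the secret key (d, N)."""
    N = p * q                   # modulus formed from two (here: tiny) primes
    phi = (p - 1) * (q - 1)     # Euler's totient of N
    assert gcd(e, phi) == 1     # e must be invertible modulo phi(N)
    d = pow(e, -1, phi)         # d is the modular inverse of e
    return (e, N), (d, N)

public_key, secret_key = generate_keys()
print("public:", public_key, "secret:", secret_key)
```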
2. Tag Creation
The client compresses the original data file and generates the ciphertext corresponding to every block of the file (sketched below):
D_i = m_i^e mod N, for i = 1 to n, where n is the number of blocks.
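A minimal sketch of tag creation under the toy keys above; each block m_i is interpreted as an integer smaller than N, and the block size and encoding are illustrative assumptions.

```python
def make_tags(blocks, e, N):
    """Compute the tag D_i = m_i^e mod N for every block."""
    tags = []
    for block in blocks:
        m_i = int.from_bytes(block, "big") % N   # encode the block as an integer < N
        tags.append(pow(m_i, e, N))              # D_i = m_i^e mod N
    return tags

file_data = b"example file stored in the cloud"
blocks = [file_data[i:i + 4] for i in range(0, len(file_data), 4)]  # 4-byte blocks
print(make_tags(blocks, e=17, N=61 * 53))
```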

3. Challenge Generation
The owner or verifier issues a challenge to the server specifying how many (and which) blocks are to be tested.

4. Proof Generation
The server generates the proof by first re-creating the tags of the data it currently stores and then calculating the response R by applying blockwise modular operations using e and N.
5. Verify Proof
After getting the response R from the server, the verifier computes R'.
R and R' are then compared; by the homomorphic property they must match if the data is correctly stored on the server. A sketch of steps 3-5 follows below.
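A minimal end-to-end sketch of steps 3-5 under the same toy keys, assuming the verifier still holds the original blocks for comparison. The check relies on the multiplicative homomorphic property (m_1 * m_2)^e mod N = (m_1^e * m_2^e) mod N.

```python
import random

e, N = 17, 61 * 53

def block_to_int(block):
    return int.from_bytes(block, "big") % N

file_data = b"example file stored in the cloud"
blocks = [file_data[i:i + 4] for i in range(0, len(file_data), 4)]   # held by the server

# 3. Challenge: the verifier chooses which blocks are to be tested.
challenge = random.sample(range(len(blocks)), k=3)

# 4. Proof: the server re-creates the tags of the challenged blocks it stores
#    and folds them together with blockwise modular multiplication.
R = 1
for i in challenge:
    R = (R * pow(block_to_int(blocks[i]), e, N)) % N

# 5. Verify: the verifier computes R' from its own copy of the challenged blocks.
m_product = 1
for i in challenge:
    m_product = (m_product * block_to_int(blocks[i])) % N
R_prime = pow(m_product, e, N)

print("data correctly stored?", R == R_prime)   # matches by the homomorphic property
```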
Data Integrity Issues
After storing data in the cloud, users depend on the cloud to provide reliable services and expect their data and applications to be kept secure. That expectation may sometimes fail: users' data may be altered or deleted. At times, cloud service providers may be dishonest and may discard data that has not been accessed, or is rarely accessed, to save storage space, or keep fewer replicas than promised.

1. Data Loss or Manipulation


2. Untrusted Remote Server Performing Computation
Data Integrity Authentication Techniques and their Challenges
The field of Data Integrity is undergoing constant change. Researchers are
working on devising algorithms that are secure and efficient. Presently, there are
two basic schemes, Provable Data Possession (PDP) and Proof of Retrievability (PoR); other schemes are modified versions of these.
1. Provable Data Possession
This is a technique for assuring data integrity over remote servers. This lets the client
verify if the server has possession of data without actually retrieving it. It involves two
steps:

The first step is to pre-process and store the data. This involves creating a pair of matching public and secret keys through a probabilistic key-generation algorithm. The file, together with the public key, is then sent to the server for storage.

The second step is to verify file possession. The client challenges the server for a proof of possession of a block in the file. The server computes a response to the challenge and sends it back, which lets the client verify possession. A structural sketch of these two steps follows below.
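A structural sketch of the two steps, reusing the RSA-style tagging from the basic algorithm above as one possible instantiation. For simplicity the client here keeps the tags itself and compares them against tags recomputed from the server's copy; names and key sizes are illustrative.

```python
from math import gcd
import random

def pdp_setup(blocks, p=61, q=53, e=17):
    """Step 1: pre-process and store. Generate keys, tag the blocks,
    and hand the blocks plus the public key to the server."""
    N = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    d = pow(e, -1, phi)                       # secret exponent, kept by the client
    tags = [pow(int.from_bytes(b, "big") % N, e, N) for b in blocks]
    server_state = {"blocks": blocks, "public_key": (e, N)}
    client_state = {"secret_key": (d, N), "tags": tags}
    return client_state, server_state

def pdp_verify(client_state, server_state, indices):
    """Step 2: verify possession. Challenge a few blocks; the server recomputes
    their tags and the client compares them with its stored tags."""
    e, N = server_state["public_key"]
    proof = [pow(int.from_bytes(server_state["blocks"][i], "big") % N, e, N)
             for i in indices]                              # computed by the server
    return all(p == client_state["tags"][i] for p, i in zip(proof, indices))

client, server = pdp_setup([b"blk1", b"blk2", b"blk3", b"blk4"])
print(pdp_verify(client, server, random.sample(range(4), 2)))
```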
2. Basic PDP scheme based on MAC
This scheme proposed a Message Authentication Code [MAC] based PDP to ensure the data integrity of a file F stored on cloud storage in a very simple way. The data owner computes a MAC of the whole file with a set of secret keys and stores the MACs locally before outsourcing the file to the CSP.

The owner keeps only the computed MAC in local storage and sends the file to the Cloud Service Provider [CSP]. Whenever a verifier needs to check the data integrity of file F, he/she sends a request to the CSP, reveals a secret key to the cloud server, asks it to recompute the MAC of the whole file, and compares the recomputed MAC with the previously stored value, as sketched below.
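A minimal sketch of this scheme using HMAC-SHA256 as the MAC; key handling is simplified to a single key, and a variable stands in for the CSP's storage.

```python
import hmac, hashlib, os

# Data owner: compute the MAC of the whole file, keep it locally, send the file away.
secret_key = os.urandom(32)
file_f = b"contents of file F outsourced to the cloud"
stored_mac = hmac.new(secret_key, file_f, hashlib.sha256).digest()

csp_copy = file_f                                  # copy held by the CSP

# Verification: reveal the key, ask the CSP to recompute the MAC over its copy,
# and compare the result with the locally stored value.
recomputed = hmac.new(secret_key, csp_copy, hashlib.sha256).digest()
print("file F intact?", hmac.compare_digest(stored_mac, recomputed))
```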
3. Scalable PDP
This method proposed Scalable PDP, an improved version of the original PDP. The main difference is that Scalable PDP uses symmetric-key cryptography whereas the original PDP uses public-key cryptography, which reduces the computation overhead. Scalable PDP supports dynamic operations on remote data.
Scalable PDP has all challenges and answers pre-computed, and supports only a limited number of updates (see the sketch below).
Scalable PDP does not require bulk encryption.
It relies on symmetric keys, which are more efficient than public-key encryption, but it does not offer public verifiability.
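A minimal sketch of the pre-computation idea: before outsourcing, the owner derives a per-audit symmetric key and answer for a fixed budget of audits, so no public-key operations are needed later. The block selection, token format and key derivation below are simplifying assumptions, not the exact Scalable PDP construction.

```python
import hmac, hashlib, os, random

master_key = os.urandom(32)
blocks = [f"block-{i}".encode() for i in range(16)]

# Pre-compute a limited budget of challenge/answer pairs with symmetric keys.
audits = []
for t in range(5):
    k_t = hmac.new(master_key, f"audit-{t}".encode(), hashlib.sha256).digest()
    idx = sorted(random.sample(range(len(blocks)), 3))
    answer = hmac.new(k_t, b"".join(blocks[i] for i in idx), hashlib.sha256).digest()
    audits.append((k_t, idx, answer))

# Spending one audit: reveal k_t and the block indices; the server recomputes the
# answer over the blocks it stores, and the owner compares it with the stored token.
k_t, idx, expected = audits.pop()
server_answer = hmac.new(k_t, b"".join(blocks[i] for i in idx), hashlib.sha256).digest()
print("possession proven?", hmac.compare_digest(expected, server_answer))
```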
4. Dynamic PDP
This method proposed Dynamic PDP (DPDP), a collection of seven polynomial-time algorithms. It supports full dynamic operations such as insert, modify and delete.
The technique uses rank-based authenticated directories built on a skip list to support insertion and deletion.
Even though DPDP adds some computational complexity, it is still efficient: for example, verifying the proof for a 500 MB file produces only 208 KB of proof data and about 15 ms of computational overhead.
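The rank-based authenticated skip list itself is too involved for a short sketch; the toy below substitutes a flat hash list only to show how insert, modify and delete force the client's authentication data (here a single root hash) to be updated, and how tampering is then detected.

```python
import hashlib

def root_hash(blocks):
    """Hash over the per-block hashes, kept by the client as its verification state."""
    h = hashlib.sha256()
    for b in blocks:
        h.update(hashlib.sha256(b).digest())
    return h.digest()

blocks = [b"block-0", b"block-1", b"block-2"]
root = root_hash(blocks)                      # client state after upload

blocks.insert(1, b"new-block")                # dynamic insert
blocks[2] = b"block-1-modified"               # dynamic modify
del blocks[3]                                 # dynamic delete
root = root_hash(blocks)                      # client updates its state after each operation

print("audit after updates:", root_hash(blocks) == root)    # True

blocks[0] = b"tampered"                       # unauthorized change on the server
print("audit after tampering:", root_hash(blocks) == root)  # False
```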
5. Proof of Retrievability (PoR)
Proof of Retrievability (PoR) is a cryptographic method for remotely verifying the integrity of files stored in the cloud without keeping a copy of the user's original files in local storage. In a PoR scheme, the user backs up the data file together with some authentication data to a potentially dishonest cloud storage server.
The user can then check the integrity of the data stored with the CSP using the authentication key, without retrieving the data file from the cloud.

A PoR scheme works in two phases:

Setup Phase
Sequence of Verification Phases
6. PoR Based on Keyed Hash Function hk(F)
A keyed hash function is very simple and easy to implement, and it provides a strong proof of integrity.
In this method the user pre-computes the cryptographic hash of F using hk(F) before outsourcing the data file F to cloud storage, and stores the secret key K along with the computed hash.
To check the integrity of file F, the user releases the secret key K to the CSP and asks it to compute and return the value of hk(F), which is then compared with the stored hash.
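A minimal sketch with HMAC-SHA256 standing in for hk(F). Because revealing K lets the CSP answer any later challenge under that key, the sketch pre-computes hashes under several keys, one per future check; that usage pattern is an assumption, not something stated above.

```python
import hmac, hashlib, os

file_f = b"outsourced file F"

# Pre-compute (K, h_K(F)) pairs, one per future integrity check.
pairs = []
for _ in range(3):
    k = os.urandom(32)
    pairs.append((k, hmac.new(k, file_f, hashlib.sha256).digest()))

# One integrity check: release K to the CSP and ask it to return h_K(F) over its copy.
k, expected = pairs.pop()
csp_answer = hmac.new(k, file_f, hashlib.sha256).digest()   # computed by the CSP
print("integrity verified?", hmac.compare_digest(expected, csp_answer))
```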
7. Proof of Retrievability for Large Files
In this method, only a single key is needed irrespective of the size of the file or the number of files the user needs to access. Special sentinel blocks, which are hidden among the other blocks, are randomly embedded among the data blocks of the file F.

To check the integrity of the data file F, the user challenges the cloud service provider [CSP] during the verification phase by specifying the positions of a collection of sentinels and asks the CSP to return the associated sentinel values. If the CSP has modified or deleted some portion of F, it is possible that the sentinel values or positions have also changed, so the returned values will no longer match.
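A minimal sketch of the sentinel idea: random-looking sentinel blocks are mixed in among the data blocks before upload, and an audit asks the server for the values at a few secret sentinel positions. Block sizes and counts are illustrative, and the encryption that makes sentinels indistinguishable from data in the full scheme is omitted.

```python
import os, random

data_blocks = [f"data-{i:02d}".encode() for i in range(20)]

# Setup: create sentinels, embed them at random positions, and remember both secretly.
sentinels = [os.urandom(8) for _ in range(5)]
positions = sorted(random.sample(range(len(data_blocks) + len(sentinels)), len(sentinels)))
stored = list(data_blocks)
for pos, s in zip(positions, sentinels):      # ascending order keeps earlier inserts in place
    stored.insert(pos, s)
sentinel_map = dict(zip(positions, sentinels))    # kept secret by the user

# Verification: challenge a couple of sentinel positions and compare the returned values.
challenged = random.sample(positions, 2)
returned = {p: stored[p] for p in challenged}     # values the CSP sends back
print("file intact?", all(returned[p] == sentinel_map[p] for p in challenged))
```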
8. HAIL
The authors proposed HAIL, a high-availability and integrity layer for cloud storage, which allows the user to store data on multiple servers so that the data is redundant. The simple principle of this method is to ensure the integrity of a file via data redundancy.

HAIL uses Message Authentication Codes (MACs) and universal hash functions to carry out the integrity checks. The proof generated is compact, and its size is independent of the size of the data.
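A minimal sketch of the redundancy-plus-MAC idea: the same file is kept on several servers, each copy checked against a MAC, so a corrupted copy can be detected and restored from a healthy one. The dispersal encoding and universal hashing of the real HAIL construction are omitted.

```python
import hmac, hashlib, os

key = os.urandom(32)
file_f = b"file replicated across multiple servers"
reference_mac = hmac.new(key, file_f, hashlib.sha256).digest()

servers = {"server-1": file_f, "server-2": file_f, "server-3": b"corrupted copy"}

# Audit: recompute the MAC of each server's copy and keep only the healthy ones.
healthy = [name for name, copy in servers.items()
           if hmac.compare_digest(hmac.new(key, copy, hashlib.sha256).digest(), reference_mac)]
print("intact copies:", healthy)   # server-3 is detected and can be repaired from the others
```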
9. PoR Based on Selecting Random Bits in Data Blocks
This method proposed a technique that encrypts only a few bits of data per data block instead of encrypting the whole file F, thus reducing the computational burden on the client. It stands on the fact that a high probability of security can be achieved by encrypting a few bits instead of encrypting the whole data.
The client's storage and computational overhead is minimized, as the client does not store any data locally, and the bandwidth requirements are also reduced. Hence this scheme suits thin clients well.

In this technique the user needs to store only a single cryptographic key and two random sequence functions.
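A minimal sketch of the few-bits-per-block idea: a keyed function derives a handful of bit positions for each block, and only those bits are recorded. In the full scheme that metadata is encrypted and appended to the file so the client keeps just the key; the position-selection function and bit counts below are simplifying assumptions.

```python
import hmac, hashlib, os, random

key = os.urandom(32)
blocks = [os.urandom(16) for _ in range(8)]          # 16-byte data blocks

def bit_positions(key, block_index, k=4):
    """Derive k pseudo-random bit positions for a block from the secret key."""
    seed = hmac.new(key, f"block-{block_index}".encode(), hashlib.sha256).digest()
    return random.Random(seed).sample(range(16 * 8), k)

def get_bit(block, pos):
    return (block[pos // 8] >> (pos % 8)) & 1

# Setup: record the selected bits of every block; in the full scheme this metadata
# is encrypted and appended to the file before upload.
metadata = {i: [get_bit(b, p) for p in bit_positions(key, i)] for i, b in enumerate(blocks)}

# Verification: challenge one block; the server returns the bits at the derived
# positions, and the client compares them with the (decrypted) metadata.
i = random.randrange(len(blocks))
server_bits = [get_bit(blocks[i], p) for p in bit_positions(key, i)]
print("block", i, "intact?", server_bits == metadata[i])
```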
