Data Leakage Detection in Cloud Using Watermarking Technique
Abstract-- Security is a critical issue in data management because stored information is an extremely valuable commodity. It is widely believed that hackers cause most security breaches, but in practice most data loss is caused by insiders. In a virtually distributed setup, crucial data is frequently handed over from the distributor to trusted parties. Maintaining the safety and stability of the services as user requests grow is a prime requirement. When sensitive data is leaked by a client, the specific client accountable for the leak should be exposed at an early stage. Thus, monitoring the data passed from the distributor to agents is obligatory. This project proposes a data leakage detection system that applies a watermarking algorithm, which checks the data for tampering and determines whether the leak came from one or more agents. Finally, the procedure is implemented on a cloud server.

Keywords-- Watermark, Data leakage, Tampering, Steganography, Cloud, AES, QR code, DCT, DWT, SVD.

I. INTRODUCTION
Cloud computing is a rapidly emerging technology in the area of Information Technology, with almost every IT company trying to get into it. In cloud computing, shared information resources and software are offered to devices on demand. One of the basic facilities provided by the cloud is data storage. By using the cloud, employees are freed from the burden of local data storage and monitoring. However, it also poses a notable threat to the privacy of the files. Cloud servers maintained by cloud providers are not entirely trusted by users, yet the data records stored in the cloud may be sensitive and confidential, such as employee or product details, business policies, etc. As a result, data security in cloud computing has attracted a lot of attention. Less control over data can cause severe security issues and threats, which may lead to data leakage. The amount of damage done by a data leak is determined by the quality of the sensitive data leaked. If the leaked data is of great importance to the organization, it may leave the organization in a weak state. The leakage may degrade the business and could lead to the downfall of a company. To avert this problem, different data leakage detection techniques have been developed, such as the fragmentation method, the perturbation method, etc.; each was devised to detect data leakage on relational data. The proposed project detects data leakage on a cloud holding a massive amount of data. The existing system uses a method called perturbation, in which an approximation of the sensitive data is sent to the client. In the method introduced here, unique data is inserted into each transmitted copy; if that copy is later found with an unauthorized person, the leaker can be identified. The data is also checked for any kind of manipulation or tampering. The proposed project mainly focuses on detecting the leaker and tampering of data using watermarking techniques, ensuring data ownership protection as well as data integrity. The project intends to build a watermarking algorithm based on frequency domain techniques, which improves the efficiency and resilience of the watermarking scheme. Leakage can occur when an authority shares organizational data with authorized clients. A client may share this data with untrusted third parties, which can cause loss of revenue for the organization and damage its reputation. The scope of the project is limited to a group of users who are interested in sharing their data with an online community of users for business or research purposes. The paper is organized in six sections. Section II reviews related work on data leakage detection, Sections III and IV give the problem definition and the proposed data leakage detection technique, Section V provides results and measurements, and Section VI gives the conclusions drawn.

II. LITERATURE SURVEY
Research in the field of data security has led to a large number of methods for leakage detection. Various techniques are available in the literature for detecting tampering and data leakers in the cloud.

Data allocation is the prime focus of the approach presented by Panagiotis Papadimitriou [1], which describes a method by which a distributor can prudently hand out documents to clients in order to raise the likelihood of exposing a guilty client. The distributor creates fake objects that are not present in the genuine data set. These objects are structured to imitate actual objects and are delivered to clients together with genuine data objects to increase the chances of identifying the clients responsible for a data leak. However, fake objects may affect the precision of operations performed by clients, so they may not always be allowable. This approach shows that it is feasible to estimate the likelihood that a client is accountable for the leak based on the overlap of its data with the leaked data.
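The overlap idea attributed to [1] can be illustrated with a deliberately simplified score: the fraction of leaked objects that a given client had received. This is only a sketch under stated assumptions. It is not the guilt-probability model of [1], and the names `overlap_score`, `rank_clients` and the sample allocations are hypothetical.

```python
def overlap_score(client_objects, leaked_objects):
    """Fraction of the leaked objects that this client was given."""
    if not leaked_objects:
        return 0.0
    return len(client_objects & leaked_objects) / len(leaked_objects)


def rank_clients(allocations, leaked_objects):
    """Return (client, score) pairs, highest overlap first."""
    scores = {
        client: overlap_score(objects, leaked_objects)
        for client, objects in allocations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical allocations: each client receives genuine records plus
# one fake object created only for that client.
allocations = {
    "client_a": {"r1", "r2", "r3", "fake_a"},
    "client_b": {"r3", "r4", "fake_b"},
}
leaked = {"r1", "r2", "r3", "fake_a"}

ranking = rank_clients(allocations, leaked)
print(ranking)
```

In this toy case the fake object `fake_a` appears in the leaked set, so the client that received it ends up with the highest overlap.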
The method presented by Abhijit Singh et al. [2] illustrates that sensitive data ought to be watermarked prior to its distribution so that its source can be traced with certainty. Watermarking protects the data from being accessible to all and makes it easier to spot the corrupt client by using the fake objects, i.e. the watermark placed at different positions in the data. The approach explores watermarking techniques such as least significant bit (LSB) [3], where the watermark is encrypted using the RSA [4] algorithm and embedded in an audio file using the LSB algorithm. Because the watermark is first encrypted and then embedded in the multimedia data, removing it becomes very difficult, which gives significant robustness.

[...] uses positioning and correction of QR codes and is thus efficient against synchronization attacks. It also withstands geometric attacks and reduces computational complexity. The method shows that it is challenging to forge QR codes when digital watermarking is used. Data carrying a QR code as the watermark has higher PSNR values compared to data without a QR code as the watermark.

III. PROBLEM DEFINITION
The problem statement is to implement a system that detects whether data has been tampered with and identifies the leaker(s) of the data using watermarking with quick response (QR) codes, in order to improve the security of transferred data.
[...] business and receive data from the distributor, which could be a mailing list, employee salaries, multimedia data, etc.

The primary step of the data transfer phase is information retrieval and encryption. Information that uniquely identifies the data is obtained. Once the information is retrieved, a message (MS1) is created using the retrieved information and the recipient's client ID. The message is then encrypted using symmetric key encryption, i.e. the Advanced Encryption Standard (AES) algorithm, to prevent any tampering with the information. The AES algorithm [10] is considered more effective for steganography purposes as it works faster than triple DES. AES has built-in flexibility in key size: a 128-bit key uses 10 rounds, a 192-bit key uses 12 rounds and a 256-bit key uses 14 rounds. Each round uses a different key derived from the original AES key.

[...] if it reads the data accurately or not. The codewords obtained in the previous step are then arranged according to the QR code requirements. After arrangement, the bits are placed in the QR code matrix and the code is modified to a specific format based on masking patterns. Finally, the version and format data are appended to the QR code. The format data identifies the error correction level and masking pattern. The version area holds data describing the size of the QR matrix [11].

The generated QR code is used as an input to the watermark [12] embedding algorithm along with the source data. After watermark embedding, the data as a whole is transformed to yield the final watermarked data suitable for transmission.

B. Watermark Embedding and Extraction
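The embedding and extraction procedures of this section repeatedly apply a 2-D DWT and its inverse. The following is a minimal sketch of a two-level decomposition and its reconstruction, assuming a Haar wavelet (the paper does not name the wavelet) and a small synthetic 16 x 16 channel. The function names `haar_dwt2` and `haar_idwt2` are illustrative, and sub-band naming conventions vary between texts.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of a square list-of-lists with even size.

    Returns four sub-bands (LL, LH, HL, HH), each half the input size.
    """
    half = len(img) // 2
    ll = [[0.0] * half for _ in range(half)]
    lh = [[0.0] * half for _ in range(half)]
    hl = [[0.0] * half for _ in range(half)]
    hh = [[0.0] * half for _ in range(half)]
    for r in range(half):
        for c in range(half):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + d + e) / 2  # local average
            lh[r][c] = (a - b + d - e) / 2  # difference across columns
            hl[r][c] = (a + b - d - e) / 2  # difference across rows
            hh[r][c] = (a - b - d + e) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-size array from sub-bands."""
    half = len(ll)
    img = [[0.0] * (2 * half) for _ in range(2 * half)]
    for r in range(half):
        for c in range(half):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            img[2 * r][2 * c] = (s + x + y + z) / 2
            img[2 * r][2 * c + 1] = (s - x + y - z) / 2
            img[2 * r + 1][2 * c] = (s + x - y - z) / 2
            img[2 * r + 1][2 * c + 1] = (s - x - y + z) / 2
    return img


# Synthetic stand-in for one color channel of a cover image.
channel = [[(r * 16 + c * 7) % 256 for c in range(16)] for r in range(16)]

# Level 1 on the channel, level 2 on the LL band.
ll, lh, hl, hh = haar_dwt2(channel)
ll_ll, ll_lh, ll_hl, ll_hh = haar_dwt2(ll)

# Reconstruct in the reverse order: level 2 first, then level 1.
ll_star = haar_idwt2(ll_ll, ll_lh, ll_hl, ll_hh)
restored = haar_idwt2(ll_star, lh, hl, hh)
max_error = max(
    abs(restored[r][c] - channel[r][c]) for r in range(16) for c in range(16)
)
```

In the scheme described in the paper, the LL_HH band would be modified between the forward and inverse transforms. Here it is left untouched, so the reconstruction should match the input.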
[...] generated watermark. Inverse SVD, inverse DCT and inverse DWT are applied to obtain the watermarked image.

The presented algorithm proves to be better as it is resistant against several attacks, gives the best results against compression and also results in better PSNR [15] values of an image. Integrating these three transforms increases the performance of watermarking significantly compared to the DCT-SVD [16] and DWT watermarking techniques.

a) The embedding algorithm is briefly illustrated below [17].
1. Let OI represent an original cover image of size N x N. Select a color channel and perform DWT to segment it into four separate N/2 x N/2 sub-bands LL, HH, LH and HL.
[...]
7. Find matrix B* with B* = U*S*V^T.
8. Apply inverse DCT to the obtained B* to retrieve LL_HH*.
9. Perform inverse DWT on LL_LL, LL_HH*, LL_LH and LL_HL to obtain matrix LL*.
10. Apply inverse DWT to LL*, HH, LH and HL, and set the result to the selected color channel to return the watermarked image WI.

b) The extraction algorithm is explained below [17].
1. First, choose a color channel and apply DWT to the watermarked image WI in order to separate the LL*, HH, LH and HL bands.
2. Perform DWT on the LL* band and segment it further into LL_LL, LL_HH*, LL_LH and LL_HL sub-bands.
3. Pick out the LL_HH* band and split it into 4 x 4 blocks.
4. Next, perform DCT on every block of LL_HH* and choose DCT values to obtain matrix A.
5. Apply SVD to matrix A, A = WU*WS*WV^T, to acquire WU, WS and WV.
6. Calculate SW = (S - WS) / [...].
7. Finally, retrieve the watermark using EW = WU*SW*WV^T.

C. Tamper detection and detecting data leaker
In this final phase, the watermarked data, i.e. the leaked data, is examined to extract the watermark. The extracted QR code is then scanned to fetch the encrypted information. Interpreting data from the QR code is the reverse of the encoding process. The first step is to recognize the dark and light units as an array of 0s and 1s. The next step is to extract the format data, which reveals the masking pattern, and to determine the version of the QR code. Then, XOR the coded area with the mask pattern. Next, replace the text and correct the errors detected. Divide the data codewords into segments and decode the text in accordance with the mode [11].

The enciphered data in the QR code is then decrypted using the AES decryption algorithm. The client data extracted from the QR code is compared with the data stored in the cloud database. The client with matching information is considered guilty of leaking critical organizational data. The tamper detection module is used to detect any distortion or tampering of the data. To check whether the current image data has been tampered with, its image properties are computed and compared with the properties of the original data. If there is a mismatch between the two, the suspected copy is declared tampered or manipulated.
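The two checks of the tamper detection and leaker detection phase can be sketched as follows. The paper does not state which image properties are compared, so this sketch assumes width, height and a SHA-256 digest of the pixel bytes, recorded when the copy was distributed. The client database is modelled as a plain dictionary. All names (`image_properties`, `is_tampered`, `find_leaker`, the client IDs) are hypothetical.

```python
import hashlib


def image_properties(width, height, pixel_bytes):
    """Collect the properties compared during tamper detection.

    Assumption: dimensions plus a SHA-256 digest of the raw pixel bytes.
    """
    return {
        "width": width,
        "height": height,
        "sha256": hashlib.sha256(pixel_bytes).hexdigest(),
    }


def is_tampered(original_props, suspected_props):
    """Declare the suspected copy tampered if any property differs."""
    return original_props != suspected_props


def find_leaker(decoded_client_id, client_db):
    """Look up the client ID recovered from the decrypted QR payload.

    Returns the matching record, or None when no client matches.
    """
    return client_db.get(decoded_client_id)


# Hypothetical records standing in for the cloud database.
client_db = {"CLIENT-42": "Acme Research", "CLIENT-43": "Beta Labs"}

stored = image_properties(2, 2, bytes([10, 20, 30, 40]))
suspect = image_properties(2, 2, bytes([10, 20, 30, 41]))  # one byte changed

print(is_tampered(stored, suspect))
print(find_leaker("CLIENT-42", client_db))
```

A digest flags any single-bit change as a mismatch. A deployment that must tolerate benign re-encoding would need a more forgiving property set.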
2019 International Conference on Computer Communication and Informatics (ICCCI -2019), Jan. 23 – 25, 2019, Coimbatore, INDIA
V. EXPERIMENTAL RESULTS
In this model we take a dataset of scenic images. For transmission, image identification information along with a timestamp is concatenated with the recipient client ID to form a message. The message is then encrypted using the AES algorithm. The encrypted message is then converted into a QR code image. The generated QR code image is embedded into a cover image using the DWT-DCT-SVD watermark embedding algorithm to make it more imperceptible. Finally, the watermarked image is distributed to the respective client. The output of this module is illustrated in the figures below.

For tamper and data leaker detection, the suspected image is subjected to the DWT-DCT-SVD watermark extraction algorithm to retrieve the embedded watermark, which is a QR code image shown in the figure below.
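The message formation step used for transmission can be sketched as follows. The field separator, the ISO 8601 timestamp format and the sample identifiers are assumptions, since the paper does not specify the message layout. The sketch does not perform encryption, because the Python standard library has no AES implementation. The helper `aes_rounds` only reproduces the key-length-to-round-count relationship stated for AES (10, 12 and 14 rounds).

```python
from datetime import datetime, timezone

FIELD_SEPARATOR = "|"  # hypothetical delimiter, not taken from the paper


def build_message(image_id, client_id, timestamp):
    """Concatenate image identification, timestamp and recipient client ID."""
    return FIELD_SEPARATOR.join([image_id, timestamp.isoformat(), client_id])


def parse_message(message):
    """Split a message back into (image_id, timestamp, client_id)."""
    image_id, stamp, client_id = message.split(FIELD_SEPARATOR)
    return image_id, datetime.fromisoformat(stamp), client_id


def aes_rounds(key_bits):
    """Number of AES rounds for a given key length in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    return key_bits // 32 + 6


sent_at = datetime(2019, 1, 23, 10, 30, tzinfo=timezone.utc)
ms1 = build_message("IMG-0001", "CLIENT-42", sent_at)
print(ms1)
print(parse_message(ms1))
print([aes_rounds(bits) for bits in (128, 192, 256)])
```

In the described system this message would be AES-encrypted and then encoded as a QR code before embedding. Both steps need libraries outside the standard library and are left out here.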
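Fig.6. Generated QR code for client ID

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 \tag{1}$$

$$\mathrm{PSNR} = 20 \cdot \log_{10}\left( \frac{MAX_I}{\sqrt{\mathrm{MSE}}} \right) \tag{2}$$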
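Equations (1) and (2) can be evaluated directly. The sketch below assumes 8-bit grayscale images stored as equally sized lists of rows, so the maximum pixel value defaults to 255. The function names are illustrative.

```python
import math


def mse(original, watermarked):
    """Mean squared error between two equally sized images, as in Eq. (1)."""
    rows, cols = len(original), len(original[0])
    total = sum(
        (original[i][j] - watermarked[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr(original, watermarked, max_value=255):
    """Peak signal-to-noise ratio in dB, as in Eq. (2)."""
    error = mse(original, watermarked)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 20 * math.log10(max_value / math.sqrt(error))


cover = [[100, 110], [120, 130]]
marked = [[101, 111], [121, 131]]  # every pixel differs by 1, so MSE = 1

print(mse(cover, marked))
print(psnr(cover, marked))
```

A higher PSNR means the watermarked image is closer to the cover image, which is why the metric is used to judge imperceptibility.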