
2019 International Conference on Computer Communication and Informatics (ICCCI -2019), Jan. 23 – 25, 2019, Coimbatore, INDIA

Data Leakage Detection in cloud using Watermarking Technique
Riya Naik
Department of Computer Science & Engineering
Goa College of Engineering
Farmagudi, India
naikriya834@gmail.com

Manisha Naik Gaonkar
Professor, Department of Computer Science & Engineering
Goa College of Engineering
Farmagudi, India
manisha@gec.ac.in

Abstract-- Security is a critical issue in data management because stored information is an extremely valuable commodity. It is commonly believed that hackers cause security breaches, but in reality most data loss is caused by insiders. In a virtually distributed setup, crucial data is frequently handed over from the distributor to trusted parties. Maintaining the safety and stability of services under the growing requests of users is a prime requirement. When sensitive data is leaked by a client, the specific client accountable for the leak should be exposed at an early stage. Monitoring of the data passed from the distributor to agents is therefore obligatory. The project proposes a data leakage detection system that applies a watermarking algorithm to examine tampering of data and to determine whether the leak came from one or more agents. Finally, the procedure is implemented on a cloud server.

Keywords-- Watermark, Data leakage, Tampering, Steganography, Cloud, AES, QR code, DCT, DWT, SVD.

I. INTRODUCTION
Cloud computing is a rapidly emerging technology in the area of Information Technology, with almost every IT company trying to get into it. In cloud computing, shared information resources and software are offered to devices on demand. One of the basic facilities provided by the cloud is data storage. By using the cloud, employees are freed from the burden of local data storage and monitoring. However, it also poses a notable threat to the privacy of the files. Cloud servers maintained by cloud providers are not entirely trusted by users, yet the data records stored in the cloud may be sensitive and confidential, such as employee or product details, business policies, etc. As a result, data security in cloud computing has attracted a lot of attention. Less control over data can cause severe security issues and threats, which may lead to data leakage. The damage done by a data leak is determined by the nature of the sensitive data leaked. If the leaked data is of great importance to the organization, it may leave the organization in a feeble state. The leakage may degrade the business and could lead to the downfall of a company. To avert this problem, different data leakage detection techniques have been developed, such as the fragmentation method and the perturbation method; each was devised to detect data leakage on relational data. The proposed project deals with detecting data leakage on a cloud holding a massive amount of data. The existing system uses a method called perturbation, in which an approximation of the sensitive data is sent to the client. In the method introduced here, unique data is inserted into each transmitted copy; if that copy is afterwards found with an unauthorized person, the leaker can be recognized. The data is also checked for any kind of manipulation or tampering. The proposed project mainly emphasizes detecting the leaker and tampering of data using watermarking techniques, to ensure data ownership protection as well as data integrity. The project intends to build a watermarking algorithm based on frequency domain techniques, which improves the efficiency and resilience of the watermarking scheme. Leakage can occur when an authority shares organizational data with authorized clients. A client may share this data with untrusted third parties, which can cause loss of revenue for the organization and damage its reputation. The scope of the project is limited to a group of users who are interested in sharing their data with an online community of users for business or research purposes. The paper is organized in six sections. Section II covers related work on detection of data leakage, Sections III and IV present the problem definition and the proposed data leakage detection technique, Section V provides results and measurements, and Section VI gives the conclusion.

II. LITERATURE SURVEY

Research in the field of data security has led to a large number of methods for leakage detection. Various techniques are available in the literature for determining tampering and data leakers in the cloud.

Data allocation is the prime focus of the approach presented by Panagiotis Papadimitriou [1], which describes a method by which a distributor can prudently hand out documents to clients in order to raise the likelihood of exposing a guilty client. The distributor creates fake objects that are not present in the genuine data set. The objects are structured to imitate actual objects and are delivered to clients together with genuine data objects, to increase the chances of identifying the clients responsible for a data leak. However, fake objects may affect the precision of operations performed by clients, so they may not always be allowable. This approach shows that it is feasible to estimate the probability that a client is accountable for the leak based on the overlap of its data with the leaked data.

978-1-5386-8260-9/19/$31.00 ©2019 IEEE



The method presented by Abhijeet Singh et al. [2] illustrates that sensitive data ought to be watermarked prior to its distribution so that its source can be traced with certainty. Watermarking protects the data from being accessible to all, and it becomes easier to spot the corrupt client by using fake objects, i.e. the watermark placed at different positions in the data. The approach explores a watermarking technique based on the least significant bit (LSB) [3], where the watermark is encrypted using the RSA [4] algorithm and embedded in an audio file using the LSB algorithm. Because the watermark is first encrypted and then embedded in the multimedia data, removing the watermark becomes very difficult, which gives significant robustness.

S. Geetha et al. [5] describe Division and Replication of Data in the Cloud for Optimal Performance and Security (DROPS), which segments the owner's files into chunks and replicates them across the cloud. In the DROPS methodology, the entire file is not stored in one place in the cloud but is split into numerous fragments. These fragments are shared in the cloud space among clients. When data is fragmented, the fragments are distributed over the nodes, making it feasible to assess the possibility that an unauthorized party is accountable for a leak, based on the similarity of its fragment with the one that is leaked.

In the technique suggested by Neeraj Kumar et al. [6], the server adds an image logo to the stored documents. The idea behind this method is to implant a confidential message into the document in a computationally efficient way. ASCII code is added to documents and AES is applied with the SHA-512 authentication hash algorithm. The major focus is that only approved clients are able to access crucial documents. The created watermark is sent to the client with public key certificates. The method relies on symmetric key encryption, which is why it is unwise to use it in a web-based scenario where numerous parties can acquire the data.

Rupesh Mishra [7] proposed data allocation strategies that can enhance the probability of detecting a guilty client. A random number of data objects is distributed among the clients. The presented analysis confirms that fake objects have a major impact on identifying corrupt clients, which can be measured based on the similarity between the leaked data and the client's data.

AL. Jeeva [8] provided a comparative analysis of encryption algorithms. Comparisons were based on parameters such as encryption ratio, speed, tunability, power consumption, hardware and software implementations, and key length. Among the symmetric encryption techniques, the Advanced Encryption Standard (AES) is assessed as the preferable solution because of lower energy consumption and buffer usage, and shorter encryption and decryption times.

Weijun Zhang [9] presented a mechanism in which the data to be embedded as a watermark is translated into a QR code prior to embedding. The proposed algorithm is more secure as it uses positioning and correction of QR codes and is thus effective against synchronization attacks. It also withstands geometric attacks and reduces computational complexity. The method shows that it is challenging to forge QR codes when digital watermarking is used. Data carrying a QR code as a watermark has higher PSNR values than data without one.

III. PROBLEM DEFINITION
The problem is to implement a system that detects whether data has been tampered with and finds the leaker(s) of the data, using watermarking with a quick response (QR) code, to improve the security of transferred data.

The objective of the model has three main parts. The first is to extract information about the data being transferred and generate a QR code from the data properties. The second is to embed the QR watermark into cloud data using frequency domain techniques. The third is to determine whether anyone has tampered with the data by checking the current data characteristics, and to detect the guilty agent who leaked the data by extracting the watermark and comparing its information with the agents' details.

IV. PROPOSED APPROACH
The project proposes tamper detection and data leaker detection in the cloud. It is divided into three sections as follows.

A. Data transfer

Fig. 1. Steps involved in Data transfer.

The proposed method includes two entities, the distributor and the agent. The distributor is company personnel responsible for distributing data to third parties. The distributor has the functionalities of adding agents, storing data, distributing data, tamper detection and finding attacker agents. Agent or client will be third parties who are involved in

business and receive data from the distributor, which could be a mailing list, employee salaries, multimedia data, etc.

The primary step of the data transfer phase is information retrieval and encryption. Information that identifies the data uniquely is obtained. Once the information is retrieved, a message (MS1) is created from the retrieved information and the recipient's client ID. The message is then encrypted using symmetric key encryption, i.e. the Advanced Encryption Standard (AES) algorithm, to avert any tampering with the information. The AES algorithm [10] is considered more effective for steganography purposes as it works faster than triple DES. AES has built-in flexibility in key size: a 128-bit key uses 10 rounds, a 192-bit key uses 12 rounds and a 256-bit key uses 14 rounds. Each round uses a different round key generated from the original AES key.

Fig. 2. Advanced Encryption Standard Algorithm.

AES has proved efficient in both hardware and software and is resistant to most cryptanalytic attacks.

After symmetric key encryption of the message, a QR code [11] is generated from the encrypted message MS1, which yields a unique QR code for each client's data. QR codes are two-dimensional machine-readable codes in which the data is stored in black squares arranged in a grid on a white background.

Generation of a QR code takes place in several steps. The initial step, called data analysis, verifies whether the input data can be encoded in byte, kanji, numeric or alphanumeric mode. Each mode encodes text as a string of bits using a different method. The next step is to encode the data, which outputs data codewords made up of strings of bits. The QR code incorporates error correction, which QR scanners use to check whether the data is read accurately. The codewords obtained in the previous step are then arranged according to the QR code requirements. After arrangement, the bits are placed in the QR code matrix and the code is modified according to masking patterns. Finally, the version and format data are appended to the QR code. The format information identifies the error correction level and the masking pattern. The version area holds data on the size of the QR matrix [11].

The generated QR code is used as input to the watermark [12] embedding algorithm along with the source data. After watermark embedding, the data as a whole is transformed into the final watermarked data suitable for transmission.

B. Watermark Embedding and Extraction

Fig. 3. Watermark Embedding and Extraction Process.

An input image is taken, which can be of any format such as jpeg, bmp, tiff, etc. For the image under consideration, image properties are determined, such as image width and height in pixels, image format, image type (for example color or grey scale), bits per pixel, etc. Data identification information is derived and converted into a message, which also includes the recipient client's data. The message is encrypted using AES symmetric key encryption. The encrypted message is fed as input to the QR code generator. The generated QR code is embedded in the image as a watermark.

The proposed watermark embedding and extraction algorithm combines the qualities of three distinct techniques, namely DCT, DWT and SVD. Initially, a one-level DWT [13] is applied to the host image to be transferred. To attain imperceptibility, the LL band is selected for second-level decomposition and the LL_HH band is picked, which is further partitioned into 4 x 4 sub-blocks. DCT [14] is applied to every sub-block, and the DC value of every block is chosen and collected into a matrix. SVD is performed on the derived matrix and the calculated singular values are adjusted with the singular values of the

generated watermark. Inverse SVD, inverse DCT and inverse DWT are then applied to obtain the watermarked image.

The presented algorithm proves to be better as it is resistant against several attacks, gives the best results against compression and also results in better PSNR [15] values of an image. Integration of these three transforms increases the performance of watermarking significantly when compared to the DCT-SVD [16] and DWT watermarking techniques.

a) The embedding algorithm is briefly illustrated as given below [17].
1. Let OI represent an original cover image of size N x N. Select a color channel and perform DWT to segment it into four separate N/2 x N/2 sub-bands LL, HH, LH and HL.
2. Choose the LL band and apply DWT to divide it further into LL_LL, LL_HH, LL_LH and LL_HL bands.
3. Select the LL_HH band, split it into 4 x 4 square blocks and perform DCT on each block. Choose the DCT value of every block and derive a matrix B.
4. Apply SVD to matrix B, B = U1 * S1 * V1^T, to acquire U1, S1 and V1.
5. Let OW represent a watermark of size N/16 x N/16. Apply SVD to OW, OW = WU * WS * WV^T, to acquire WU, WS and WV.
6. Adjust S1 with the watermark value WS such that S = S1 + α * WS, where α is a scaling factor.
7. Find matrix B* with B* = U1 * S * V1^T.
8. Apply inverse DCT to the obtained B* to retrieve LL_HH*.
9. Perform inverse DWT on LL_LL, LL_HH*, LL_LH and LL_HL to obtain matrix LL*.
10. Apply inverse DWT to LL*, HH, LH and HL, and set the result to the selected color channel to return the watermarked image WI.

b) The extraction algorithm is explained as given below [17].
1. First, choose a color channel and apply DWT to the watermarked image WI in order to separate out the LL*, HH, LH and HL bands.
2. Perform DWT on the LL* band and segment it further into LL_LL, LL_HH*, LL_LH and LL_HL sub-bands.
3. Pick out the LL_HH* band and split it into 4 x 4 blocks.
4. Next, perform DCT on every block of LL_HH* and choose the DCT values to obtain matrix A.
5. Apply SVD to matrix A, A = WU * WS * WV^T, to acquire WU, WS and WV.
6. Calculate SW = (S - WS) / α.
7. Finally, retrieve the watermark using EW = WU * SW * WV^T.

C. Tamper detection and detecting data leaker

Fig. 4. Tamper detection and Data leaker detection.

In this final phase, the watermarked data, i.e. the leaked data, is examined to extract the watermark. The extracted QR code is then scanned to fetch the encrypted information. Interpreting data from the QR code is the reverse of the encoding process. The first step is to recognize the dark and light modules as an array of 0s and 1s. The next step is to extract the format data, which reveals the masking pattern, and to determine the version of the QR code. Then the coded area is XORed with the mask pattern. Next, the codewords are read and any detected errors are corrected. The data codewords are divided into segments and the text is decoded according to the mode [11].

The enciphered data from the QR code is then decrypted using the AES decryption algorithm. The client data extracted from the QR code is compared with the data stored in the cloud database. The client with matching information is considered guilty of leaking critical organizational data. The tamper detection module is used to detect any distortion or tampering of the data. To check whether the current image data has been tampered with, its image properties are determined and compared with the original data properties. If there is a mismatch between the two, the suspected copy is declared as tampered or manipulated.
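The two checks of Section IV-C can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the property names, the layout of the client registry and all function names are hypothetical, and the client ID is assumed to have already been recovered from the decrypted QR message.

```python
# Hypothetical sketch: property names, registry layout and function names
# are illustrative assumptions, not taken from the paper.
_MISSING = object()  # sentinel so an absent property never equals a stored one


def mismatched_properties(original, suspect):
    """Return the sorted names of properties that differ between two copies."""
    names = set(original) | set(suspect)
    return sorted(
        name for name in names
        if original.get(name, _MISSING) != suspect.get(name, _MISSING)
    )


def is_tampered(original, suspect):
    """A copy is declared tampered as soon as one property does not match."""
    return bool(mismatched_properties(original, suspect))


def identify_leaker(recovered_client_id, registered_clients):
    """Look up the client whose stored ID matches the ID from the watermark.

    Returns the stored client record, or None when no registered client matches.
    """
    return registered_clients.get(recovered_client_id)
```

Returning the list of mismatched properties, and not only a yes/no answer, makes it possible to report which characteristic of the suspected copy was altered.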

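Steps 1 to 3 of the embedding algorithm in Section IV-B can be illustrated without any imaging library. The sketch below assumes the Haar wavelet and an orthonormal DCT-II, neither of which is specified in the text; the sub-band naming convention and all function names are likewise illustrative. It relies on the fact that the DC coefficient of an orthonormal 4 x 4 DCT-II equals the sum of the block divided by 4, so the matrix of DC values can be formed without computing the full transform. The SVD steps are omitted.

```python
# Assumptions (not from the paper): Haar wavelet, orthonormal DCT-II,
# and this particular LH/HL naming convention.
def haar_dwt2(image):
    """One-level 2-D Haar DWT of a square matrix with even side length.

    Returns the half-size sub-bands (LL, LH, HL, HH).
    """
    half = len(image) // 2
    ll, lh, hl, hh = ([[0.0] * half for _ in range(half)] for _ in range(4))
    for r in range(half):
        for c in range(half):
            a, b = image[2 * r][2 * c], image[2 * r][2 * c + 1]
            d, e = image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + d + e) / 2.0  # scaled sum of the 2 x 2 cell
            lh[r][c] = (a - b + d - e) / 2.0  # difference between columns
            hl[r][c] = (a + b - d - e) / 2.0  # difference between rows
            hh[r][c] = (a - b - d + e) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; rebuilds the full-size matrix."""
    half = len(ll)
    image = [[0.0] * (2 * half) for _ in range(2 * half)]
    for r in range(half):
        for c in range(half):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            image[2 * r][2 * c] = (s + x + y + z) / 2.0
            image[2 * r][2 * c + 1] = (s - x + y - z) / 2.0
            image[2 * r + 1][2 * c] = (s + x - y - z) / 2.0
            image[2 * r + 1][2 * c + 1] = (s - x - y + z) / 2.0
    return image


def block_dc_matrix(band, block=4):
    """DC coefficient of the orthonormal DCT-II of every block x block tile."""
    tiles = len(band) // block
    return [
        [
            sum(band[block * i + r][block * j + c]
                for r in range(block) for c in range(block)) / block
            for j in range(tiles)
        ]
        for i in range(tiles)
    ]


def embedding_matrix(channel):
    """Steps 1 to 3: two DWT levels, then one DC value per 4 x 4 block of LL_HH."""
    ll, _, _, _ = haar_dwt2(channel)
    _, _, _, ll_hh = haar_dwt2(ll)
    return block_dc_matrix(ll_hh)
```

For a 512 x 512 channel, `embedding_matrix` returns a 32 x 32 matrix, which matches the N/16 x N/16 watermark size stated in step 5 of the embedding algorithm.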
V. EXPERIMENTAL RESULTS
In this model we take a dataset of scenic images. For transmission, image identification information along with a timestamp is concatenated with the recipient client ID to form a message. The message is then encrypted using the AES algorithm. The encrypted message is further converted into a QR code image. The generated QR code image is embedded into a cover image using the DWT-DCT-SVD watermark embedding algorithm to make it more imperceptible. Finally, the watermarked image is distributed to the respective client. The output of this module is illustrated in the figures below.

Fig. 5. Cover Image

Fig. 6. Generated QR code for client ID

Fig. 7. Watermarked Image

For tamper and data leaker detection, the suspected image is subjected to the DWT-DCT-SVD watermark extraction algorithm to retrieve the embedded watermark, which is a QR code image shown in the figure below.

Fig. 8. Extracted Watermark.

The extracted QR code is decoded to get the encrypted message. The message is then deciphered using the AES decryption algorithm, and the client with the recovered client ID is considered guilty or corrupt. Later, the suspected leaked copy is tested for tampering by comparing its calculated properties, such as resolution, orientation, date and time, software, image type, etc., with those of the original copy. If a mismatch occurs, the suspected copy is considered tampered.

A comparison between different watermarking algorithms is carried out. Algorithms are assessed based on the PSNR of an image after being watermarked. The evaluation parameters considered are Peak Signal to Noise Ratio (PSNR) and Mean Squared Error (MSE) [17].

Mean squared error (MSE) is the mean of the squares of the errors, that is, the average squared difference between the expected values and the actual values.

$$\mathrm{MSE} = \frac{1}{m \cdot n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 \tag{1}$$

Where I is the actual image and K is the watermarked image.

Peak signal to noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation.

$$\mathrm{PSNR} = 20 \cdot \log_{10} \left( \frac{max}{\sqrt{\mathrm{MSE}}} \right) \tag{2}$$

Where max is equal to 255 for a grayscale image and MSE is the mean squared error. For evaluation, the watermarking algorithms were verified on a set of images of size 512 x 512 pixels, with a watermark image of size 64 x 64 pixels. The watermarking techniques were tested on distinct images, and one particular image is used as the common watermark.
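Equations (1) and (2) translate directly into code. The following is a minimal sketch in plain Python, assuming both images are given as equally sized lists of rows of grayscale values; the function names are illustrative, and returning infinity for identical images is a convention chosen here, not something stated in the text.

```python
import math


def mse(original, watermarked):
    """Mean squared error between two equally sized grayscale images, eq. (1)."""
    m, n = len(original), len(original[0])
    total = sum(
        (original[i][j] - watermarked[i][j]) ** 2
        for i in range(m) for j in range(n)
    )
    return total / (m * n)


def psnr(original, watermarked, max_value=255):
    """Peak signal to noise ratio in dB, eq. (2); max_value is 255 for 8-bit grayscale."""
    error = mse(original, watermarked)
    if error == 0:
        return math.inf  # assumed convention: identical images have unbounded PSNR
    return 20 * math.log10(max_value / math.sqrt(error))
```

A higher PSNR means the watermarked image is closer to the original, which is why it serves as the imperceptibility measure in the comparison.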

Results are depicted in the graph as follows.

[Chart: PSNR comparison of LSB, DCT, DWT, DCT-SVD, DWT-SVD and DWT-DCT-SVD. Plotted values: 40.21, 40.68, 38.6, 38.42, 35.03, 34.66 (assignment of values to algorithms not recoverable).]

Fig. 9. PSNR Comparison.

The proposed watermarking algorithm was tested on a set of images embedded with different string lengths, for both embedding and extraction. A certain amount of data loss was observed in the obtained QR code for a set of specific images as the string length increases. The proposed watermarking algorithm is evaluated on a set of distinct types of images, such as underexposed, overexposed, high contrast and high definition images, to identify the pixel pairs that extract an unblemished QR code image. The results depicted in the figure below show that pixel pairs (4, 6) and (6, 4) give better results.

Fig. 10. QR for pixel pairs (4, 6) and (6, 5).

VI. CONCLUSION

A data leakage detection model is very beneficial to industries, organizations, research communities and institutions that share their data with third parties online. The proposed design allows detection of data leaks and tampering on cloud data using a watermarking algorithm. The presented approach takes image data to be transferred and, using a hybrid watermarking algorithm, implants a Quick Response code, i.e. the watermark generated from the data and the recipient client's details. The data leaker is identified by extracting the watermark from an image and comparing it with the clients' information. The data is further tested for tampering by comparing its properties with those of the original data.

The data leakage detection system is advantageous as it can provide security to the data throughout its transmission. It can also detect whether that data gets leaked. The existing system delivers security through encryption using several algorithms, whereas the proposed model offers both security and a detection technique. The hybrid watermarking algorithm used adds imperceptibility and robustness to the model.

REFERENCES
[1] Panagiotis Papadimitriou, Hector Garcia-Molina, “Data Leakage Detection”, IEEE Transactions on Knowledge and Data Engineering, Volume 23, Issue 1, 2011.
[2] Abhijeet Singh, Abhineet Anand, “Data Leakage Detection Using Cloud Computing”, International Journal of Engineering and Computer Science, Volume 6, Issue 4, April 2017.
[3] Abdullah Bamatraf, Rosziati Ibrahim and Mohd. Najib Mohd Salleh, “A New Digital Watermarking Algorithm Using Combination of Least Significant Bit (LSB) and Inverse Bit”, International Journal of Computing, Volume 3, Issue 4, April 2011.
[4] Xin Zhou, Xiaofei Tang, “Research and Implementation of RSA Algorithm for Encryption and Decryption”, International Forum on Strategic Technology, 2011 IEEE.
[5] S. Geetha, M. Nishanthini, G. Shanthi, K. Sivabharathi, M. Suganya, “Data Leakage Detection and Security Using Cloud Computing”, International Journal of Engineering Research and Applications, Volume 6, Issue 3, March 2016.
[6] Neeraj Kumar, Vijay Katta, Himanshu Mishra, Hitendra Garg, “Detection of Data Leakage in Cloud Computing Environment”, International Conference on Computational Intelligence and Communication Networks, 2014 IEEE.
[7] Rupesh Mishra, D.K. Chitre, “Data Leakage and Detection of Guilty Agent”, International Journal of Scientific & Engineering Research, Volume 3, Issue 6, 2012.
[8] AL. Jeeva, Dr. V. Palanisamy, K. Kanagaram, “Comparative Analysis of Performance and Security Measures of Some Encryption Algorithm”, International Journal of Engineering Research and Applications, Volume 2, Issue 3.
[9] Weijun Zhang, Xuetian Meng, “An Improved Digital Watermarking Technology Based on QR code”, International Conference on Computer Science and Network Technology, 2015 IEEE.
[10] Ahmed Fathy, Ibrahim F. Tarrad, Hesham F.A. Hamed, Ali Ismail Awad, “Advanced Encryption Standard Algorithm: Issues and Implementation Aspects”, Springer, 2012.
[11] Sumit Tiwari, “An Introduction to QR Code Technology”, International Conference on Information Technology, 2016 IEEE.
[12] Yanqun Zhang, “Digital Watermarking Technology: A Review”, International Conference on Future Computer and Communication, 2009 IEEE.
[13] Gu Tianming, Wang Yanjie, “DWT-based Digital Image Watermarking Algorithm”, The Tenth International Conference on Electronic Measurement & Instruments, 2011 IEEE.
[14] Syed Ali Khayam, “The Discrete Cosine Transform (DCT): Theory and Application”, March 2003.
[15] Sha Wang, Dong Zheng, Jiying Zhao, “An Image Quality Evaluation Method Based on Digital Watermarking”, IEEE Transactions on Circuits and Systems for Video Technology, Volume 17, 2007.
[16] Mengmeng Li, Chao Han, “A DCT-SVD Domain Watermarking for Color Digital Image Based on Compressed Sensing Theory and Chaos Theory”, International Symposium on Computational Intelligence and Design, 2014 IEEE.
[17] Riya Naik, Manisha Naik Gaonkar, “Hybrid Digital Watermarking Technique based on DWT-DCT-SVD Algorithms”, International Journal of Engineering Research in Computer Science and Engineering, Volume 5, Issue 4, 2018.
