
International Conference on Smart Systems and Inventive Technology (ICSSIT 2018)

IEEE Xplore Part Number: CFP18P17-ART; ISBN:978-1-5386-5873-4

Block Level based Data Deduplication and Assured Deletion in Cloud

Sneha C. Sathe
M.E. Student, Department of Information Technology
Ramrao Adik Institute of Technology
Nerul, Navi Mumbai
snehasathe47@gmail.com

Nilima M. Dongre
Assistant Professor, Department of Information Technology
Ramrao Adik Institute of Technology
Nerul, Navi Mumbai
nilimarj@gmail.com

Abstract—With the growth of the Internet, an ever increasing number of individuals have begun to outsource their data to the cloud. However, handing data over to a third party gives attackers an opportunity to retrieve it. For the sake of convenience, users also tend to store redundant data on the cloud, which wastes storage space and increases its cost. Moreover, since nobody knows where data is physically kept on the cloud, it is equally important to be able to delete it with certainty. Hence, along with security, data deduplication and assured deletion are major concerns in cloud storage. In this paper, we put forth an application based on a block level file deduplication scheme over fragmented blocks that addresses the issue of duplication on the cloud storage platform. We also provide assured deletion through user defined policies: on expiry of a policy the associated keys are deleted, making the file inaccessible to anyone, including its owner.

Keywords—Block level deduplication, Fragmentation, Policy based assured deletion
I. INTRODUCTION

Cloud computing is defined as the delivery of computing services over the Internet through a cloud service platform. Organizations and business firms are moving to cloud storage in order to keep company documents, presentations, business-critical data and backup records. Scalability, availability, elasticity and pay-per-use pricing are some of the advantages of cloud computing, and Google Cloud, Amazon Web Services, Microsoft Azure and IBM Bluemix are a few of the providers of such services [9]. Because of its distributed and infrastructure-less nature, clients do not know where their data is physically stored, so cloud storage suffers from security, integrity and confidentiality issues. It is therefore necessary to secure data before uploading it to the cloud. Furthermore, as clients save redundant data on the cloud, the rent for cloud storage increases; to limit this cost, it is important to check a file for duplication before it is transferred.

A. Data deduplication in cloud

Data deduplication is a strategy in which a file is first checked for an existing copy on the cloud, so that redundant copies of data are never uploaded. With deduplication, only a single copy of a file is stored and the remaining copies are eliminated from transfer into the cloud storage system.

In single block (file level) deduplication, the complete file is treated as one block: a single fingerprint is computed for the whole file and compared with the fingerprints already stored. This saves storage space, because fewer files are stored, and reduces computation time. Fixed size block level deduplication instead divides a file into equal sized chunks and compares the fingerprint of each chunk with those of the chunks already stored, which reveals how many chunks of a file are duplicates. Because of its fixed chunk size, however, it cannot detect duplication on a wider scale owing to the content shifting problem. Content level deduplication is therefore used, in which the actual content of the document is examined.

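As a rough illustration of the two granularities described above, the sketch below computes one SHA-512 fingerprint for a whole file and a list of fingerprints for its fixed size blocks. The 4 KB block size, the in-memory sets standing in for the cloud's fingerprint index, and the helper names are assumptions made for this example only.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size (4 KB)

def file_fingerprint(data: bytes) -> str:
    # File level deduplication: one fingerprint for the complete file.
    return hashlib.sha512(data).hexdigest()

def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    # Fixed size block level deduplication: one fingerprint per chunk.
    return [hashlib.sha512(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

# In-memory stand-ins for the fingerprint index kept on the cloud side.
stored_file_fps = set()
stored_block_fps = set()

def is_duplicate_file(data: bytes) -> bool:
    return file_fingerprint(data) in stored_file_fps

def duplicate_block_count(data: bytes) -> int:
    # Number of blocks of this file that already exist on the cloud.
    return sum(1 for fp in block_fingerprints(data) if fp in stored_block_fps)
```

Block level fingerprints can detect that two files share most of their blocks even when their whole-file fingerprints differ, which is exactly what the file level check cannot do.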




B. Data Deletion in Cloud

Data is replicated across several locations on the cloud, so whenever the owner wants to delete his data, he should be assured that it is deleted from all of those locations. The cloud service providers and their clients have service level agreements (SLAs) for this purpose. Nonetheless, since a client cannot verify whether his data has actually been erased, assured deletion is a basic concern in cloud storage systems [10]. Policy based file assured deletion guarantees deletion once the access policies associated with a file are revoked.

Our contributions are as follows:
 Develop a cloud based application that stores files in fragmented and encrypted form on storage nodes. The main focus is to store only one fragment on a particular storage node, so that even if a node is compromised, no meaningful information is leaked. Attribute based file access is provided so that only users whose attributes match can access the file.
 Provide a block level deduplication scheme that works on fixed size blocks and checks for duplicate blocks on the cloud using the SHA-512 algorithm.
 Provide a policy based assured deletion scheme to assuredly delete data from the cloud based application.

II. LITERATURE REVIEW

A. Fragmentation and Replication

Fragmentation deals with splitting a file into chunks and storing them on the cloud; replication, correspondingly, keeps more than one copy of the file in cloud storage for easy recovery at a later stage. Authors in [1] have proposed DROPS, which deals with these security issues in the cloud. The data is divided and placed on storage nodes, each node is selected using a centrality measure, and each node stores only a single fragment. Consequently, even if a node is compromised, no important information is revealed.

B. Deduplication Techniques

Deduplication on the cloud can be performed at file level, block level and content level. Authors in [2] have proposed a framework that focuses on file and block level deduplication techniques. It provides a way to discover copies by comparing files using the MD5 algorithm. It supports a threshold based deduplication check in which a percentage value is set for the duplication check: if the duplication percentage is greater than 50%, the file is prevented from being transferred to the cloud.

Haonan Su, Dong Zheng and Yinghui Zhang in [3] have proposed a scheme that realizes variable-size block-level deduplication using Rabin fingerprinting. The client side partitions the file F into a finite set of blocks Bi (where i = 1, 2, ...), computes each block fingerprint f(Bi), and forwards the fingerprints f(Bi) to the cloud server for duplicate checks. The fingerprints are also sent to a third party, which performs the randomization of the convergent keys. If a duplicate block is found, the client is informed; otherwise the document is encrypted and uploaded.

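Variable-size (content-defined) chunking is what lets a Rabin-fingerprint scheme such as [3] avoid the content shifting problem: chunk boundaries are chosen from the bytes themselves, so an insertion only disturbs the chunks around the edit. The sketch below uses a simple polynomial rolling hash as a stand-in for a true Rabin fingerprint; the window size, boundary mask and size limits are illustrative assumptions rather than parameters taken from [3].

```python
def content_defined_chunks(data: bytes, window: int = 48, mask: int = 0x1FFF,
                           min_size: int = 1024, max_size: int = 16384):
    """Split data into variable size chunks whose boundaries depend on content."""
    BASE, MOD = 257, (1 << 31) - 1
    pow_out = pow(BASE, window, MOD)      # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i - start + 1 > window:        # keep the hash over the last `window` bytes
            h = (h - data[i - window] * pow_out) % MOD
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])   # boundary found: emit a chunk
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])            # trailing partial chunk
    return chunks

# Each chunk would then be fingerprinted (e.g. with SHA-512) and sent to the
# server for a duplicate check, exactly as with fixed size blocks.
```
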
Ankush R. Deshmukh, Prof. R. V. Mante and Dr. P. N. Chatur [4] proposed a system that deals with file level deduplication and data destruction in the cloud using time variant encryption. Attribute based encryption (ABE) permits access and decryption based on a person's role or privileges within an organization, rather than on the person's particular identity.

Authors in [8] have proposed a plagiarism detection system that uses the Rabin-Karp algorithm to search for plagiarized content in assignments submitted by students in a college or university. It speeds up string comparison by matching the given pattern against the strings and substrings of the input documents using hash values: a hash is computed for every string or substring in the text, and whenever the hash of the given pattern matches the hash of a substring, the two strings are compared to confirm that they are the same.

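For reference, a minimal Rabin-Karp search looks roughly like the sketch below. The base and modulus are arbitrary illustrative choices; note that a hash match only signals a candidate, which is why the final byte-for-byte comparison is kept.

```python
def rabin_karp_search(text: bytes, pattern: bytes) -> list:
    """Return the start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    BASE, MOD = 257, (1 << 31) - 1
    pow_m = pow(BASE, m - 1, MOD)
    p_hash = t_hash = 0
    for i in range(m):                       # hash the pattern and the first window
        p_hash = (p_hash * BASE + pattern[i]) % MOD
        t_hash = (t_hash * BASE + text[i]) % MOD
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)                # verify on hash match before reporting
        if i < n - m:                        # roll the window one byte to the right
            t_hash = ((t_hash - text[i] * pow_m) * BASE + text[i + m]) % MOD
    return matches
```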

C. File Assured Deletion

Data deletion is an important issue in the cloud: even after the user deletes his data, he has no assurance that it has been completely and securely erased. M. Vanitha and Dr. C. Kavitha [5] have proposed a strategy that identifies an individual client's data and its encryption keys in order to delete that data from the cloud provider's multi-tenant storage architecture, using a policy file maintained by a key manager. Consequently, when the policy is revoked, the data files become unrecoverable.

Authors in [6] have presented FadeVersion, a secure cloud backup system. The system does not store redundant data across different versions of a backup, and it provides data security, version control of files and assured deletion of files. Separate data objects are created for a file and encrypted with a set of data keys, which are in turn encrypted with a set of control keys; a master key is used because it makes all the control keys easier to maintain. If a policy for a file is revoked, the corresponding control key is erased. If a data object is linked only with the revoked policy, it is thereby assuredly deleted; however, if the data object is associated with both the revoked policy and another active policy, it can still be accessed through the active policy.

Ashfia Binte Habib, Tasnim Khanam and Rajesh Palit [7] have proposed an alternative and simpler approach to assured deletion of files that relies on cryptographic algorithms and other tools without the help of a key manager. First, the user who wants to upload a file provides a secret phrase. A random key (key 2) is generated, with which the file is encrypted. Second, another key (key 1) is generated from the phrase and used to encrypt key 2. Finally, the encrypted file and the encrypted key 2 are transferred to the cloud. Consequently, erasing the key leads to assured deletion of the file.

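A minimal sketch of this two-key idea is shown below, using PBKDF2 to derive key 1 from the passphrase and AES-GCM for both encryptions. The library, the key sizes and the exact layout of what gets uploaded are assumptions for the example, not details taken from [7].

```python
import os, hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def protect_file(plaintext: bytes, passphrase: str) -> dict:
    """Encrypt a file with a random key 2, then wrap key 2 with passphrase-derived key 1."""
    key2 = AESGCM.generate_key(bit_length=256)            # random file key (key 2)
    salt = os.urandom(16)
    key1 = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)  # key 1
    file_nonce, key_nonce = os.urandom(12), os.urandom(12)
    encrypted_file = AESGCM(key2).encrypt(file_nonce, plaintext, None)
    wrapped_key2 = AESGCM(key1).encrypt(key_nonce, key2, None)
    # Only the ciphertext and the wrapped key are uploaded; discarding the wrapped
    # key (or the passphrase and salt) makes the file unrecoverable.
    return {"file": encrypted_file, "file_nonce": file_nonce,
            "wrapped_key2": wrapped_key2, "key_nonce": key_nonce, "salt": salt}

def recover_file(record: dict, passphrase: str) -> bytes:
    key1 = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), record["salt"], 200_000)
    key2 = AESGCM(key1).decrypt(record["key_nonce"], record["wrapped_key2"], None)
    return AESGCM(key2).decrypt(record["file_nonce"], record["file"], None)
```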

III. PROPOSED SYSTEM

We propose a cloud based application through which clients can store their data on the cloud. The application performs a duplication check before a file is transferred to the cloud. Block level deduplication is one of the most effective and useful techniques for this check: unlike the file based technique, fixed size block level deduplication detects redundant blocks of data and therefore finds redundant data in the cloud with better accuracy. AES encryption is applied to the fragments before they are uploaded to the cloud, and attribute based encryption is used so that only authorized users holding a particular set of attributes can access the file. A convergent key is used to share ownership access to a file. For deletion, policy based file assured deletion is used, driven by user defined policies.

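In its simplest form, a convergent key is derived deterministically from the file content itself, so two users who hold the same file derive the same key. The sketch below shows only that basic idea (SHA-256 of the plaintext as the key, AES-GCM with a content-derived nonce); how the application randomizes or distributes this key between owner and uploader is not reproduced here.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_key(plaintext: bytes) -> bytes:
    # Deterministic key: identical content always yields an identical key.
    return hashlib.sha256(plaintext).digest()

def convergent_encrypt(plaintext: bytes):
    key = convergent_key(plaintext)
    # Deterministic nonce: identical files produce identical ciphertexts,
    # which is the property a deduplication check can rely on.
    nonce = hashlib.sha256(b"convergent-nonce" + plaintext).digest()[:12]
    return nonce, AESGCM(key).encrypt(nonce, plaintext, None)
```

Because the key depends only on the content, any user who already possesses the file can derive the key and decrypt the stored copy, which is what makes ownership sharing possible without exchanging secrets.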
Fig. 1. System Flow

A. System Description

 Registration: The user first has to register in order to upload and download files. During registration the user provides a username and password for his account, which are required during login. The user only needs to enter his original password during registration; during login, a session grid password mechanism is implemented that prevents shoulder surfing attacks.

 Fragmentation of file: When the user tries to upload a file to the cloud, the application fragments the file into fixed size blocks. The fragments are distributed across the storage nodes such that no single node contains more than one block of the file. As a result, even a successful attack on a particular node will not leak any significant information.

 Duplication Check: Before the file is uploaded to the cloud, it is checked for block level duplication. The hash value of each block is computed using SHA-512 and stored in a temporary database, and each hash value is compared with the hash values of the file fragments already in the cloud. This yields the total number of duplicate blocks. A threshold value is predefined: if the count of duplicate blocks exceeds the threshold, the entire file is considered a duplicate and is not allowed onto the cloud; otherwise the file is uploaded and the uploader becomes the owner of the file. (A combined sketch of the fragmentation and duplication check steps appears after this list.)

 File Upload: After the duplication check, the fragments are encrypted using a symmetric encryption algorithm (AES) and uploaded to the cloud in encrypted form. The file is accessed by the owner and by other users according to their specific attributes, so a key is generated based on the concept of attribute based encryption; a user who wishes to download the file requires this key to decrypt it, and only then can the download proceed. Controlled replication is used so that only a single replicated copy of the file is stored on the cloud, which improves both the security and the recovery aspects of the file. If a file is found to be a duplicate, the user cannot upload it directly: he must acquire permission from the actual owner of the file, and only then is he permitted to upload it. The user sends a request to the owner by sharing ownership access to the file using the convergent key concept. The owner downloads the file and, only if he wishes to allow the upload, a random dekey is generated for each such user; the dekey lets the owner identify which user's duplicate file is being allowed onto the cloud. Upon receiving permission, the user is allowed to upload the complete file.

 Download file with decryption: If any user other than the file owner wants to download the file from the cloud, he needs the key for downloading it, and only users whose attributes match can download the file. Since the file is stored on the cloud in encrypted form, it must first be decrypted to recover its original format.

 File Assured Deletion: If a policy is revoked, the application completely destroys the keys related to the particular file after the specified time has expired. If the owner explicitly issues a request for file deletion, the owner is first verified based on his attributes and, if authorized, the keys are deleted. Because the database no longer contains the keys, the file F, which is tied to the policy, is assuredly deleted. The policy revocation operations do not involve interactions with the cloud. Assured deletion is also performed when the user deletes the file. (A sketch of the fragment encryption and policy key handling appears after this list.)

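The sketch below strings together the fragmentation and duplication check steps described above: the file is cut into fixed size blocks, each block is hashed with SHA-512 and compared against the fingerprints already stored, and the whole file is rejected as a duplicate once the duplicate count crosses a threshold. The block size, the threshold, the in-memory fingerprint index and the one-fragment-per-node assignment are all assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096          # assumed fragment size
DUPLICATE_THRESHOLD = 0.5  # assumed: reject if more than 50% of blocks already exist

def fragment(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_to_nodes(blocks: list, nodes: list) -> dict:
    """Place at most one block of the file on any single storage node."""
    if len(blocks) > len(nodes):
        raise ValueError("not enough storage nodes to keep one fragment per node")
    return {node: block for node, block in zip(nodes, blocks)}

def duplication_check(blocks: list, stored_fingerprints: set) -> bool:
    """Return True when the file should be treated as a duplicate."""
    fingerprints = [hashlib.sha512(b).hexdigest() for b in blocks]
    if not fingerprints:
        return False
    duplicates = sum(1 for fp in fingerprints if fp in stored_fingerprints)
    if duplicates / len(fingerprints) > DUPLICATE_THRESHOLD:
        return True                           # rejected: owner permission is needed
    stored_fingerprints.update(fingerprints)  # accepted: remember the new blocks
    return False
```

In the application, duplication_check would run before assign_to_nodes, and a rejected file would trigger the owner-permission flow described in the File Upload step.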

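Finally, a rough sketch of how the encrypted fragments and the policy keys could fit together: each fragment is encrypted with AES-GCM under a per-file key, the key is held only in a policy table, and revoking the policy (or letting it expire) simply deletes the key, which is what makes the ciphertext fragments left on the cloud unrecoverable. The key store, policy naming and expiry handling are illustrative assumptions, not the paper's exact design.

```python
import os, time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class PolicyKeyStore:
    """Keys live only here; deleting an entry assuredly deletes the files under it."""

    def __init__(self):
        self._keys = {}          # policy name -> (AES key, expiry timestamp)

    def create_policy(self, policy: str, lifetime_seconds: int) -> bytes:
        key = AESGCM.generate_key(bit_length=256)
        self._keys[policy] = (key, time.time() + lifetime_seconds)
        return key

    def key_for(self, policy: str) -> bytes:
        key, expiry = self._keys[policy]
        if time.time() > expiry:
            self.revoke(policy)               # expired policies are purged
            raise KeyError("policy expired; file is no longer recoverable")
        return key

    def revoke(self, policy: str) -> None:
        self._keys.pop(policy, None)          # assured deletion: the key is destroyed

def encrypt_fragments(fragments: list, key: bytes) -> list:
    aes = AESGCM(key)
    out = []
    for frag in fragments:
        nonce = os.urandom(12)
        out.append((nonce, aes.encrypt(nonce, frag, None)))
    return out
```

Uploading a file would call create_policy, encrypt_fragments and then scatter the ciphertexts over the storage nodes; after revoke (or expiry) the ciphertexts may remain on the cloud but can no longer be decrypted.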



IV. ADVANTAGES OF PROPOSED SYSTEM

 Distributed access control of the data stored in the cloud, so that only authorized users with valid attributes can access it.
 The identity of the user is protected from the cloud during authentication.
 Protection is achieved from the cloud client's perspective, with no changes on the cloud provider side. Data is protected from attackers because the fragmented blocks are stored on different storage nodes.
 Provides a deduplication and deletion mechanism along with fragmentation and single replication for the cloud storage platform.
 Works on today's cloud as an overlay.

V. CONCLUSION

A huge amount of data is being transferred to the cloud, which gives clients easy access to their data from any location at any point of time. However, a vast amount of duplicate data is also being uploaded, which reduces the available storage space and increases the rent of cloud storage. Secure deletion is another important aspect, as clients do not know whether their data has been erased from the cloud forever; alongside this, the security of the data stored on the cloud remains a challenge. Our proposed system therefore aims at creating an application that performs a fixed size block level deduplication check on the fragmented blocks and then encrypts the blocks before uploading them to the cloud. It also performs policy based file assured deletion in order to securely delete files from the cloud. As a result, it tries to achieve the accuracy, security and other design goals set for the system.

References

[1] Mazhar Ali, Kashif Bilal, Samee U. Khan, Bharadwaj Veeravalli, Keqin Li and Albert Y. Zomaya, "DROPS: Division and Replication of Data in Cloud for Optimal Performance and Security", IEEE Transactions on Cloud Computing, 2015.
[2] Namrata P. Kawtikwar, Prof. M. R. Joshi, Nur'aini Abdul Rashid, "Data Deduplication in Cloud Environment using File-Level and Block-Level Techniques", Imperial Journal of Interdisciplinary Research (IJIR), Vol. 3, Issue 5, 2017, ISSN: 2454-1362.
[3] A. Mounika and G. Murali, "An Enhanced Approach for Securing Authorized Deduplication in Hybrid Clouds", 2016 International Conference on Communication and Electronics Systems (ICCES).
[4] Ankush R. Deshmukh, Prof. R. V. Mante and Dr. P. N. Chatur, "Cloud Based Deduplication and Self Data Destruction", 2017 International Conference on Recent Trends in Electrical, Electronics and Computing Technologies.
[5] M. Vanitha and Dr. C. Kavitha, "Secured Data Destruction in Cloud Based Multi-Tenant Database Architecture", 2014 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India.
[6] Arthur Rahumed, Henry C. H. Chen, Yang Tang, Patrick P. C. Lee, and John C. S. Lui, "A Secure Cloud Backup System with Assured Deletion and Version Control", 2011 International Conference on Parallel Processing Workshops.
[7] Ashfia Binte Habib, Tasnim Khanam, Rajesh Palit, "Simplified File Assured Deletion (SFADE) - A User Friendly Overlay Approach for Data Security in Cloud Storage System", 2013 IEEE, 978--4673-6217-7/13.
[8] https://www.investopedia.com/terms/c/cloud-computing.
[9] https://www.ibm.com/cloud/learn/what-is-cloud-computing.

