Sathe 2018
Abstract—With the growth of the web, an increasing number of individuals have begun to outsource their data to the cloud. However, handing data to a third party creates opportunities for attackers to retrieve it. For convenience, individuals also tend to store redundant data on the cloud, which wastes storage space and increases its cost. Since nobody knows where data is physically stored on the cloud, it is equally important to be able to delete it with certainty. Hence, along with security, data deduplication and assured deletion are major concerns in cloud storage. In this paper, we put forth an application based on a block-level file deduplication scheme over fragmented blocks that addresses the issue of duplication on the cloud storage platform. We also provide guaranteed erasure through user-defined policies: on expiry of a policy, the keys are deleted, making the file inaccessible to anyone, including the owner.

Keywords—Block level deduplication, Fragmentation, Policy assured deletion

I. INTRODUCTION

Cloud computing is defined as the delivery of services over the web through a cloud service platform. Organizations and business firms are currently moving to cloud storage in order to keep company reports, presentations, business-critical data and backup records. Scalability, availability, elasticity and pay-per-use are among the advantages of cloud computing. Google Cloud, Amazon Web Services, Microsoft Azure and IBM Bluemix are a few providers of cloud computing services [9]. Because of its distributed and infrastructure-less nature, clients do not know where their data is stored on the cloud. Hence, cloud storage suffers from security, integrity and confidentiality issues, and it is necessary to secure data before uploading it. Moreover, as clients save redundant data on the cloud, the cost of storage rises; to limit this cost, it is important to check a document for duplication before uploading it.

A. Data Deduplication in Cloud

Data deduplication is a strategy by which a document is first checked for a duplicate copy on the cloud. It prevents redundant copies of data from being uploaded: with deduplication, only a single copy of the file is stored, and the multiple copies are never transferred into the cloud storage system.

In single-block (file-level) deduplication, the complete file is treated as one block. A single fingerprint is computed for the whole file and compared with the fingerprints already stored. This saves storage space, since fewer files are stored, and reduces computation time. Fixed-size block-level deduplication divides a file into equal-sized chunks and compares the fingerprint of each chunk with those of the already-stored chunks; this identifies how many chunks of a file are duplicates. Because of its fixed-size nature, however, it cannot detect duplication on a wider scale, due to the content-shifting problem. Consequently, content-level deduplication, which inspects the actual contents of the document, is used.

B. Data Deletion in Cloud

Data is replicated across several locations; whenever the owner wants to delete data, he should be assured of its deletion across all of those locations. The cloud service providers and the clients have service level agreements (SLAs) for this. Nonetheless, since a client cannot verify whether his data has actually been erased, assured deletion is a basic concern in cloud storage systems [10]. Policy-based file assured deletion guarantees deletion once the access policies associated with a file are revoked.

Our contributions are as follows:
- Develop a cloud-based application that stores files in fragmented and encrypted form on storage nodes. Only one fragment is stored on any particular node, so that even if a node is compromised, no information is leaked.
- Provide attribute-based file access, so that only users whose attributes match can access a file.
- Provide a block-level deduplication scheme that works on fixed-size blocks, using the SHA-512 algorithm to detect duplicate blocks on the cloud.
- Provide a policy-based assured deletion scheme to reliably delete data from the cloud-based application.

II. LITERATURE REVIEW

A. Fragmentation and Replication

Fragmentation deals with splitting a file into chunks and storing them on the cloud; correspondingly, replication keeps more than one copy of the document in cloud storage for easy recovery at a later stage. Authors in [1] have proposed DROPS, which deals with security issues in the cloud. The data is divided and placed on storage nodes, each node selected using a centrality measure so that it stores only a single fragment. Consequently, even if a node is compromised, no significant information is revealed.

B. Deduplication Techniques

Deduplication on the cloud can be performed at file level, block level and content level. Authors in [2] have proposed a framework that focuses on file- and block-level deduplication techniques. It discovers copies by comparing records using the MD5 algorithm, and it supports a threshold-based duplication check in which a percentage value is set: if the duplication percentage exceeds 50%, the file is refrained from being transferred to the cloud.

Haonan Su, Dong Zheng and Yinghui Zhang in [3] have proposed a scheme that realizes variable-size block-level deduplication using Rabin fingerprinting. The client side partitions the file F into a finite set of blocks Bi (where i = 1, 2, ...). The user computes each block fingerprint f(Bi) and forwards the fingerprints f(Bi) to the cloud server for duplicate checks. The acquired f(Bi) is sent to a third party, which randomizes the convergent keys. If a duplicate block is discovered, the client is informed; otherwise the document is encrypted and then uploaded.

Ankush R. Deshmukh, Prof. R. V. Mante and Dr. P. N. Chatur [4] proposed a system that deals with document-level deduplication and destruction in the cloud using time-variant encryption. Attribute-based encryption (ABE) permits access and decryption based on a person's role or privileges within an organization, rather than on the person's particular identity.

Authors in [8] have proposed a plagiarism-detection system that uses the Rabin-Karp algorithm to search for plagiarized content in assignments submitted by students in a college or university. It speeds up string comparison by matching a given pattern against the strings/substrings of the input documents using hash values: a hash is computed for every string/substring in the text, and if the hash of the given pattern matches that of a substring, the two strings are considered similar.

C. File Assured Deletion

Data deletion is an important issue in the cloud: even after the user deletes data, he is not assured of its complete and secure removal. M. Vanitha and Dr. C. Kavitha [5] have proposed a strategy that identifies an individual client's data and its encryption keys in order to delete data from the cloud provider's multi-tenant storage architecture, using a policy file maintained by the key manager. Consequently, when the policy is revoked, the data files become unrecoverable.

Authors in [6] have presented FadeVersion, a secure cloud backup system. The system does not store redundant data across different versions of backups; it provides data security, file version control and assured deletion of documents. Different data objects are created for a document and encrypted with a set of data keys, which are in turn encrypted with a set of control keys. A master key is used, as this is easier than maintaining all the control keys individually. When a policy for a document is revoked, the respective control key is erased. If a data object is linked only to the revoked policy, its deletion is assured; however, if it is associated with both the revoked policy and another active policy, it can still be accessed through the active policy.

Ashfia Binte Habib, Tasnim Khanam and Rajesh Palit [7] have proposed an alternative and simpler methodology for assured deletion of documents, using cryptographic algorithms and other tools without the help of a key manager. First, the user who wants to upload a file provides a secret phrase. A random key (key 2) is generated, with which the document is encrypted. Second, another key (key 1) is generated from the phrase and used to encrypt key 2. Finally, the encrypted file and the encrypted key 2 are transferred to the cloud. Erasure of the key thus leads to assured erasure of the document.
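The fixed-size block-level duplicate check surveyed above, and adopted in our scheme with SHA-512 fingerprints, can be sketched in a few lines. This is a minimal illustration: the 4 KB block size and the in-memory fingerprint index standing in for the cloud-side store are assumptions for the example, not details fixed by the paper.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; the paper does not fix one


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split the file contents into fixed-size blocks and fingerprint
    each block with SHA-512."""
    return [hashlib.sha512(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def duplicate_blocks(fingerprints: list, stored_index: set) -> list:
    """Return the indices of blocks whose fingerprints are already known.

    `stored_index` stands in for the cloud-side fingerprint store; only
    the blocks *not* listed in it would actually be uploaded."""
    return [i for i, fp in enumerate(fingerprints) if fp in stored_index]
```

Only the fingerprints travel to the server for the check; blocks reported as duplicates are skipped during upload, which is what saves both bandwidth and storage.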
III. PROPOSED SYSTEM

We propose a cloud-based application through which clients can store data on the cloud. The application performs a duplication check before a record is transferred to the cloud. Block-level deduplication is one of the most effective and practical techniques for duplication checking, as it offers better accuracy in finding redundant data: unlike the file-based technique, fixed-size block-level deduplication can check for individual redundant blocks of data. AES encryption is applied to the fragments before they are uploaded. Attribute-based encryption is used so that only authorized users who satisfy a particular set of policies can access a file, and a convergent key is used to share ownership of a file. With respect to deletion, policy-based file assured deletion is used, which operates on user-defined policies.
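The convergent key mentioned above can be illustrated with a short sketch: the key is derived from the file contents themselves, so two owners of the same file derive the same key and the same ciphertext, which keeps deduplication possible even over encrypted data. The SHA-256-based keystream below is a dependency-free stand-in for the AES encryption the system actually uses, shown only to make the deterministic-ciphertext property concrete.

```python
import hashlib


def convergent_key(data: bytes) -> bytes:
    # The convergent key is derived from the content itself, so two users
    # holding the same file independently derive the same key.
    return hashlib.sha256(data).digest()


def keystream_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy counter-mode keystream built from SHA-256: a stand-in for AES,
    # kept dependency-free for illustration. XOR makes it self-inverse,
    # so the same function also decrypts.
    out = bytearray()
    for i in range(0, len(data), 32):
        block_key = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block_key))
    return bytes(out)
```

Because encryption is deterministic in the content, the server sees identical ciphertexts for identical files and can deduplicate them, while only parties who hold the content (and hence the key) can decrypt.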
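Policy-based file assured deletion rests on a small key hierarchy: each file's data key is wrapped under a per-policy control key held by a key manager, and revoking the policy deletes the control key, leaving every data key wrapped under it (and hence every associated file) unrecoverable. A toy sketch follows; the XOR wrapping stands in for the asymmetric key wrapping of FADE-style schemes, and the policy name in the usage is hypothetical.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Toy stand-in for real key wrapping; a and b must have equal length.
    return bytes(x ^ y for x, y in zip(a, b))


class KeyManager:
    """Stand-in for the scheme's key manager: one control key per policy."""

    def __init__(self):
        self._control_keys = {}

    def control_key(self, policy: str) -> bytes:
        # Create the control key for a policy on first use.
        return self._control_keys.setdefault(policy, os.urandom(32))

    def revoke(self, policy: str) -> None:
        # Deleting the control key is what makes every data key wrapped
        # under this policy, and hence every associated file, unrecoverable.
        self._control_keys.pop(policy, None)

    def unwrap(self, policy: str, wrapped: bytes) -> bytes:
        if policy not in self._control_keys:
            raise PermissionError("policy revoked: data key is unrecoverable")
        return xor_bytes(wrapped, self._control_keys[policy])
```

Usage, with a hypothetical policy name: wrap a file's data key with `xor_bytes(data_key, km.control_key("project-alpha"))` and store only the wrapped form alongside the ciphertext; after `km.revoke("project-alpha")`, no party, including the owner, can recover the data key.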