
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Chunk-based deduplication is widely used in modern primary and backup storage systems to achieve high storage savings. It stores only a single physical copy of duplicate chunks, while referencing all duplicate chunks to the physical copy by small-size references. Prior studies show that deduplication can effectively reduce the storage space of primary storage by 50% and that of backup storage by up to 98%. This motivates the wide deployment of deduplication in various commercial cloud storage services (e.g., Dropbox, Google Drive, Bitcasa, Mozy, and Memopal) to reduce substantial storage costs.

To provide confidentiality guarantees, encrypted deduplication adds an encryption layer to deduplication, such that each chunk, before being written to deduplicated storage, is deterministically encrypted via symmetric-key encryption with a key derived from the chunk content (e.g., the key is set to the cryptographic hash of the chunk content). This ensures that duplicate chunks have identical content even after encryption, and hence we can still apply deduplication to the encrypted chunks for storage savings. Many studies have designed various encrypted deduplication schemes to efficiently manage outsourced data in cloud storage.

In addition to storing non-duplicate data, a deduplicated storage system needs to keep deduplication metadata. There are two types of deduplication metadata. To check if identical chunks exist, the system maintains a fingerprint index that tracks the fingerprints of all chunks that have already been stored. Also, to allow a file to be reconstructed, the system maintains a file recipe that holds the mappings from the chunks in the file to the references of the corresponding physical copies.
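For illustration, the two metadata structures can be pictured as simple in-memory maps, as in the following Java sketch (the class and field names are hypothetical; a real system would persist far richer per-chunk records):

import java.util.*;

// Illustrative sketch (not Metadedup's actual code) of the two kinds of
// deduplication metadata described above.
class DedupMetadata {
    // Fingerprint index: chunk fingerprint -> reference (e.g., address) of the
    // single stored physical copy, used to check whether an identical chunk exists.
    Map<String, Long> fingerprintIndex = new HashMap<>();

    // File recipe: for each file, the ordered list of references to the physical
    // copies of its chunks, used to reconstruct the file.
    Map<String, List<Long>> fileRecipes = new HashMap<>();
}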

Deduplication metadata is notorious for incurring high storage overhead, especially for highly redundant workloads (e.g., backups), where the metadata storage overhead becomes more dominant. In this work, we argue that encrypted deduplication incurs even higher metadata storage overhead, as it additionally keeps key metadata, such as the key recipes that track the chunk-to-key mappings to allow the decryption of individual files. Since the key recipes contain sensitive key information, they need to be managed separately from file recipes, encrypted by the master keys of file owners, and individually stored for different file owners. Such high metadata storage overhead can negate the storage effectiveness of encrypted deduplication in real deployment.

Contributions. To address the storage overhead of both deduplication metadata and key metadata, we design and implement Metadedup, a new encrypted deduplication system that effectively suppresses metadata storage. Our contributions are summarized as follows.

Metadedup builds on the idea of indirection. Instead of directly storing all deduplication and key metadata in both file and key recipes (both of which dominate the metadata storage overhead), we group the metadata in the form of metadata chunks that are stored in encrypted deduplication storage. Thus, both file and key recipes now store references to metadata chunks, which in turn contain references to data chunks (i.e., the chunks of file data). If Metadedup stores nearly identical files regularly (e.g., periodic backups), the corresponding file and key metadata are expected to have long sequences of references that are in the same order. This implies that the metadata chunks are highly redundant and hence can be effectively deduplicated.

We propose a distributed key management approach that adapts Metadedup into a multi-server architecture for fault-tolerant data storage. We generate the key of each data chunk from one of the servers, encode it via secret sharing, and distribute the resulting shares to the remaining servers. This ensures fault-tolerant storage of data chunks, while being robust against adversarial compromise of a number of servers through our decoupled management of keys and shares.

We implement a Metadedup prototype and evaluate its performance in a networked setup. Compared to the network speed of our Gigabit LAN testbed, we show that Metadedup incurs only 13.09% and 3.06% of throughput loss in writing and restoring files, respectively.

Finally, we conduct trace-driven simulation on two real-world datasets. We show that Metadedup achieves up to 93.94% of metadata storage savings in encrypted deduplication. We also show that Metadedup maintains the storage load balance among all servers.

1.2 MOTIVATION

Encrypted Deduplication Storage. Deduplication is a technique for space-efficient data storage (see [40] for a complete survey of deduplication). It partitions file data into either fixed-size or variable-size chunks, and identifies each chunk by the cryptographic hash, called the fingerprint, of the corresponding content. Suppose that the probability of fingerprint collision between different chunks is practically negligible [10]. Deduplication stores only one physical copy of duplicate chunks, and refers the duplicate chunks that have the same fingerprint to the physical copy by small references.
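As a rough illustration, the following Java sketch shows fixed-size chunking, fingerprinting, and the deduplication decision (all names are illustrative; production systems typically use content-defined chunking and an on-disk fingerprint index):

import java.security.MessageDigest;
import java.util.*;

// Sketch of chunk-based deduplication with fixed-size chunks. Variable-size
// (content-defined) chunking would replace the splitting step.
class ChunkStore {
    private final Map<String, byte[]> storedChunks = new HashMap<>(); // fingerprint -> physical copy

    // Returns the ordered list of fingerprints (references) that reconstruct the file.
    List<String> write(byte[] fileData, int chunkSize) throws Exception {
        List<String> references = new ArrayList<>();
        for (int off = 0; off < fileData.length; off += chunkSize) {
            byte[] chunk = Arrays.copyOfRange(fileData, off, Math.min(off + chunkSize, fileData.length));
            String fp = hex(MessageDigest.getInstance("SHA-256").digest(chunk));
            // Store a physical copy only if this fingerprint has not been seen before.
            storedChunks.putIfAbsent(fp, chunk);
            references.add(fp);
        }
        return references;
    }

    private static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}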

Encrypted deduplication augments plain deduplication (i.e., deduplication without encryption) with an encryption layer that operates on the chunks before deduplication, and provides data confidentiality guarantees in deduplicated storage. It implements the encryption layer based on message-locked encryption (MLE), which encrypts each chunk with a symmetric key (called the MLE key) derived from the chunk content; for example, the MLE key can be computed as the cryptographic hash of the chunk content in convergent encryption. This ensures that the encrypted chunks derived from duplicate chunks still have identical content, thereby being compatible with deduplication.
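A minimal Java sketch of convergent encryption is shown below, assuming the MLE key is the SHA-256 hash of the chunk and encryption is AES-256-CBC with a fixed IV to keep it deterministic; this is a simplified illustration rather than the exact construction used by any particular system:

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;

// Sketch of convergent encryption: the MLE key is the hash of the chunk, and
// encryption is deterministic, so duplicate plaintext chunks always produce
// identical ciphertext chunks and can still be deduplicated.
class ConvergentEncryption {
    static byte[] mleKey(byte[] chunk) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(chunk); // 32-byte AES-256 key
    }

    static byte[] encrypt(byte[] chunk, byte[] key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        // A fixed IV keeps the encryption deterministic; this is what makes the
        // scheme deduplication-friendly (and also what limits its security to
        // unpredictable chunks).
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(new byte[16]));
        return cipher.doFinal(chunk);
    }

    static byte[] decrypt(byte[] encChunk, byte[] key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(new byte[16]));
        return cipher.doFinal(encChunk);
    }
}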

Historical MLE builds on a publicly available function (e.g., a cryptographic hash function) to generate MLE keys. It provides security protection for unpredictable chunks, meaning that the chunks are drawn from a sufficiently large message set, such that the content of a chunk cannot be easily predicted; otherwise, if a chunk is predictable and known to be drawn from a finite set, historical MLE is vulnerable to the offline brute-force attack. Specifically, given a target encrypted chunk, an adversary samples each possible chunk from the finite message set, derives the corresponding MLE key (e.g., by applying the cryptographic hash function to each sampled chunk), and encrypts each sampled chunk with such a key. If the encryption result is equal to the target encrypted chunk, the adversary can infer that the sampled chunk is the original input of the target encrypted chunk.
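The attack can be sketched as a simple loop over the candidate set, reusing the ConvergentEncryption helper shown above (all names are illustrative):

import java.util.Arrays;
import java.util.List;

// Sketch of the offline brute-force attack on historical MLE for a predictable
// chunk: the adversary derives the key from each candidate and compares the
// resulting ciphertext with the target encrypted chunk.
class OfflineBruteForce {
    // Returns the candidate plaintext that matches the target ciphertext, or null.
    static byte[] attack(byte[] targetCiphertext, List<byte[]> candidateChunks) throws Exception {
        for (byte[] candidate : candidateChunks) {
            byte[] key = ConvergentEncryption.mleKey(candidate);      // key is publicly derivable
            byte[] ct = ConvergentEncryption.encrypt(candidate, key); // deterministic encryption
            if (Arrays.equals(ct, targetCiphertext)) {
                return candidate; // the original input of the target encrypted chunk is recovered
            }
        }
        return null;
    }
}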

To defend against the offline brute-force attack, DupLESS implements server-aided MLE, which introduces a global secret and protects the key generation process against public access. Server-aided MLE leverages a dedicated key manager to maintain a global secret that is protected from an adversary. It derives the MLE key of each chunk based on both the global secret and the cryptographic hash (a.k.a., fingerprint) of the chunk, such that an adversary cannot feasibly derive the MLE keys of any sampled chunks without knowing the global secret. Thus, if the global secret is secure, server-aided MLE is robust against the offline brute-force attack, and achieves security for both predictable and unpredictable chunks; otherwise, if the global secret is compromised, it preserves the same security guarantees for unpredictable chunks as in historical MLE. DupLESS further implements two mechanisms to strengthen security: (i) the oblivious pseudorandom function (OPRF) protocol, which allows the client to submit the fingerprint of each chunk (to the key manager) in a blinded manner, such that the key manager can successfully generate MLE keys without learning the actual fingerprints; and (ii) the rate-limiting mechanism, which proactively controls the key generation rate of the key manager, so as to defend against the online brute-force attack from compromised clients that try to issue too many key generation requests.
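The key-derivation step of server-aided MLE can be approximated as a keyed hash over the chunk fingerprint, as in the following hedged sketch (the OPRF blinding and rate limiting of DupLESS are omitted; names are illustrative):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of server-aided MLE key generation: the key manager holds a global
// secret and derives each MLE key from the secret and the chunk fingerprint,
// here with HMAC-SHA256. Without the global secret, an adversary cannot derive
// the MLE keys of sampled chunks.
class KeyManager {
    private final byte[] globalSecret; // protected on the dedicated key manager

    KeyManager(byte[] globalSecret) { this.globalSecret = globalSecret; }

    byte[] deriveMleKey(byte[] chunkFingerprint) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(globalSecret, "HmacSHA256"));
        return hmac.doFinal(chunkFingerprint); // 32-byte MLE key
    }
}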

In this project, we focus on mitigating the metadata storage overhead in MLE-


based (including both historical MLE and server-aided MLE) encrypted deduplication
storage systems.

Mathematical analysis. We first model the metadata storage overhead in encrypted deduplication. We refer to the data before and after deduplication as logical data and physical data, respectively. Suppose that L is the size of logical data, P is the size of physical data, and f is the ratio of the deduplication metadata size to the chunk size. Plain deduplication incurs f × (L + P) of metadata storage [38], [39], where f × L is the size of file recipes and f × P is the size of the fingerprint index. Encrypted deduplication has additional metadata storage for keys. It incurs a total of f × (L + P) + k × L of metadata storage, where k is the ratio of the key metadata size to the chunk size, and k × L is the size of key recipes.

Based on the above analysis, we show via an example how the metadata storage overhead becomes problematic in encrypted deduplication. Suppose that the size of deduplication metadata is 30 bytes [39], the size of key metadata is 32 bytes (e.g., for AES-256 encryption keys), and the chunk size is 8 KB [39]. Then f = 30 bytes / 8 KB ≈ 0.0037 and k = 32 bytes / 8 KB ≈ 0.0039. If the deduplication factor (i.e., L/P) is 50× [39] and L = 50 TB, then plain deduplication and encrypted deduplication incur 191.25 GB and 391.25 GB of metadata, or equivalently 18.67% and 38.21% additional storage for 1 TB of physical data, respectively.
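The example above can be reproduced with a few lines of Java (the sizes are taken from the example and are illustrative):

// Sketch reproducing the numerical example above.
public class MetadataOverhead {
    public static void main(String[] args) {
        double f = 30.0 / (8 * 1024);   // dedup metadata size / chunk size
        double k = 32.0 / (8 * 1024);   // key metadata size / chunk size
        double L = 50.0 * 1024;         // logical data in GB (50 TB)
        double P = L / 50.0;            // physical data in GB (dedup factor 50)
        double plain = f * (L + P);              // ~191.25 GB
        double encrypted = f * (L + P) + k * L;  // ~391.25 GB
        System.out.printf("plain: %.2f GB, encrypted: %.2f GB%n", plain, encrypted);
    }
}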

1.3 PROBLEM STATEMENT

The existing system stores only a single physical copy of duplicate chunks, while referencing all duplicate chunks to the physical copy, which reduces primary storage by 50% and backup storage by up to 98%. So, we propose encrypted deduplication, which adds an encryption layer to deduplication such that each chunk, before being written to deduplicated storage, is encrypted via symmetric-key encryption with a key derived from the chunk content.

1.4 EXISTING SYSTEM

Chunk-based deduplication is widely used in modern primary and backup storage systems to achieve high storage savings. It stores only a single physical copy of duplicate chunks, while referencing all duplicate chunks to the physical copy by small-size references. Prior studies show that deduplication can effectively reduce the storage space of primary storage by 50% and that of backup storage by up to 98%. To provide confidentiality guarantees, encrypted deduplication adds an encryption layer to deduplication, such that each chunk, before being written to deduplicated storage, is deterministically encrypted via symmetric-key encryption with a key derived from the chunk content (e.g., the key is set to be the cryptographic hash of the chunk content). This ensures that duplicate chunks have identical content even after encryption, and hence we can still apply deduplication to the encrypted chunks for storage savings. Many studies have designed various encrypted deduplication schemes to efficiently manage outsourced data in cloud storage. In addition to storing non-duplicate data, a deduplicated storage system needs to keep deduplication metadata.

Disadvantages:

• The existing system is not designed for an organization that outsources the storage of users' data to a remote shared storage system.
• Storage efficiency on the cloud is reduced due to the high storage overhead of metadata.

1.5 PROPOSED SYSTEM

Metadedup builds on the idea of indirection. Instead of directly storing all deduplication and key metadata in both file and key recipes (both of which dominate the metadata storage overhead), we group the metadata in the form of metadata chunks that are stored in encrypted deduplication storage. Thus, both file and key recipes now store references to metadata chunks, which in turn contain references to data chunks (i.e., the chunks of file data). If Metadedup stores nearly identical files regularly (e.g., periodic backups), the corresponding file and key metadata are expected to have long sequences of references that are in the same order. This implies that the metadata chunks are highly redundant and hence can be effectively deduplicated.
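The indirection can be pictured with the following Java sketch, in which the recipes reference metadata chunks and only the metadata chunks record per-data-chunk fingerprints and keys (all class and field names are illustrative, not taken from the prototype):

import java.util.*;

// Sketch of the indirection idea: file and key recipes no longer list every data
// chunk; they reference metadata chunks, and each metadata chunk records the
// per-data-chunk metadata (fingerprint and MLE key) of one segment. Metadata
// chunks from nearly identical backups carry identical sequences and therefore
// deduplicate well.
class MetadataChunk {
    List<String> dataChunkFingerprints = new ArrayList<>(); // references to data chunks
    List<byte[]> dataChunkKeys = new ArrayList<>();         // MLE keys of the data chunks
}

class FileRecipe {
    // References (fingerprints) of the metadata chunks, not of the data chunks.
    List<String> metadataChunkFingerprints = new ArrayList<>();
}

class KeyRecipe {
    // Only the keys of the metadata chunks; encrypted with the owner's master key.
    List<byte[]> metadataChunkKeys = new ArrayList<>();
}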

The system proposes a distributed key management approach that adapts Metadedup into a multi-server architecture for fault-tolerant data storage. We generate the key of each data chunk from one of the servers, encode it via secret sharing, and distribute the resulting shares to the remaining servers. This ensures fault-tolerant storage of data chunks, while being robust against adversarial compromise of a number of servers through our decoupled management of keys and shares.
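For illustration only, the following Java sketch splits a key into n shares using simple n-of-n XOR splitting; a deployment that needs fault tolerance would instead use a threshold scheme (e.g., Shamir's or ramp secret sharing) so that the key can be rebuilt from a subset of the servers:

import java.security.SecureRandom;

// Very simplified sketch of splitting an MLE key into shares that are then
// distributed to different servers. XORing all shares back together recovers
// the key; no proper subset of shares reveals anything about it.
class KeySharing {
    static byte[][] split(byte[] key, int n) {
        SecureRandom rng = new SecureRandom();
        byte[][] shares = new byte[n][key.length];
        byte[] last = key.clone();
        for (int i = 0; i < n - 1; i++) {
            rng.nextBytes(shares[i]);                               // random share
            for (int j = 0; j < key.length; j++) last[j] ^= shares[i][j];
        }
        shares[n - 1] = last; // final share makes the XOR of all shares equal the key
        return shares;
    }
}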

The system implements a Metadedup prototype and evaluates its performance in a networked setup. Compared to the network speed of our Gigabit LAN testbed, we show that Metadedup incurs only 13.09% and 3.06% of throughput loss in writing and restoring files, respectively.

Finally, the system conducts trace-driven simulation on two real-world datasets. We show that Metadedup achieves up to 93.94% of metadata storage savings in encrypted deduplication. We also show that Metadedup maintains the storage load balance among all servers.

Advantages:

• The system ensures that our Metadedup design is compatible with existing
countermeasures that address different threats against encrypted deduplication
systems.
• A malicious client may abuse client-side deduplication to learn whether other users have already stored certain files, and Metadedup can defeat this side-channel leakage by adopting server-side deduplication on cross-user data. A malicious server may modify or even delete stored files to destroy the availability of outsourced files, and Metadedup is compatible with the availability countermeasure that disperses data across servers via deduplication-aware secret sharing.

CHAPTER 2
LITERATURE SURVEY

2.1 LITERATURE SURVEY

1. Y. Zhou, D. Feng, W. Xia, M. Fu, F. Huang, Y. Zhang, and C. Li. SecDep: A user-aware efficient fine-grained secure deduplication scheme with multi-level key management. In Proc. of IEEE MSST, 2015.

Nowadays, many customers and enterprises backup their data to cloud storage that
performs deduplication to save storage space and network bandwidth. Hence, how to
perform secure deduplication becomes a critical challenge for cloud storage. According
to our analysis, the state-of-the-art secure deduplication methods are not suitable for
cross-user fine-grained data deduplication. They either suffer brute-force attacks that
can recover files falling into a known set, or incur large computation (time) overheads.
Moreover, existing approaches of convergent key management incur large space
overheads because of the huge number of chunks shared among users. Our observation
that cross-user redundant data are mainly from the duplicate files, motivates us to
propose an efficient secure deduplication scheme SecDep. SecDep employs User-
Aware Convergent Encryption (UACE) and Multi-Level Key management (MLK)
approaches.

2. W. Xia, H. Jiang, D. Feng, F. Douglis, P. Shilane, Y. Hua, M. Fu, Y. Zhang, and Y. Zhou. A comprehensive study of the past, present, and future of data deduplication. Proceedings of the IEEE, 104(9):1681–1710, 2016.

Data deduplication, an efficient approach to data reduction, has gained increasing


attention and popularity in large-scale storage systems due to the explosive growth of
digital data. It eliminates redundant data at the file or subfile level and identifies
duplicate content by its cryptographically secure hash signature (i.e., collision-resistant
fingerprint), which is shown to be much more computationally efficient than the
traditional compression approaches in large-scale storage systems. In this paper, we
first review the background and key features of data deduplication, then summarize and
classify the state-of-the-art research in data deduplication according to the key
workflow of the data deduplication process. The summary and taxonomy of the state of the art on deduplication help identify and understand the most important design
considerations for data deduplication systems. In addition, we discuss the main
applications and industry trend of data deduplication, and provide a list of the publicly
available sources for deduplication research and studies. Finally, we outline the open
problems and future research directions facing deduplication-based storage systems.

3. J. Li, P. P. C. Lee, Y. Ren, and X. Zhang. Metadedup: Deduplicating metadata in encrypted deduplication via indirection. In Proc. of IEEE MSST, 2019.

Encrypted deduplication combines encryption and deduplication in a seamless way to


provide confidentiality guarantees for the physical data in deduplication storage, yet it
incurs substantial metadata storage overhead due to the additional storage of keys. We
present a new encrypted deduplication storage system called Metadedup, which
suppresses metadata storage by also applying deduplication to metadata. Its idea builds
on indirection, which adds another level of metadata chunks that record metadata
information. We find that metadata chunks are highly redundant in real-world
workloads and hence can be effectively deduplicated. In addition, metadata chunks can
be protected under the same encrypted deduplication framework, thereby providing
confidentiality guarantees for metadata as well. We evaluate Metadedup through
microbenchmarks, prototype experiments, and trace-driven simulation. Metadedup has
limited computational overhead in metadata processing, and only adds 6.19% of
performance overhead on average when storing files in a networked setting. Also, for
real-world backup workloads, Metadedup saves the metadata storage by up to 97.46%
at the expense of only up to 1.07% of indexing overhead for metadata chunks.

4. J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou. Secure deduplication with
efficient and reliable convergent key management. IEEE Transactions on Parallel and Distributed Systems, 25(6):1615–1625, 2014.

Data deduplication is a technique for eliminating duplicate copies of data, and has been
widely used in cloud storage to reduce storage space and upload bandwidth. Promising
as it is, an arising challenge is to perform secure deduplication in cloud storage.
Although convergent encryption has been extensively adopted for secure deduplication,
a critical issue of making convergent encryption practical is to efficiently and reliably
manage a huge number of convergent keys. This paper makes the first attempt to formally address the problem of achieving efficient and reliable key management in
secure deduplication. We first introduce a baseline approach in which each user holds
an independent master key for encrypting the convergent keys and outsourcing them to
the cloud. However, such a baseline key management scheme generates an enormous
number of keys with the increasing number of users and requires users to dedicatedly
protect the master keys. To this end, we propose Dekey, a new construction in which
users do not need to manage any keys on their own but instead securely distribute the
convergent key shares across multiple servers. Security analysis demonstrates that
Dekey is secure in terms of the definitions specified in the proposed security model. As
a proof of concept, we implement Dekey using the Ramp secret sharing scheme and
demonstrate that Dekey incurs limited overhead in realistic environments.

5. Y. Duan. Distributed key generation for encrypted deduplication: Achieving the strongest privacy. In Proc. of ACM CCSW, 2014.

Large-scale storage systems often attempt to achieve two seemingly conflicting goals:
the systems need to reduce the copies of redundant data to save space, a process called
deduplication; and users demand encryption of their data to ensure privacy.
Conventional encryption makes deduplication on ciphertexts ineffective, as it destroys
data redundancy. A line of work, originated from Convergent Encryption, and evolved
into Message Locked Encryption, strives to solve this problem. The latest work,
DupLESS, proposes a server-aided architecture that provides the strongest privacy. The
DupLESS architecture relies on a key server to help the clients generate encryption
keys that result in convergent ciphertexts. In this paper, we first provide a rigorous proof
of security, in the random oracle model, for the DupLESS architecture which is lacking
in the original paper. Our proof shows that using additional secret, other than the data
itself, for generating encryption keys achieves the best possible security under current
deduplication paradigm. We then introduce a distributed protocol that eliminates the
need for a key server and allows less managed systems such as P2P systems to enjoy
the high security level. Implementation and evaluation show that the scheme is both
robust and practical.

2.2 SOFTWARE AND HARDWARE REQUIREMENTS

2.2.1 SOFTWARE REQUIREMENTS

➢ Operating System - Windows XP

➢ Coding Language - Java/J2EE(JSP,Servlet)

➢ Front End - J2EE

➢ Back End - MySQL

2.2.2 HARDWARE REQUIREMENTS

➢ Processor - Pentium –IV

➢ RAM - 4 GB (min)

➢ Hard Disk - 20 GB

➢ Key Board - Standard Windows Keyboard

➢ Mouse - Two or Three Button Mouse

➢ Monitor - SVGA

CHAPTER 3
METHODOLOGY AND DESIGN ANALYSIS

3.1 SYSTEM ARCHITECTURE

Figure 3.1: Metadedup architecture

Here, the file data is divided into chunks. Because the chunks are highly redundant in real-world workloads, they need to be deduplicated. A secret key is assigned to every chunk, and the deduplication algorithm deduplicates the chunks. The file recipes and key recipes are used for the reconstruction of the backup files.

Metadedup is designed for an organization that outsources the storage of users’ data to
a remote shared storage system. It focuses on the storage of backup workloads, which
are known to have high content similarity. It applies deduplication to remove content
redundancies of both data (i.e., the file data from users’ backup workloads) and
metadata (i.e., deduplication metadata and key metadata), so as to improve the overall
storage efficiency. Metadedup builds on the client-server model. A client is a software
interface for users to process backup files. It partitions a backup file into multiple
chunks, encrypts each chunk, and uploads the encrypted chunks to a remote server that
employs encrypted deduplication. The server performs chunk-based deduplication, and only stores the non-duplicate chunks whose content does not match any existing chunks. It
also keeps the file recipe and key recipe of each backup file (note that the key recipe is
further encrypted by the master key of the client that owns the backup file) for the
reconstruction of the backup file. Here, we assume that the communication channels
are carefully protected (e.g., via SSL/TLS), so as to address network eavesdropping. To
this end, Metadedup aims for the following goals:

➢ Low storage overhead of metadata: It suppresses the storage space of both file
recipes and key recipes, while incurring only small storage overhead to the
fingerprint index as we also apply deduplication to metadata.
➢ Security for data and metadata: It preserves the security guarantees of underlying
encrypted deduplication for both data and metadata storage.
➢ Limited performance overhead: It adds small performance overhead on writing (restoring) files to (from) deduplicated storage.

In the following, we define the threat model of Metadedup and present its metadata deduplication design in detail. We also present the security analysis of Metadedup on the protection of metadata chunks in Section 1 of the digital supplementary file.

3.2 ALGORITHMS

Algorithm 1: Write operation

Client input: target file file, client's master key key
  Initialize file and key recipes: fRecipe, kRecipe
  Divide file into data chunks {dChunk}
  for each dChunk do
    dkey ← hash of dChunk
    [dChunk]dkey ← encryption of dChunk with dkey
  Divide {[dChunk]dkey} into segments {Seg}
  for each Seg do
    Initialize metadata chunk mChunk
    for each [dChunk]dkey in Seg do
      Add metadata of [dChunk]dkey into mChunk
    mkey ← cryptographic hash of mChunk
    [mChunk]mkey ← encryption of mChunk with mkey
    Add deduplication metadata of [mChunk]mkey into fRecipe
    Add mkey into kRecipe
  [kRecipe]key ← encryption of kRecipe with key
  Upload: fRecipe, [kRecipe]key, {[dChunk]dkey}, {[mChunk]mkey}

Server input: fingerprint index
  Receive: fRecipe, [kRecipe]key, {[dChunk]dkey}, {[mChunk]mkey}
  Deduplicate {[dChunk]dkey} and {[mChunk]mkey}
  Store unique {[dChunk]dkey} and {[mChunk]mkey}
  Store fRecipe and [kRecipe]key
Algorithm 2: Restore operation

Client input: full pathname name of the target file
  Request metadata based on name

Server input: fRecipe, {[mChunk]mkey} and [kRecipe]key
  Receive name
  Retrieve fRecipe and [kRecipe]key based on name
  Retrieve {[mChunk]mkey} based on fRecipe
  Send fRecipe, {[mChunk]mkey} and [kRecipe]key

Client input: client's master key key
  Receive fRecipe, {[mChunk]mkey} and [kRecipe]key
  kRecipe ← decryption of [kRecipe]key with key
  for each [mChunk]mkey do
    mkey ← corresponding key in kRecipe
    mChunk ← decryption of [mChunk]mkey with mkey
  Request data chunks using deduplication metadata in {mChunk}

Server input: {[dChunk]dkey}
  Receive deduplication metadata of data chunks
  Retrieve {[dChunk]dkey} based on deduplication metadata
  Send {[dChunk]dkey}

Client input: {mChunk}
  Receive {[dChunk]dkey}
  for each [dChunk]dkey do
    Retrieve corresponding dkey in {mChunk}
    dChunk ← decryption of [dChunk]dkey with dkey
  Assemble {dChunk} to original file
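Similarly, the client side of Algorithm 2 can be sketched as follows, assuming the server has already returned the encrypted metadata chunks, the metadata-chunk keys have been recovered from the key recipe, and the metadata chunks use the toy serialization from the previous sketch (names are illustrative):

import java.util.ArrayList;
import java.util.List;

// Sketch of the client-side restore steps of Algorithm 2.
class RestoreClient {
    static List<byte[]> restore(List<byte[]> encMetadataChunks, List<byte[]> metadataKeys,
                                List<byte[]> encDataChunks) throws Exception {
        List<byte[]> dataKeys = new ArrayList<>();
        for (int i = 0; i < encMetadataChunks.size(); i++) {
            byte[] mChunk = ConvergentEncryption.decrypt(encMetadataChunks.get(i), metadataKeys.get(i));
            dataKeys.addAll(parseKeys(mChunk)); // recover the per-data-chunk keys
        }
        List<byte[]> dataChunks = new ArrayList<>();
        for (int i = 0; i < encDataChunks.size(); i++) {
            dataChunks.add(ConvergentEncryption.decrypt(encDataChunks.get(i), dataKeys.get(i)));
        }
        return dataChunks; // assembled in order to rebuild the original file
    }

    static List<byte[]> parseKeys(byte[] mChunk) {
        // Placeholder parser: each entry is a 32-byte fingerprint followed by a 32-byte key.
        List<byte[]> keys = new ArrayList<>();
        for (int off = 32; off + 32 <= mChunk.length; off += 64) {
            keys.add(java.util.Arrays.copyOfRange(mChunk, off, off + 32));
        }
        return keys;
    }
}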

3.3 MODULES:

There are five modules in our project:

1. Data Owner
2. Cloud Server
3. End User
4. Data Encryption and Decryption
5. Attacker

1.Data Owner Module

In this module, the data owner uploads their data in the cloud server. For the security
purpose the data owner encrypts the data file’s blocks and then store in the cloud. The
data owner can check the duplication of the file’s blocks over Corresponding cloud
server. The Data owner can have capable of manipulating the encrypted data file’s
blocks and the data owner can check the cloud data as well as the duplication of the
specific file’s blocks and also he can create remote user with respect to registered cloud
servers. The data owner also checks data integrity proof on which the block is modified
by the attacker.

2. Cloud Server Module

The cloud service provider manages a cloud to provide data storage services. Data owners encrypt their data file's blocks and store them in the cloud for sharing with remote users. To access the shared data file's blocks, data consumers download the encrypted blocks of their interest from the cloud and then decrypt them.

3. End User

In this module, the remote user logs in with his username and password. He then requests the secret keys of the required file blocks from the cloud servers and receives them. After getting a secret key, he downloads the file blocks by entering the block name and the secret key on the cloud server.

4. Data Encryption and Decryption

All legitimate users in the system can freely query any encrypted data of interest. Upon receiving the data from the server, the user runs the decryption algorithm Decrypt to decrypt the ciphertext using secret keys obtained from different users. Only if the attributes the user possesses satisfy the access structure defined in the ciphertext CT can the user obtain the content key.

5.Attacker Module

A user who attacks or modifies block content is called an attacker. The attacker may be a user who tries to access file contents from the cloud server with a wrong secret key.

3.4 SYSTEM DESIGN

INPUT DESIGN:

Input design plays a vital role in the life cycle of software development, and it requires very careful attention from developers. The goal of input design is to feed data to the application as accurately as possible, so inputs must be designed effectively so that errors occurring while feeding data are minimized. According to software engineering concepts, the input forms or screens are designed to provide validation control over the input limit, range, and other related validations.

This system has input screens in almost all the modules. Error messages are developed to alert the user whenever he commits a mistake and to guide him in the right way so that invalid entries are not made. Let us look at this in more depth under module design.

Input design is the process of converting the user-created input into a computer-based format. The goal of input design is to make data entry logical and free from errors. Errors in the input are controlled by the input design. The application has been developed in a user-friendly manner. The forms have been designed in such a way that, during processing, the cursor is placed in the position where data must be entered. The user is also provided with an option to select an appropriate input from various alternatives related to the field in certain cases.

Validations are required for each data item entered. Whenever a user enters erroneous data, an error message is displayed, and the user can move on to the subsequent pages after completing all the entries on the current page.

OUTPUT DESIGN:

The Output from the computer is required to mainly create an efficient method of
communication within the company primarily among the project leader and his team
members, in other words, the administrator and the clients. The output of VPN is the
system which allows the project leader to manage his clients in terms of creating new
clients and assigning new projects to them, maintaining a record of the project validity
and providing folder level access to each client on the user side depending on the
projects allotted to him. After completion of a project, a new project may be assigned
to the client. User authentication procedures are maintained at the initial stages itself.

A new user may be created by the administrator himself or a user can himself register
as a new user but the task of assigning projects and validating a new user rests with the
administrator only.

The application starts running when it is executed for the first time. The server has to be started, and then Internet Explorer is used as the browser. The project runs on the local area network, so the server machine serves as the administrator while the other connected systems act as the clients. The developed system is highly user-friendly and can be easily understood by anyone using it, even for the first time.

3.4.1 UML DIAGRAMS:

UML stands for Unified Modelling Language. UML is a standardized general-purpose modelling language in the field of object-oriented software engineering. The standard is managed, and was created, by the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modelling Language is a standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems, as well as for business modelling and other non-software systems.

The UML represents a collection of best engineering practices that have proven
successful in the modelling of large and complex systems.

The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express
the design of software projects.

GOALS: The Primary goals in the design of the UML are as follows:

1. Provide users a ready-to-use, expressive visual modelling Language so that they


can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.

4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.

3.4.1.1 Class Diagram

The UML Class diagram is a graphical notation used to construct and visualize object-
oriented systems. A class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the
system's:

• classes,
• their attributes,
• operations (or methods),
• the relationships among objects.

Figure 3.2: Class diagram

The figure represents the class diagram of the project, in which we can see the cloud server, data owner, end user, and attacker.

The data owner initially registers an account by filling in the details; after completing registration, he can easily log in to the account. After logging in, the owner may upload data to the cloud server for security. The data owner may upload the data in the form of blocks so that he can easily encrypt the data.

The cloud server also registers and logs in to its account by filling in the details.

3.4.1.2 Use Case Diagram

A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.

Figure 3.3: Use Case diagram for Data Owner and Cloud Server

The use case diagram shows that the data owner must register and log in, browse files, split them into blocks, and then encrypt and upload them; he may also check for data deduplication and data integrity. If the data is a duplicate, the system reports that a duplicate was found.

Figure 3.4: Use case diagram for End user and Cloud Server

This is the use case diagram for the end user and the cloud server. The end user and the cloud server have to log in. The end user then requests the file he wants. The cloud server views the file requests and responds to them, that is, it grants permission to access the files. The end user is then able to download the files.

3.4.1.3 Sequence Diagram

A sequence diagram in Unified Modelling Language (UML) is a kind of interaction


diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.

Figure 3.5: Sequence Diagram

First, the data owner registers and logs in. The data owner uploads the file; the cloud server then divides the uploaded file into blocks, assigns a MAC to each block, and encrypts the file. The end user requests the file, and the server responds to the file request. The end user is then able to download the file.

3.4.2 Dataflow Diagram:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by this system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used
to model the system components. These components are the system process, the
data used by the process, an external entity that interacts with the system and the
information flows in the system.
3. DFD shows how the information moves through the system and how it is modified
by a series of transformations. It is a graphical technique that depicts information
flow and the transformations that are applied as data moves from input to output.
4. DFD is also known as bubble chart. A DFD may be used to represent a system at
any level of abstraction. DFD may be partitioned into levels that represent
increasing information flow and functional detail.

Figure 3.6: Data Flow Diagram for Level 0

This is the Level 0 DFD. First, the data owner has to register and log in. Then he uploads the file. If the data owner wants to update the file, he is able to update it.

Figure 3.7: Data Flow Diagram for Level 1

The data owner has to register. After registration, he has to log in. The data owner uploads the file, and the file is sent to the cloud server. Whenever the end user wants the file, he sends a request for it, and the cloud server responds to the file request. The cloud server gives permission to access the file, and the end user is then able to download the file.

3.4.3 Flowchart Diagram:

Flowcharts are nothing but the graphical representation of the data or the algorithm
for a better understanding of the code visually. It displays step-by-step solutions to a
problem, algorithm, or process. It is a pictorial way of representing steps that are
preferred by most beginner-level programmers to understand algorithms of computer
science, thus it contributes to troubleshooting the issues in the algorithm. A flowchart
is a picture of boxes that indicates the process flow in a sequential manner. Since a
flowchart is a pictorial representation of a process or algorithm, it’s easy to interpret
and understand the process. To draw a flowchart, certain rules need to be followed; these rules are used by all professionals and are widely accepted worldwide.

Use of a flowchart

Following are the uses of a flowchart:

• It is a pictorial representation of an algorithm that increases the readability of the


program.
• Complex programs can be drawn in a simple way using a flowchart.
• It helps team members get an insight into the process and use this knowledge to
collect data, detect problems, develop software, etc.
• A flowchart is a basic step for designing a new process or adding extra features.
• Communication with other people becomes easy by drawing flowcharts and
sharing them.
When to use flowchart
Flowcharts are mainly used in the below scenarios:

• It is most importantly used when the programmers make projects. As a flowchart


is a basic step to make the design of projects pictorially, it is preferred by many.
• When the flowcharts of a process are drawn, the programmer understands the non-
useful parts of the process. So flowcharts are used to separate useful logic from
the unwanted parts.
• Since the rules and procedure of drawing a flowchart are universal, flowchart
serves as a communication channel to the people who are working on the same
project for better understanding.
• Optimizing a process becomes easier with flowcharts. The efficiency of code is
improved with the flowchart drawing.

Types of Flowcharts

Three types of flowcharts are listed below:

1. Process flowchart: This type of flowchart shows all the activities that are
involved in making a product. It basically provides a pathway to analyse the
product to be built. A process flowchart is most commonly used in process engineering to illustrate the relation between the major as well as minor
components present in the product. It is used in business product modelling to
help understand employees about the project requirements and gain some insight
about the project.
2. Data flowchart: As the name suggests, the data flowchart is used to analyse the
data, specifically it helps in analysing the structural details related to the project.
Using this flowchart, one can easily understand the data inflow and outflow from
the system. It is most commonly used to manage data or to analyse information to
and from the system.
3. Business Process Modelling Diagram: Using this flowchart or diagram, one can
analytically represent the business process and help simplify the concepts needed
to understand business activities and the flow of information. This flowchart
illustrates the business process and models graphically which paves a way for
process improvement.
Most flowcharts use only a simplified set of symbols with a single type of relationship,
a solid line with an arrow. Diagrams should begin and ideally end with the terminator
symbol. However, this rule is not as strict as it was in the UML Activity diagram. The
terminator is usually drawn as a rectangle with distinctly round corners. In some
diagrams, however, you may also encounter an ellipse or circle. The terminator where
the process flow starts usually contains the "Start" text and the one where the process
ends the "End" text. Some diagrams have multiple end points and some none.

Advantages of Flowchart

• It is the most efficient way of communicating the logic of a system.
• It acts like a blueprint or guide during program design.
• It also helps in the debugging process.
• Using flowcharts, we can easily analyse programs.
• Flowcharts are good for documentation.

Disadvantages of Flowchart

• Flowcharts are difficult to draw for large and complex programs.
• They do not convey the proper amount of detail.
• Flowcharts are very difficult to reproduce.
• Flowcharts are very difficult to modify.

Figure 3.8: Flow chart for Data owner

The above flowchart shows the sequence of steps performed in the project. The data owner and the user have to log in to the cloud to perform the operations.

3.4.4 ER Diagram

An ER Diagram (Entity Relationship Diagram), also known as an ERD, is a diagram that displays the relationships of the entity sets stored in a database. In other words, ER diagrams help to explain the logical structure of databases. ER diagrams are created based on three basic concepts: entities, attributes, and relationships.

ER Diagrams contain different symbols that use rectangles to represent entities, ovals
to define attributes and diamond shapes to represent relationships.

An ER diagram looks very similar to a flowchart. However, an ER diagram includes many specialized symbols, and their meanings make this model unique. The purpose of an ER diagram is to represent the entity framework infrastructure.

ER Model
The ER Model (Entity Relationship Model) is a high-level conceptual data model diagram. ER modeling helps to systematically analyze data requirements to produce a well-designed database. The ER Model represents real-world entities and the relationships between them, so completing ER modeling is considered a best practice before implementing your database.

Advantages of E-R Diagram:

➢ Helps you to define terms related to entity relationship modeling


➢ Provide a preview of how all your tables should connect, what fields are going to
be on each table
➢ Helps to describe entities, attributes, relationships
➢ ER diagrams are translatable into relational tables which allows you to build
databases quickly
➢ ER diagrams can be used by database designers as a blueprint for implementing
data in specific software applications
➢ The database designer gains a better understanding of the information to be contained in the database with the help of an ER diagram
➢ An ER diagram allows you to communicate the logical structure of the database to users

Figure 3.9: E-R Diagram

CHAPTER 4
IMPLEMENTATION

4.1 INTRODUCTION

Cloud Computing is the delivery of computing services such as servers, storage,


databases, networking, software, analytics, intelligence, and more, over the Cloud
(Internet).

Cloud Computing provides an alternative to the on-premises datacentre. With an on-


premises datacentre, we have to manage everything, such as purchasing and installing
hardware, virtualization, installing the operating system, and any other required
applications, setting up the network, configuring the firewall, and setting up storage for
data. After doing all the set-up, we become responsible for maintaining it through its
entire lifecycle.

But if we choose Cloud Computing, a cloud vendor is responsible for the hardware
purchase and maintenance. They also provide a wide variety of software and platform
as a service. We can take any required services on rent. The cloud computing services
will be charged based on usage.

The cloud environment provides an easily accessible online portal that makes it handy for the user to manage the compute, storage, network, and application resources.

Advantages of cloud computing


• Cost: It reduces the huge capital costs of buying hardware and software.
• Speed: Resources can be accessed in minutes, typically within a few clicks.
• Scalability: We can increase or decrease the requirement of resources according to
the business requirements.
• Productivity: While using cloud computing, we put less operational effort. We do
not need to apply patching, as well as no need to maintain hardware and software.
So, in this way, the IT team can be more productive and focus on achieving business
goals.
• Reliability: Backup and recovery of data are less expensive and very fast for
business continuity.

• Security: Many cloud vendors offer a broad set of policies, technologies, and
controls that strengthen our data security.

Types of Cloud Computing

• Public Cloud: The cloud resources that are owned and operated by a third-party
cloud service provider are termed as public clouds. It delivers computing resources
such as servers, software, and storage over the internet
• Private Cloud: The cloud computing resources that are exclusively used inside a
single business or organization are termed as a private cloud. A private cloud may
physically be located on the company’s on-site datacentre or hosted by a third-party
service provider.
• Hybrid Cloud: It is the combination of public and private clouds, bound together by technology that allows data and applications to be shared between them. A hybrid cloud provides flexibility and more deployment options to the business.

Types of Cloud Services


1. Infrastructure as a Service (IaaS): In IaaS, we can rent IT infrastructures like
servers and virtual machines (VMs), storage, networks, operating systems from a
cloud service vendor. We can create VM running Windows or Linux and install
anything we want on it. Using IaaS, we don’t need to care about the hardware or
virtualization software, but other than that, we do have to manage everything else.
Using IaaS, we get maximum flexibility, but still, we need to put more effort into
maintenance.

2. Platform as a Service (PaaS): This service provides an on-demand environment


for developing, testing, delivering, and managing software applications. The
developer is responsible for the application, and the PaaS vendor provides the
ability to deploy and run it. Using PaaS, the flexibility gets reduced, but the
management of the environment is taken care of by the cloud vendors.

3. Software as a Service (SaaS): It provides a centrally hosted and managed software


services to the end-users. It delivers software over the internet, on-demand, and
typically on a subscription basis. E.g., Microsoft One Drive, Dropbox, WordPress,
Office 365, and Amazon Kindle. SaaS is used to minimize the operational cost to
the maximum extent.

4.2 SOFTWARE ENVIRONMENT

Client server:
Overview:
With the varied topic in existence in the fields of computers, Client Server is one, which
has generated more heat than light, and also more hype than reality. This technology
has acquired a certain critical mass attention with its dedication conferences and
magazines. Major computer vendors such as IBM and DEC, have declared that Client
Servers is their main future market. A survey of DBMS magazine revealed that 76% of
its readers were actively looking at the client server solution. The client server development tools market grew from $200 million in 1992 to more than $1.2 billion in 1996.

Client server implementations are complex but the underlying concept is simple and
powerful. A client is an application running with local resources but able to request the
database and relate the services from separate remote server. The software mediating
this client server interaction is often referred to as MIDDLEWARE.

The typical client either a PC or a Work Station connected through a network to a more
powerful PC, Workstation, Midrange or Main Frames server usually capable of
handling request from more than one client. However, with some configuration server
may also act as client. A server may need to access other server in order to process the
original client request.

The key client server idea is that the client, as the user, is essentially insulated from the physical location and formats of the data needed for their application. With the proper middleware, a client input form or report can transparently access and manipulate both the local database on the client machine and remote databases on one or more servers. An added bonus is that client server opens the door to multi-vendor database access, including heterogeneous table joins.

What is Client Server:

Two prominent systems in existence are client server and file server systems. It is essential to distinguish between client server and file server systems. Both provide shared network access to data, but the comparison ends there! The file server simply provides a remote disk drive that can be accessed by LAN applications on a file-by-file basis. The client server offers full relational database services such as SQL access, record modifying, insert, delete with full relational integrity, and backup/restore performance for a high volume of transactions. The client server middleware provides a flexible interface between client and server: who does what, when, and to whom.

Why Client Server:

Client server has evolved to solve a problem that has been around since the earliest days of computing: how best to distribute your computing, data generation, and data storage resources in order to obtain efficient, cost-effective departmental and enterprise-wide data processing. During the mainframe era, choices were quite limited. A central machine housed both the CPU and the data (cards, tapes, drums, and later disks). Access to these resources was initially confined to batched runs that produced departmental reports at the appropriate intervals. A strong central information service department ruled the corporation. The role of the rest of the corporation was limited to requesting new or more frequent reports and to providing handwritten forms from which the central data banks were created and updated. The earliest client server solutions therefore could best be characterized as "SLAVE-MASTER".

Front End or User Interface Design :

The entire user interface is planned to be developed in a browser-specific environment with a touch of intranet-based architecture for achieving the distributed concept. The browser-specific components are designed by using the HTML standards, and the dynamism of the design is achieved by concentrating on the constructs of Java Server Pages.

Communication or Database Connectivity Tier :

The communication architecture is designed by concentrating on the standards of Servlets and Enterprise Java Beans. The database connectivity is established by using Java Database Connectivity. The standards of three-tier architecture are given major attention to keep higher cohesion and limited coupling for effectiveness of the operations.

Features of The Language Used:

In my project, I have chosen Java language for developing the code.

About Java:

Initially, the language was called "Oak", but it was renamed "Java" in 1995. The
primary motivation of this language was the need for a platform-independent (i.e.,
architecture neutral) language that could be used to create software to be embedded in
various consumer electronic devices.

➢ Java is a programmer’s language.

➢ Java is cohesive and consistent.

➢ Except for those constraints imposed by the Internet environment, Java gives the
programmer, full control.

Finally, Java is to Internet programming what C was to systems programming.

Importance of Java to the Internet:

Java has had a profound effect on the Internet. This is because; Java expands the
Universe of objects that can move about freely in Cyberspace. In a network, two
categories of objects are transmitted between the Server and the Personal computer.
They are: passive information and dynamic active programs. The dynamic, self-executing programs cause serious problems in the areas of security and portability. But Java addresses those concerns and, by doing so, has opened the door to an exciting new form of program called the applet.

Java can be used to create two types of programs:

Applications and Applets: An application is a program that runs on our computer under the operating system of that computer. It is more or less like one created using C or C++. Java's ability to create applets makes it important. An applet is an application
designed to be transmitted over the Internet and executed by a Java –compatible web
browser. An applet is actually a tiny Java program, dynamically downloaded across the
network, just like an image. But the difference is, it is an intelligent program, not just a
media file. It can react to the user input and dynamically change.

Features of Java:

Security:

Every time you download a "normal" program, you are risking a viral
infection. Prior to Java, most users did not download executable programs frequently,
and those who did scanned them for viruses prior to execution. Most users still worried
about the possibility of infecting their systems with a virus. In addition, another type of
malicious program exists that must be guarded against. This type of program can gather
private information, such as credit card numbers, bank account balances, and
passwords. Java answers both these concerns by providing a “firewall” between a
network application and your computer.

When you use a Java-compatible Web browser, you can safely download Java applets
without fear of virus infection or malicious intent.

Portability:

For programs to be dynamically downloaded to all the various types of platforms


connected to the Internet, some means of generating portable executable code is needed.
As you will see, the same mechanism that helps ensure security also helps create
portability. Indeed, Java’s solution to these two problems is both elegant and efficient.

The Byte code:

The key that allows the Java to solve the security and portability problems is that the
output of Java compiler is Byte code. Byte code is a highly optimized set of instructions
designed to be executed by the Java run-time system, which is called the Java Virtual
Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code.
Translating a Java program into byte code helps make it much easier to run a program
in a wide variety of environments. The reason is, once the run-time package exists for
a given system, any Java program can run on it.

Although Java was designed for interpretation, there is technically nothing about Java
that prevents on-the-fly compilation of byte code into native code. Sun has just
completed its Just In Time (JIT) compiler for byte code. When the JIT compiler is a
part of JVM, it compiles byte code into executable code in real time, on a piece-by-
piece, demand basis. It is not possible to compile an entire Java program into executable
code all at once, because Java performs various run-time checks that can be done only
at run time. The JIT compiles code, as it is needed, during execution.

Java Virtual Machine (JVM):

Beyond the language, there is the Java virtual machine. The Java virtual machine is an
important element of the Java technology. The virtual machine can be embedded within
a web browser or an operating system. Once a piece of Java code is loaded onto a
machine, it is verified. As part of the loading process, a class loader is invoked and performs byte code verification, which makes sure that the code generated by the compiler will not corrupt the machine that it is loaded on. Byte code verification takes place at the end of the compilation process to make sure that everything is accurate and correct. So byte code verification is integral to the compiling and executing of Java code.

Overall Description

[Figure: the development process of a Java program (.java source → javac compiler → .class byte code → JVM)]

The figure shows how Java produces byte code and executes it. The first box
indicates that the Java source code is located in a .java file that is processed with the
Java compiler called javac. The Java compiler produces a file called a .class file, which
contains the byte code. The .class file is then loaded across the network or loaded
locally on your machine into the execution environment, the Java Virtual Machine,
which interprets and executes the byte code.
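As a minimal sketch of this flow (the program below is illustrative and not part of the project code), a single class can be compiled with javac and its .class byte code run on any JVM:

// HelloMetadedup.java -- illustrative example only.
// Compile:  javac HelloMetadedup.java   (produces HelloMetadedup.class byte code)
// Run:      java HelloMetadedup         (the JVM interprets the byte code)
public class HelloMetadedup {
    public static void main(String[] args) {
        System.out.println("The same .class file runs on any platform with a JVM.");
    }
}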

Java Architecture:

Java architecture provides a portable, robust, high-performing environment for
development. Java provides portability by compiling the byte codes for the Java Virtual
Machine, which is then interpreted on each platform by the run-time environment. Java
is a dynamic system, able to load code when needed from a machine in the same room
or across the planet.

Compilation of code:

When you compile the code, the Java compiler creates machine code (called byte code)
for a hypothetical machine called the Java Virtual Machine (JVM). The JVM is meant
to execute the byte code and was created to overcome the issue of portability: the code
is written and compiled for one machine and interpreted on all machines by the JVM.

Compiling and interpreting Java Source Code:

[Figure: Java source code is compiled into platform-independent byte code, which is then
interpreted by a platform-specific Java interpreter (PC, Macintosh, SPARC)]

During run-time the Java interpreter tricks the byte code file into thinking that it is
running on a Java Virtual Machine. In reality this could be an Intel Pentium running
Windows 95, a Sun SPARCstation running Solaris, or an Apple Macintosh, and all of
them could receive code from any computer through the Internet and run the applets.

Simple:

Java was designed to be easy for the professional programmer to learn and to use
effectively. If you are an experienced C++ programmer, learning Java will be even
easier, because Java inherits the C/C++ syntax and many of the object-oriented features
of C++. Most of the confusing concepts from C++ are either left out of Java or
implemented in a cleaner, more approachable manner. In Java there are a small number
of clearly defined ways to accomplish a given task.

Object-Oriented:

Java was not designed to be source-code compatible with any other language. This
allowed the Java team the freedom to design with a blank slate. One outcome of this
was a clean, usable, pragmatic approach to objects. The object model in Java is simple
and easy to extend, while simple types, such as integers, are kept as high-performance
non-objects.

Robust:

The multi-platform environment of the Web places extraordinary demands on a
program, because the program must execute reliably in a variety of systems. The ability
to create robust programs was given a high priority in the design of Java. Java is a
strictly typed language; it checks your code at compile time and at run time.

Java virtually eliminates the problems of memory management and deallocation, which
are handled completely automatically. In a well-written Java program, all run-time
errors can, and should, be managed by your program.
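For illustration (this example is not taken from the project code), a run-time error such as an out-of-bounds array access surfaces as an exception that the program can catch and manage:

// IllustrativeRobustness.java -- a hypothetical example of managing a run-time error.
public class IllustrativeRobustness {
    public static void main(String[] args) {
        int[] chunkSizes = {4, 8, 16};
        try {
            // Deliberately access an index that does not exist.
            System.out.println(chunkSizes[3]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The run-time check raises an exception instead of corrupting memory.
            System.out.println("Handled run-time error: " + e.getMessage());
        }
    }
}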

JAVASCRIPT:

JavaScript is a script-based programming language that was developed by Netscape
Communications Corporation. JavaScript was originally called LiveScript and was
renamed JavaScript to indicate its relationship with Java. JavaScript supports the
development of both client and server components of Web-based applications. On the
client side, it can be used to write programs that are executed by a Web browser within
the context of a Web page. On the server side, it can be used to write Web server
programs that process information submitted by a Web browser and then update the
browser's display accordingly. Even though JavaScript supports both client and server
Web programming, we prefer JavaScript for client-side programming since most
browsers support it. JavaScript is almost as easy to learn as HTML, and JavaScript
statements can be included in HTML documents by enclosing the statements between
a pair of scripting tags:

<SCRIPT> ... </SCRIPT>

<SCRIPT LANGUAGE = “JavaScript”>

JavaScript statements

</SCRIPT>

Here are a few things we can do with JavaScript :

➢ Validate the contents of a form and make calculations.
➢ Add scrolling or changing messages to the Browser’s status line.
➢ Animate images or rotate images that change when we move the mouse over
them.
➢ Detect the browser in use and display different content for different browsers.
➢ Detect installed plug-ins and notify the user if a plug-in is required.

We can do much more with JavaScript, including creating entire applications.

Java Script Vs Java:

JavaScript and Java are entirely different languages. A few of the most glaring
differences are:

➢ Java applets are generally displayed in a box within the web document; JavaScript
can affect any part of the Web document itself.
➢ While JavaScript is best suited to simple applications and adding interactive
features to Web pages; Java can be used for incredibly complex applications.

There are many other differences but the important thing to remember is that
JavaScript and Java are separate languages. They are both useful for different things;
in fact they can be used together to combine their advantages.

ADVANTAGES:

➢ JavaScript can be used for server-side and client-side scripting.

➢ It is more flexible than VBScript.

➢ JavaScript is the default scripting language on the client side since all browsers
support it.

Hyper Text Markup Language:

Hypertext Markup Language (HTML), the language of the World Wide Web (WWW),
allows users to produce Web pages that include text, graphics and pointers to other Web
pages (hyperlinks).

HTML is not a programming language but an application of ISO Standard 8879,
SGML (Standard Generalized Markup Language), specialized to hypertext and
adapted to the Web. The idea behind hypertext is that instead of reading text in a rigid
linear structure, we can easily jump from one point to another point. We can navigate
through the information based on our interest and preference. A markup language is
simply a series of elements, each delimited with special characters that define how text
or other items enclosed within the elements should be displayed. Hyperlinks are
underlined or emphasized words that link to other documents or to portions of the
same document.

HTML can be used to display any type of document on the host computer, which can
be geographically at a different location. It is a versatile language and can be used on
any platform or desktop.

HTML provides tags (special codes) to make the document look attractive. HTML tags
are not case-sensitive. Using graphics, fonts, different sizes, color, etc., can enhance the
presentation of the document. Anything that is not a tag is part of the document itself.

ADVANTAGES:

➢ An HTML document is small and hence easy to send over the net. It is small because
it does not include embedded formatting information.
➢ HTML is platform independent.
➢ HTML tags are not case-sensitive.

Java Database Connectivity:

What Is JDBC?

JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a
trademarked name and is not an acronym; nevertheless, JDBC is often thought of as
standing for Java Database Connectivity.) It consists of a set of classes and interfaces
written in the Java programming language. JDBC provides a standard API for
tool/database developers and makes it possible to write database applications using a
pure Java API.

Using JDBC, it is easy to send SQL statements to virtually any relational database. One
can write a single program using the JDBC API, and the program will be able to send
SQL statements to the appropriate database. The combination of Java and JDBC lets
a programmer write a program once and run it anywhere.
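As a minimal sketch of this idea (the connection URL, credentials, and table and column names below are illustrative assumptions, not the project's actual configuration), a JDBC program sends a SQL statement as follows:

// JdbcSketch.java -- illustrative only; URL, credentials and table are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcSketch {
    public static void main(String[] args) throws Exception {
        // Obtain a connection from the driver manager (URL/credentials are assumptions).
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/clouddb", "user", "password");
        // Send a parameterized SQL statement to the database.
        PreparedStatement ps = con.prepareStatement(
                "SELECT filename FROM uploads WHERE owner = ?");
        ps.setString(1, "demoUser");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("filename"));
        }
        rs.close();
        ps.close();
        con.close();
    }
}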

4.3 CODE:

4.3.1 SAMPLE CODE

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>Enabling Secure and Space Efficient Metadata Management in Encrypted Deduplication</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<link href="css/style.css" rel="stylesheet" type="text/css" />

<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />

<script type="text/javascript" src="js/cufon-yui.js"></script>

<script type="text/javascript" src="js/droid_sans_400-droid_sans_700.font.js"></script>

<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>

<script type="text/javascript" src="js/script.js"></script>

<script type="text/javascript" src="js/coin-slider.min.js"></script>

<style type="text/css">
<!--
.style1 {
  font-size: 24px;
  color: #FF0000;
  font-weight: bold;
}
.style3 {
  color: #FF0000;
  font-weight: bold;
}
.style4 {color: #0000FF}
.style5 {
  color: #0000FF;
  font-weight: bold;
  font-style: italic;
}
-->
</style>

</head>

<body>

<div class="main">

<div class="header">

<div class="header_resize">

<div class="menu_nav">

<ul>

<li class="active"><a href="index.html"><span>Home Page</span></a></li>

<li><a href="DataUser.jsp"><span>Data User </span></a></li>

<li><a href="CloudServer.jsp"><span>Cloud Server </span></a></li>

<li><a href="EndUser.jsp"><span>End User </span></a></li>

</ul>

</div>

<div class="clr"></div>

<div class="logo">

<h1><span class="style1"><a href="index.html" class="style1"></a>Enabling Secure and Space Efficient Metadata Management in Encrypted Deduplication</span></h1>

</div>

<div class="clr"></div>

<div class="slider">

<div id="coin-slider">
  <a href="#"><img src="images/slide1.jpg" width="960" height="360" alt="" /></a>
  <a href="#"><img src="images/slide2.jpg" width="960" height="360" alt="" /></a>
  <a href="#"><img src="images/slide3.jpg" width="960" height="360" alt="" /></a>
</div>

<div class="clr"></div>

</div>

<div class="clr"></div>

</div>

</div>

<div class="content">

<div class="content_resize">

<div class="mainbar">

<div class="article">

<h2><span>WELCOME TO END USER LOGIN </span></h2>

<p class="infopost">&nbsp; <a href="#" class="com"></a></p>

<div class="clr"></div>

<div class="img"><img src="images/img1.jpg" width="620" height="154" alt="" class="fl" /></div>

<div class="post_content">

<form action="EAuthentication.jsp" method="post" id="leavereply">

<ol>

<li>

<label for="name"><strong>Name (required)</strong></label>

<input id="name" name="userid" class="text" />

</li>

<li> <strong>

<label for="email">Password (required)</label>

</strong>

<label for="email"></label>

<input type="password" id="pass" name="pass" class="text" />

</li><li></li>

<li></li>

<li><br />

<a href="ERegister.html">REGISTER</a>

<input type="submit" name="imageField" id="imageField" class="LOGIN"
/>

</li>

</ol>

</form>

<p class="spec"><a href="#" class="rm">..</a></p>

</div>

<div class="clr"></div>

</div>

</div>

<div class="sidebar">

<div class="searchform">

<form id="formsearch" name="formsearch" method="post" action="#">

<span>

<input name="editbox_search" class="editbox_search" id="editbox_search" maxlength="80" value="Search our site:" type="text" />

</span>

<input name="button_search" src="images/search.gif" class="button_search" type="image" />

</form>

</div>

<div class="clr"></div>

<div class="gadget">

<h2 class="star"><span>Home </span> Menu</h2>

<div class="clr"></div>

<ul class="sb_menu">

<li><a href="index.html">Home</a></li>

<li><a href="DataUser.jsp">Data User </a></li>

<li><a href="CloudServer.jsp">Cloud Server </a></li>

<li><a href="EndUser.jsp">End User </a></li>

</ul>

</div>

</div>

<div class="clr"></div>

</div>

</div>

<div class="fbg"></div>

<div class="footer"></div>

</div>

</body>

</html>

4.4 TESTING

The following are the Testing Methodologies:

• Unit Testing.
• Integration Testing.
• User Acceptance Testing.
• Output Testing.
• Validation Testing.

Unit Testing

Unit testing focuses verification effort on the smallest unit of Software design
that is the module. Unit testing exercises specific paths in a module’s control structure
to ensure complete coverage and maximum error detection. This test focuses on each
module individually, ensuring that it functions properly as a unit. Hence, the naming is
Unit Testing.

During this testing, each module is tested individually and the module interfaces
are verified for consistency with the design specification. All important processing paths
are tested for the expected results. All error-handling paths are also tested.
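As an illustrative sketch of a unit test (JUnit is an assumed test framework here, and the tested behavior is only an example, not a module from the project code), the smallest unit is exercised in isolation:

// FingerprintTest.java -- illustrative JUnit 4 unit test.
import static org.junit.Assert.assertArrayEquals;

import java.security.MessageDigest;
import org.junit.Test;

public class FingerprintTest {

    // The unit under test: identical chunk contents must hash to identical fingerprints.
    @Test
    public void identicalChunksProduceIdenticalFingerprints() throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] fp1 = md.digest("same chunk content".getBytes("UTF-8"));
        byte[] fp2 = md.digest("same chunk content".getBytes("UTF-8"));
        assertArrayEquals(fp1, fp2);
    }
}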

Integration Testing

Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated a set of
high order tests are conducted. The main objective in this testing process is to take unit
tested modules and build a program structure that has been dictated by the design.

The following are the types of Integration Testing:

1.Top Down Integration

This method is an incremental approach to the construction of program
structure. Modules are integrated by moving downward through the control hierarchy,
beginning with the main program module. The modules subordinate to the main
program module are incorporated into the structure in either a depth-first or breadth-first
manner.

In this method, the software is tested from the main module, and individual stubs
are replaced as the test proceeds downwards.

2. Bottom-up Integration

This method begins the construction and testing with the modules at the lowest
level in the program structure. Since the modules are integrated from the bottom up,
processing required for modules subordinate to a given level is always available and
the need for stubs is eliminated. The bottom-up integration strategy may be
implemented with the following steps:

• The low-level modules are combined into clusters that perform a specific software
sub-function.

• A driver (i.e., the control program for testing) is written to coordinate test case input
and output.

• The cluster is tested.

• Drivers are removed and clusters are combined moving upward in the program
structure

• The bottom-up approach tests each module individually; each module is then
integrated with a main module and tested for functionality, as sketched below.
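The following is a minimal, illustrative driver sketch (the module names and logic are assumptions, not the project's actual modules) that coordinates test-case input and output for a low-level cluster:

// ClusterDriver.java -- a hypothetical test driver for bottom-up integration.
import java.util.ArrayList;
import java.util.List;

public class ClusterDriver {

    // Two hypothetical low-level modules forming one cluster.
    static class Chunker {
        List<String> split(String data, int size) {
            List<String> chunks = new ArrayList<>();
            for (int i = 0; i < data.length(); i += size) {
                chunks.add(data.substring(i, Math.min(i + size, data.length())));
            }
            return chunks;
        }
    }

    static class Fingerprinter {
        int fingerprint(String chunk) {
            return chunk.hashCode(); // stand-in for a real cryptographic hash
        }
    }

    // The driver feeds fixed test-case input into the cluster and reports the output.
    public static void main(String[] args) {
        Chunker chunker = new Chunker();
        Fingerprinter fp = new Fingerprinter();
        for (String chunk : chunker.split("sample backup data", 4)) {
            System.out.println(chunk + " -> " + fp.fingerprint(chunk));
        }
    }
}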

User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system.
The system under consideration is tested for user acceptance by constantly keeping in
touch with the prospective system users at the time of developing and making changes
wherever required. The system developed provides a friendly user interface that can
easily be understood even by a person who is new to the system.

Output Testing

After performing the validation testing, the next step is output testing of the
proposed system, since no system can be useful if it does not produce the required
output in the specified format. The outputs generated or displayed by the system under
consideration are tested by asking the users about the format they require. Hence the
output format is considered in two ways: one on screen and the other in printed format.

Validation Checking

Validation checks are performed on the following fields.

Text Field:

A text field can contain only a number of characters less than or equal to
its size. The text fields are alphanumeric in some tables and alphabetic in other tables.
An incorrect entry always flashes an error message.

Numeric Field:

A numeric field can contain only numbers from 0 to 9. An entry of any other
character flashes an error message. The individual modules are checked for accuracy
and for what they have to perform. Each module is subjected to a test run along with
sample data. The individually tested modules are integrated into a single system.
Testing involves executing the program with real data; the existence of any program
defect is inferred from the output. The testing should be planned so that all the
requirements are individually tested.

A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.
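A minimal validation sketch (the field names and rules below are illustrative assumptions that mirror the checks described above):

// FieldValidator.java -- illustrative text and numeric field validation.
public class FieldValidator {

    // A text field may hold at most maxLength alphanumeric characters.
    public static boolean isValidTextField(String value, int maxLength) {
        return value != null
                && value.length() <= maxLength
                && value.matches("[A-Za-z0-9]+");
    }

    // A numeric field may contain only the digits 0 to 9.
    public static boolean isValidNumericField(String value) {
        return value != null && value.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidTextField("user01", 10));   // true
        System.out.println(isValidNumericField("98765"));     // true
        System.out.println(isValidNumericField("98a65"));     // false: flashes an error message
    }
}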

Preparation of Test Data

The above testing is carried out by taking various kinds of test data. Preparation of
test data plays a vital role in system testing. After preparing the test data, the system
under study is tested using that test data. While testing the system with test data,
errors are again uncovered and corrected using the above testing steps, and the
corrections are also noted for future use.

Using Live Test Data:

Live test data are those that are actually extracted from organization files. After
a system is partially constructed, programmers or analysts often ask users to key in a
set of data from their normal activities. Then, the systems person uses this data as a way

to partially test the system. In other instances, programmers or analysts extract a set of
live data from the files and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive
testing. And, although it is realistic data that will show how the system will perform for
the typical processing requirement, assuming that the live data entered are in fact
typical, such data generally will not test all combinations or formats that can enter the
system. This bias toward typical values then does not provide a true systems test and in
fact ignores the cases most likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be
generated to test all combinations of formats and values. In other words, the artificial
data, which can quickly be prepared by a data generating utility program in the
information systems department, make possible the testing of all logic and control paths
through the program.

The most effective test programs use artificial test data generated by persons
other than those who wrote the programs. Often, an independent team of testers
formulates a testing plan, using the systems specifications.

The developed package has satisfied all the requirements specified in the software
requirement specification and was accepted.

USER TRAINING

Whenever a new system is developed, user training is required to educate the users
about the working of the system so that it can be put to efficient use by those for whom
the system has been primarily designed. For this purpose the normal working of the
project was demonstrated to the prospective users. Its working is easily understandable
and since the expected users are people who have good knowledge of computers, the
use of this system is very easy.

4.5 TEST CASES
Table 4.5.1 Test Cases

S.No | Test Case Name                   | Test Case Description                         | Action Performed                                      | Expected Result                                        | Result
1    | User registration (successful)   | Provide username, email id, gender, phone no  | Enter details without leaving any field blank         | Successfully registered                                | Success
2    | User registration (unsuccessful) | Report error message                          | Enter invalid details / leave any field blank         | Display error message / alert box regarding the error | Success
3    | User login (successful)          | Provide username, user id and type            | Enter details without leaving any field blank         | Successfully logged in                                 | Success
4    | User login (unsuccessful)        | Report error message                          | Enter invalid login details / leave any field blank   | Display error message / alert box regarding the error | Success
5    | Data owner login (successful)    | Provide username, user id and type            | Enter details without leaving any field blank         | Successfully logged in                                 | Success
6    | Data owner login (unsuccessful)  | Report error message                          | Enter invalid login details / leave any field blank   | Display error message / alert box regarding the error | Success

CHAPTER 5
RESULT ANALYSIS

5.1 OUTPUT-SCREENSHOTS

Figure 5.1: Home Page

Figure 5.2: Here the data user need to login by entering name and password

Figure 5.3: Here the menu is displayed for uploading data, checking data integrity,
deduplication, etc.

Figure 5.4: This is the cloud server page; all the data uploaded by the owner is stored
in the cloud server.

Figure 5.5: This is the end user page, where the end user can request a file and view
the file responses.

Figure 5.6: In this page, the file is divided into blocks in encrypted form.

Figure 5.7: This page shows the confirmation that the data was uploaded successfully.

Figure 5.8: This page shows that encrypted deduplication is detected when the same
file is uploaded.

Figure 5.9: Here in this page the end user can download the file by entering the file
name and secret key.

Figure 5.10: In this page the user can decrypt and download the file requested by the
end user.

Figure 5.11: Here the file contents to be downloaded are displayed; clicking the
download button downloads the file.

CHAPTER 6
CONCLUSION

6.1 CONCLUSION

We present Metadedup, which exploits the power of indirection to apply deduplication
to metadata. It significantly mitigates the metadata storage overhead in encrypted
deduplication, while preserving confidentiality guarantees for both data and metadata.
We extend Metadedup for secure and fault-tolerant storage in a multi-server
architecture; in particular, we propose a distributed key management approach, which
preserves security guarantees even when a number of servers are compromised. We
extensively evaluate Metadedup in terms of performance and storage efficiency, and
we show that Metadedup significantly reduces the storage space of metadata while
incurring limited performance overhead compared to the network speed.

6.2 FUTURE ENHANCEMENT

The main concern with the cloud is security. This security issue has been addressed to
some extent in this project; as an enhancement, the integration of various cryptographic
algorithms would provide a better way to encrypt our data. The main issue is that as
complexity increases, performance degrades. The various enciphering techniques and
their algorithms can also be updated and transformed for use on decentralized platforms
such as blockchain, in addition to centralized platforms such as the cloud. In our project,
the main drawback is that the end user can attack the files. This may be changed in the
future by using different techniques.

REFERENCES
[1] Boost c++ libraries. https://www.boost.org/.
[2] Leveldb. https://github.com/google/leveldb.
[3] Openssl. https://www.openssl.org.
[4] FSL traces and snapshots public archive. http://tracer.filesystems.org/, 2014.
[5] F. Armknecht, J.-M. Bohli, G. O. Karame, and F. Youssef. Transparent data
deduplication in the cloud. In Proc. of ACM CCS, 2015.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D.
Song. Provable data possession at untrusted stores. In Proc. of ACM CCS, 2007.
[7] M. Bellare, S. Keelveedhi, and T. Ristenpart. DupLESS: Server-aided encryption
for deduplicated storage. In Proc. of USENIX Security, 2013.
[8] M. Bellare, S. Keelveedhi, and T. Ristenpart. Message-locked encryption and secure
deduplication. In Proc. of EUROCRYPT, 2013.
[9] D. Bhagwat, K. Eshghi, D. D. E. Long, and M. Lillibridge. Extreme binning:
Scalable, parallel deduplication for chunk-based file backup. In Proc. of IEEE
MASCOTS, 2009.
[10] J. Black. Compare-by-hash: A reasoned analysis. In Proc. of USENIX ATC, 2006.
[11] D. R. Bobbarjung, S. Jagannathan, and C. Dubnicki. Improving duplicate
elimination in storage systems. ACM Transactions on Storage, 2(4):424–448, 2006.
[12] A. Z. Broder. On the resemblance and containment of documents. In Proc. of
SEQUENCES, 1997.
[13] D. Dobre, P. Viotti, and M. Vukolić. Hybris: Robust hybrid cloud storage. In Proc.
of ACM SoCC, 2014.
[14] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming
space from duplicate files in a serverless distributed file system. In Proc. of IEEE
ICDCS, 2002.
