
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Chunk-based deduplication is widely used in modern primary and backup storage systems to achieve high storage savings. It stores only a single physical copy of duplicate chunks, while referencing all duplicate chunks to the physical copy by small-size references. Prior studies show that deduplication can effectively reduce the storage space of primary storage by 50% and that of backup storage by up to 98%. This motivates the wide deployment of deduplication in various commercial cloud storage services (e.g., Dropbox, Google Drive, Bitcasa, Mozy, and Memopal) to reduce substantial storage costs.

To provide confidentiality guarantees, encrypted deduplication adds an encryption layer to deduplication, such that each chunk, before being written to deduplicated storage, is deterministically encrypted via symmetric-key encryption with a key derived from the chunk content (e.g., the key is set to be the cryptographic hash of the chunk content). This ensures that duplicate chunks have identical content even after encryption, and hence we can still apply deduplication to the encrypted chunks for storage savings. Many studies have designed various encrypted deduplication schemes to efficiently manage outsourced data in cloud storage.
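As a concrete illustration, the following minimal Java sketch derives the key as the SHA-256 hash of the chunk and encrypts deterministically with AES-CTR under a fixed IV; the cipher mode and key-derivation details are illustrative assumptions rather than the exact construction of any particular scheme.

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustrative convergent encryption: the key is the SHA-256 hash of the
// chunk, and AES-CTR is run with a fixed all-zero IV so that identical
// chunks always produce identical ciphertexts (and thus stay deduplicable).
public class ConvergentEncryption {

    public static byte[] deriveKey(byte[] chunk) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(chunk);
    }

    public static byte[] encrypt(byte[] chunk, byte[] key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(key, "AES"),
                new IvParameterSpec(new byte[16])); // deterministic: fixed IV
        return cipher.doFinal(chunk);
    }

    public static void main(String[] args) throws Exception {
        byte[] chunk = "example chunk content".getBytes(StandardCharsets.UTF_8);
        byte[] key = deriveKey(chunk);
        byte[] ct1 = encrypt(chunk, key);
        byte[] ct2 = encrypt(chunk, key);
        // Duplicate chunks yield identical ciphertexts, so deduplication still applies.
        System.out.println(java.util.Arrays.equals(ct1, ct2)); // true
    }
}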

In addition to storing non-duplicate data, a deduplicated storage system needs to keep deduplication metadata. There are two types of deduplication metadata. To check if identical chunks exist, the system maintains a fingerprint index that tracks the fingerprints of all chunks that have already been stored. Also, to allow a file to be reconstructed, the system maintains a file recipe that holds the mappings from the chunks in the file to the references of the corresponding physical copies.
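To make the two metadata structures concrete, the sketch below (class and field names are hypothetical) keeps the fingerprint index as a map from chunk fingerprints to the addresses of stored physical copies, and builds a file recipe as the ordered list of those addresses.

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory deduplication metadata (names are hypothetical).
public class DedupMetadata {
    // Fingerprint index: chunk fingerprint (hex) -> address of the stored physical copy.
    private final Map<String, Long> fingerprintIndex = new HashMap<>();
    private long nextAddress = 0;

    // File recipe: ordered references (addresses) to the physical copies of a file's chunks.
    public List<Long> store(List<byte[]> chunks) throws Exception {
        List<Long> fileRecipe = new ArrayList<>();
        for (byte[] chunk : chunks) {
            String fp = fingerprint(chunk);
            Long addr = fingerprintIndex.get(fp);
            if (addr == null) {              // new chunk: write one physical copy
                addr = nextAddress++;        // (actual chunk storage omitted)
                fingerprintIndex.put(fp, addr);
            }
            fileRecipe.add(addr);            // duplicate chunk: only a reference is kept
        }
        return fileRecipe;
    }

    private static String fingerprint(byte[] chunk) throws Exception {
        byte[] h = MessageDigest.getInstance("SHA-256").digest(chunk);
        StringBuilder sb = new StringBuilder();
        for (byte b : h) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}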

Deduplication metadata is notorious for incurring high storage overhead, especially for highly redundant workloads (e.g., backups), in which the metadata storage overhead becomes more dominant. In this work, we argue that encrypted deduplication incurs even higher metadata storage overhead, as it additionally keeps key metadata, such as the key recipes that track the chunk-to-key mappings to allow the decryption of individual files. Since the key recipes contain sensitive key information, they need to be managed separately from file recipes, encrypted by the master keys of file owners, and individually stored for different file owners. Such high metadata storage overhead can negate the storage effectiveness of encrypted deduplication in real deployment.

Contributions. To address the storage overhead of both deduplication metadata and key metadata, we design and implement Metadedup, a new encrypted deduplication system that effectively suppresses metadata storage. Our contributions are summarized as follows.

Metadedup builds on the idea of indirection. Instead of directly storing all deduplication and key metadata in both file and key recipes (both of which dominate the metadata storage overhead), we group the metadata in the form of metadata chunks that are stored in encrypted deduplication storage. Thus, both file and key recipes now store references to metadata chunks, which in turn contain references to data chunks (i.e., the chunks of file data). If Metadedup stores nearly identical files regularly (e.g., periodic backups), the corresponding file and key metadata are expected to have long sequences of references that appear in the same order. This implies that the metadata chunks are highly redundant and hence can be effectively deduplicated.
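A rough sketch of this indirection is shown below; the entry layout and the number of entries packed per metadata chunk are illustrative assumptions. Consecutive per-chunk metadata entries are packed into metadata chunks, which can then be fingerprinted and deduplicated just like data chunks, so each recipe keeps only one reference per metadata chunk.

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Illustrative indirection: pack per-chunk metadata entries into metadata
// chunks and deduplicate the metadata chunks by fingerprint. The entry
// layout and the packing factor are hypothetical.
public class MetadataChunking {
    static final int ENTRIES_PER_METADATA_CHUNK = 256;

    // Each entry references a data chunk (e.g., its address plus key material).
    public static List<byte[]> buildMetadataChunks(List<byte[]> entries) {
        List<byte[]> metadataChunks = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += ENTRIES_PER_METADATA_CHUNK) {
            int end = Math.min(i + ENTRIES_PER_METADATA_CHUNK, entries.size());
            int size = 0;
            for (int j = i; j < end; j++) size += entries.get(j).length;
            ByteBuffer buf = ByteBuffer.allocate(size);
            for (int j = i; j < end; j++) buf.put(entries.get(j));
            metadataChunks.add(buf.array());
        }
        return metadataChunks;   // each metadata chunk is then fingerprinted,
    }                            // deduplicated, and referenced from the file recipe

    public static byte[] fingerprint(byte[] metadataChunk) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(metadataChunk);
    }
}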

We propose a distributed key management approach that adapts Metadedup into a multi-server architecture for fault-tolerant data storage. We generate the key of each data chunk from one of the servers, encode it via secret sharing, and distribute the resulting shares to the remaining servers. This ensures fault-tolerant storage of data chunks, while remaining robust against adversarial compromise of a number of servers through our decoupled management of keys and shares.
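For illustration only, the sketch below splits a key into n additive (XOR) shares; such an n-of-n split by itself offers no fault tolerance, so the actual design would rely on a threshold secret-sharing scheme (e.g., Shamir's) so that the key remains recoverable when some servers fail.

import java.security.SecureRandom;

// Illustrative additive (XOR) splitting of a chunk key into n shares.
// This is an n-of-n split shown only for brevity; a threshold (k-of-n)
// scheme would be needed for actual fault tolerance across servers.
public class KeySplitting {
    public static byte[][] split(byte[] key, int n) {
        SecureRandom rng = new SecureRandom();
        byte[][] shares = new byte[n][key.length];
        byte[] last = key.clone();
        for (int i = 0; i < n - 1; i++) {
            rng.nextBytes(shares[i]);                  // random share
            for (int j = 0; j < key.length; j++)
                last[j] ^= shares[i][j];               // fold into the final share
        }
        shares[n - 1] = last;
        return shares;
    }

    public static byte[] combine(byte[][] shares) {
        byte[] key = new byte[shares[0].length];
        for (byte[] share : shares)
            for (int j = 0; j < key.length; j++)
                key[j] ^= share[j];                    // XOR of all shares recovers the key
        return key;
    }
}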

We implement a Metadedup prototype and evaluate its performance in a networked setup. Compared to the network speed of our Gigabit LAN testbed, we show that Metadedup incurs only 13.09% and 3.06% of throughput loss in writing and restoring files, respectively.

Finally, we conduct trace-driven simulation on two real-world datasets. We show that Metadedup achieves up to 93.94% of metadata storage savings in encrypted deduplication. We also show that Metadedup maintains storage load balance among all servers.

1.2 MOTIVATION

Encrypted Deduplication Storage. Deduplication is a technique for space-efficient data storage (see [40] for a complete survey of deduplication). It partitions file data into either fixed-size or variable-size chunks, and identifies each chunk by the cryptographic hash, called a fingerprint, of the corresponding content. The probability of fingerprint collision between different chunks is assumed to be practically negligible [10]. Deduplication stores only one physical copy of duplicate chunks, and refers the duplicate chunks that share the same fingerprint to the physical copy by small references.
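A minimal sketch of chunking and fingerprinting is shown below; it uses fixed-size chunks and SHA-256 fingerprints purely for illustration, whereas practical systems often prefer content-defined (variable-size) chunking.

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative fixed-size chunking with SHA-256 fingerprints.
public class Chunking {
    static final int CHUNK_SIZE = 8 * 1024;   // a common chunk size (e.g., 8 KB)

    public static List<byte[]> chunk(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += CHUNK_SIZE)
            chunks.add(Arrays.copyOfRange(data, off,
                    Math.min(off + CHUNK_SIZE, data.length)));
        return chunks;
    }

    // The fingerprint identifies a chunk; chunks with the same fingerprint are deduplicated.
    public static byte[] fingerprint(byte[] chunk) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(chunk);
    }
}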

Encrypted deduplication augments plain deduplication (i.e., deduplication without encryption) with an encryption layer that operates on the chunks before deduplication, and provides data confidentiality guarantees in deduplicated storage. It implements the encryption layer based on message-locked encryption (MLE), which encrypts each chunk with a symmetric key (called the MLE key) derived from the chunk content; for example, the MLE key can be computed as the cryptographic hash of the chunk content in convergent encryption. This ensures that the encrypted chunks derived from duplicate chunks still have identical content, thereby being compatible with deduplication.

Historical MLE builds on some publicly available function (e.g., cryptographic hash
function) to generate MLE keys. It provides security protection for unpredictable chunks,
meaning that the chunks are drawn from a sufficiently large message set, such that the
content of a chunk cannot be easily predicted; otherwise, if a chunk is predictable and known
to be drawn from a finite set, historical MLE is vulnerable to the offline brute-force attack.
Specifically, given a target encrypted chunk, an adversary samples each possible chunk from
the finite message set, derives the corresponding MLE key (e.g., by applying the
cryptographic hash function to each sampled chunk), and encrypts each sampled chunk with
such a key. If the encryption result is equal to the target encrypted chunk, the adversary can
infer that the sampled chunk is the original input of the target encrypted chunk.
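The attack can be sketched as follows, reusing the convergent-encryption routine from the earlier sketch; the candidate set is whatever finite message set the adversary believes the chunk was drawn from.

import java.util.List;

// Illustrative offline brute-force attack against convergent encryption of a
// predictable chunk: the adversary re-encrypts every candidate and compares
// with the target ciphertext. Uses the ConvergentEncryption sketch shown earlier.
public class BruteForceAttack {
    public static byte[] recover(byte[] targetCiphertext, List<byte[]> candidateChunks)
            throws Exception {
        for (byte[] candidate : candidateChunks) {
            byte[] key = ConvergentEncryption.deriveKey(candidate);
            byte[] ct = ConvergentEncryption.encrypt(candidate, key);
            if (java.util.Arrays.equals(ct, targetCiphertext))
                return candidate;            // original plaintext chunk identified
        }
        return null;                         // chunk is not in the (finite) candidate set
    }
}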

To defend against the offline brute-force attack, DupLESS implements server-aided MLE, which introduces a global secret and protects the key generation process against public access. Server-aided MLE leverages a dedicated key manager to maintain a global secret that is protected from an adversary. It derives the MLE key of each chunk based on both the global secret and the cryptographic hash (a.k.a. fingerprint) of the chunk, such that an adversary cannot feasibly derive the MLE keys of any sampled chunks without knowing the global secret. Thus, if the global secret is secure, server-aided MLE is robust against the offline brute-force attack, and achieves security for both predictable and unpredictable chunks; otherwise, if the global secret is compromised, it preserves the same security guarantees for unpredictable chunks as in historical MLE. DupLESS further implements two mechanisms to strengthen security: (i) the oblivious pseudorandom function (OPRF) protocol, which allows the client to submit the fingerprint of each chunk (to the key manager) in a blinded manner, such that the key manager can generate MLE keys without learning the actual fingerprints; and (ii) the rate-limiting mechanism, which proactively controls the key generation rate of the key manager, so as to defend against the online brute-force attack from compromised clients that try to issue too many key generation requests.
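The core of the server-aided derivation can be sketched as an HMAC of the chunk fingerprint under the global secret, as below; the OPRF blinding and rate limiting that DupLESS adds on top are omitted here, so this is only an illustration of the key manager's role.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative server-aided MLE key generation: the key manager derives the
// MLE key from its global secret and the chunk fingerprint. The OPRF-based
// blinding and the rate limiting used by DupLESS are omitted for brevity.
public class KeyManager {
    private final byte[] globalSecret;     // known only to the key manager

    public KeyManager(byte[] globalSecret) {
        this.globalSecret = globalSecret;
    }

    public byte[] generateMleKey(byte[] chunkFingerprint) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(globalSecret, "HmacSHA256"));
        return hmac.doFinal(chunkFingerprint);   // 32-byte MLE key
    }
}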

In this paper, we focus on mitigating the metadata storage overhead in MLE-based encrypted deduplication storage systems, covering both historical MLE and server-aided MLE.

Mathematical analysis. We first model the metadata storage overhead in encrypted deduplication. We refer to the data before and after deduplication as logical data and physical data, respectively. Suppose that L is the size of the logical data, P is the size of the physical data, and f is the ratio of the deduplication metadata size to the chunk size. Plain deduplication incurs f × (L + P) of metadata storage [38], [39], where f × L is the size of the file recipes and f × P is the size of the fingerprint index. Encrypted deduplication has additional metadata storage for keys. It incurs a total of f × (L + P) + k × L of metadata storage, where k is the ratio of the key metadata size to the chunk size, and k × L is the size of the key recipes.

Based on the above analysis, we show via an example how the metadata storage overhead becomes problematic in encrypted deduplication. Suppose that the size of the deduplication metadata per chunk is 30 bytes [39], the size of the key metadata per chunk is 32 bytes (e.g., for AES-256 encryption keys), and the chunk size is 8 KB [39]. Then f = 30 bytes / 8 KB ≈ 0.0037 and k = 32 bytes / 8 KB ≈ 0.0039. If the deduplication factor (i.e., L/P) is 50× [39] and L = 50 TB, then plain deduplication and encrypted deduplication incur 191.25 GB and 391.25 GB of metadata, or equivalently 18.67% and 38.21% additional storage for 1 TB of physical data, respectively.
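The figures above follow directly from the formulas, as the short sketch below reproduces.

// Reproduces the metadata-overhead example: f = 30 B / 8 KB, k = 32 B / 8 KB,
// L = 50 TB of logical data, deduplication factor L/P = 50 (so P = 1 TB).
public class MetadataOverhead {
    public static void main(String[] args) {
        double f = 30.0 / (8 * 1024);           // dedup metadata per chunk byte
        double k = 32.0 / (8 * 1024);           // key metadata per chunk byte
        double L = 50.0 * 1024;                 // logical data in GB (50 TB)
        double P = L / 50;                      // physical data in GB (1 TB)

        double plain = f * (L + P);             // plain deduplication
        double encrypted = f * (L + P) + k * L; // encrypted deduplication
        System.out.printf("plain:     %.2f GB (%.2f%% of physical data)%n",
                plain, 100 * plain / P);
        System.out.printf("encrypted: %.2f GB (%.2f%% of physical data)%n",
                encrypted, 100 * encrypted / P);
        // Prints roughly 191.25 GB (~18.7%) and 391.25 GB (~38.2%).
    }
}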

1.3 PROBLEM STATEMENT

In the existing system, only a single copy of duplicate chunks is stored, while all duplicate chunks reference that physical copy; this reduces primary storage by 50% and backup storage by up to 98%. We therefore propose an encrypted deduplication scheme that adds an encryption layer to deduplication, such that each chunk, before being written to deduplicated storage, is encrypted via symmetric-key encryption with a key derived from the chunk content.
1.4 EXISTING SYSTEM

Chunk-based deduplication is widely used in modern primary and backup storage systems to achieve high storage savings. It stores only a single physical copy of duplicate chunks, while referencing all duplicate chunks to the physical copy by small-size references. Prior studies show that deduplication can effectively reduce the storage space of primary storage by 50% and that of backup storage by up to 98%. To provide confidentiality guarantees, encrypted deduplication adds an encryption layer to deduplication, such that each chunk, before being written to deduplicated storage, is deterministically encrypted via symmetric-key encryption with a key derived from the chunk content (e.g., the key is set to be the cryptographic hash of the chunk content). This ensures that duplicate chunks have identical content even after encryption, and hence we can still apply deduplication to the encrypted chunks for storage savings. Many studies have designed various encrypted deduplication schemes to efficiently manage outsourced data in cloud storage. In addition to storing non-duplicate data, a deduplicated storage system needs to keep deduplication metadata.

DISADVANTAGES:

 Metadedup is not designed for an organization that outsources the storage of users' data to a remote shared storage system.
 Cloud data is handled less efficiently due to the high storage overhead of metadata.
1.5 PROPOSED SYSTEM

Metadedup builds on the idea of indirection. Instead of directly storing all deduplication and key metadata in both file and key recipes (both of which dominate the metadata storage overhead), we group the metadata in the form of metadata chunks that are stored in encrypted deduplication storage. Thus, both file and key recipes now store references to metadata chunks, which in turn contain references to data chunks (i.e., the chunks of file data). If Metadedup stores nearly identical files regularly (e.g., periodic backups), the corresponding file and key metadata are expected to have long sequences of references that appear in the same order. This implies that the metadata chunks are highly redundant and hence can be effectively deduplicated.
The system proposes a distributed key management approach that adapts Metadedup into a multi-server architecture for fault-tolerant data storage. We generate the key of each data chunk from one of the servers, encode it via secret sharing, and distribute the resulting shares to the remaining servers. This ensures fault-tolerant storage of data chunks, while remaining robust against adversarial compromise of a number of servers through our decoupled management of keys and shares.

The system implements a Metadedup prototype and evaluates its performance in a networked setup. Compared to the network speed of our Gigabit LAN testbed, we show that Metadedup incurs only 13.09% and 3.06% of throughput loss in writing and restoring files, respectively.

Finally, the system conducts trace-driven simulation on two real-world datasets. We show that Metadedup achieves up to 93.94% of metadata storage savings in encrypted deduplication. We also show that Metadedup maintains storage load balance among all servers.

ADVANTAGES:

 The system ensures that the Metadedup design is compatible with existing countermeasures that address different threats against encrypted deduplication systems.
 A malicious client may abuse client-side deduplication to learn whether other users have already stored certain files; Metadedup can defeat this side-channel leakage by adopting server-side deduplication on cross-user data. A malicious server may modify or even delete stored files to destroy the availability of outsourced files; Metadedup is compatible with the availability countermeasure that disperses data across servers via deduplication-aware secret sharing.
CHAPTER 2

LITERATURE SURVEY AND PROBLEM IDENTIFICATION

2.1 LITERATURE SURVEY

1.Message-Locked Encryption and Secure Deduplication

AUTHOR: M.Bellare

They formalize a new cryptographic primitive, Message-Locked Encryption (MLE), where the key under which encryption and decryption are performed is itself derived from the message. MLE provides a way to achieve secure deduplication (space-efficient secure outsourced storage), a goal currently targeted by numerous cloud-storage providers. They provide definitions both for privacy and for a form of integrity that they call tag consistency. Based on this foundation, they make both practical and theoretical contributions. On the practical side, they provide ROM security analyses of a natural family of MLE schemes that includes deployed schemes. On the theoretical side, the challenge is standard-model solutions, and they make connections with deterministic encryption, hash functions secure on correlated inputs, and the sample-then-extract paradigm to deliver schemes under different assumptions and for different classes of message sources. Their work shows that MLE is a primitive of both practical and theoretical interest.

2.Transparent data deduplication in the cloud

AUTHOR: F.Armknecht

Cloud storage providers such as Dropbox and Google Drive heavily rely on data deduplication to save
storage costs by only storing one copy of each uploaded file. Although recent studies report that
whole file deduplication can achieve up to 50% storage reduction, users do not directly benefit from
these savings—as there is no transparent relation between effective storage costs and the prices
offered to the users. In this paper, we propose a novel storage solution, ClearBox, which allows a
storage service provider to transparently attest to its customers the deduplication patterns of the
(encrypted) data that it is storing. By doing so, ClearBox enables cloud users to verify the effective
storage space that their data is occupying in the cloud, and consequently to check whether they qualify
for benefits such as price reductions, etc. ClearBox is secure against malicious users and a rational
storage provider, and ensures that files can only be accessed by their legitimate owners. We evaluate a
prototype implementation of ClearBox using both Amazon S3 and Dropbox as back-end cloud
storage. Our findings show that our solution works with the APIs provided by existing service
providers without any modifications and achieves comparable performance to existing solutions.

3.Server-aided encryption for deduplicated storage

AUTHOR: S. Keelveedhi

Cloud storage service providers such as Dropbox, Mozy, and others perform deduplication to
save space by only storing one copy of each file uploaded. Should clients conventionally
encrypt their files, however, savings are lost. Message-locked encryption (the most prominent
manifestation of which is convergent encryption) resolves this tension. However it is
inherently subject to brute-force attacks that can recover files falling into a known set. They propose an architecture that provides secure deduplicated storage resisting brute-force attacks, and realize it in a system called DupLESS. In DupLESS, clients encrypt under message-based keys obtained from a key server via an oblivious PRF protocol. It enables clients to store encrypted data with an existing service, have the service perform deduplication on their behalf, and yet achieve strong confidentiality guarantees. They show that encryption for deduplicated storage can achieve performance and space savings close to those of using the storage service with plaintext data.

4. Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

AUTHOR: D.Bhagwat

Data deduplication is an essential and critical component of backup systems. Essential,


because it reduces storage space requirements, and critical, because the performance of the
entire backup operation depends on its throughput. Traditional backup workloads consist of
large data streams with high locality, which existing deduplication techniques require to
provide reasonable throughput. We present Extreme Binning, a scalable deduplication
technique for non-traditional backup workloads that are made up of individual files with no
locality among consecutive files in a given window of time. Due to lack of locality, existing
techniques perform poorly on these workloads. Extreme Binning exploits file similarity
instead of locality, and makes only one disk access for chunk lookup per file, which gives
reasonable throughput. Multi-node backup systems built with Extreme Binning scale
gracefully with the amount of input data; more backup nodes can be added to boost
throughput. Each file is allocated using a stateless routing algorithm to only one node,
allowing for maximum parallelization, and each backup node is autonomous with no
dependency across nodes, making data management tasks robust with low overhead.

5. Distributed key generation for encrypted deduplication

AUTHOR: Y.Duan

Large-scale storage systems often attempt to achieve two seemingly conflicting goals: the
systems need to reduce the copies of redundant data to save space, a process called
deduplication; and users demand encryption of their data to ensure privacy. Conventional
encryption makes deduplication on ciphertexts ineffective, as it destroys data redundancy. A line of work, originating from convergent encryption and evolving into message-locked encryption, strives to solve this problem. The latest work, DupLESS, proposes a server-aided
architecture that provides the strongest privacy. The DupLESS architecture relies on a key
server to help the clients generate encryption keys that result in convergent ciphertexts. In
this paper, we first provide a rigorous proof of security, in the random oracle model, for the
DupLESS architecture which is lacking in the original paper. Our proof shows that using
an additional secret, other than the data itself, for generating encryption keys achieves the best possible security under the current deduplication paradigm. We then introduce a distributed
protocol that eliminates the need for a key server and allows less managed systems such as
P2P systems to enjoy the high security level. Implementation and evaluation show that the
scheme is both robust and practical.

2.2 SOFTWARE AND HARDWARE REQUIREMENTS

2.2.1 SOFTWARE REQUIREMENTS

 Operating System - Windows XP
 Coding Language - Java/J2EE (JSP, Servlet)
 Front End - J2EE
 Back End - MySQL

2.2.2 HARDWARE REQUIREMENTS

 Processor - Pentium IV
 RAM - 4 GB (min)
 Hard Disk - 20 GB
 Key Board - Standard Windows Keyboard
 Mouse - Two or Three Button Mouse
 Monitor - SVGA
CHAPTER 3

METHODOLOGY AND DESIGN ANALYSIS

3.1 SYSTEM ARCHITECTURE

Fig: Metadedup architecture

Here, the data is divided into chunks; since chunks are highly redundant in real-world workloads, they need to be deduplicated. A secret key is assigned to every chunk, and the deduplication algorithm removes duplicate chunks. The file recipes and key recipes are used to reconstruct the backed-up files.

MODULES:

There are five modules in our project:

1. Data Owner
2. Cloud Server
3. End User
4. Data Encryption and Decryption
5. Attacker

1.Data Owner Module

In this module, the data owner uploads data to the cloud server. For security, the data owner encrypts the data file's blocks and then stores them in the cloud. The data owner can check for duplication of the file's blocks on the corresponding cloud server, manipulate the encrypted file blocks, and create remote users with respect to the registered cloud servers. The data owner also checks the data integrity proof, which reveals whether a block has been modified by an attacker.

2. Cloud Server Module

The cloud service provider manages a cloud to provide data storage services. Data owners encrypt their data file's blocks and store them in the cloud for sharing with remote users. To access the shared blocks, data consumers download the encrypted blocks of interest from the cloud and then decrypt them.

3. End User

In this module, the remote user logs in with a username and password. The user then requests the secret keys of the required file blocks from the cloud servers and receives them. After obtaining the secret keys, the user downloads the file blocks by entering the block names and secret keys at the cloud server.

4. Data Encryption and Decryption

All legitimate users in the system can freely query any encrypted data of interest. Upon receiving the data from the server, the user runs the decryption algorithm Decrypt to decrypt the ciphertext using the corresponding secret keys. Only if the attributes the user possesses satisfy the access structure defined in the ciphertext CT can the user obtain the content key.
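As an illustration of the decryption step only (class and parameter names are hypothetical and mirror the deterministic AES-CTR sketch in Chapter 1), a downloaded block can be recovered as follows.

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative block decryption: the end user applies the secret key obtained
// from the cloud server to recover a downloaded block. Parameters mirror the
// earlier deterministic AES-CTR sketch and are assumptions, not the exact scheme.
public class BlockDecryptor {
    public static byte[] decrypt(byte[] cipherText, byte[] secretKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(secretKey, "AES"),
                new IvParameterSpec(new byte[16]));
        return cipher.doFinal(cipherText);
    }
}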

5.Attacker Module

A user who attacks or modifies block content is called an attacker. An attacker may also be a user who tries to access file contents from the cloud server with a wrong secret key.

3.2 SYSTEM DESIGN

UML DIAGRAMS:

UML stands for Unified Modelling Language. UML is a standardized, general-purpose modelling language in the field of object-oriented software engineering. The standard is managed, and was created, by the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modelling Language is a standard language for specifying, visualizing, constructing, and documenting the artifacts of a software system, as well as for business modelling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in
the modelling of large and complex systems.

The UML is a very important part of developing object-oriented software and the software
development process. The UML uses mostly graphical notations to express the design of
software projects.

GOALS: The Primary goals in the design of the UML are as follows:

1. Provide users a ready-to-use, expressive visual modelling Language so that they can
develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.

3.2.1 Class Diagram

The UML Class diagram is a graphical notation used to construct and visualize object-oriented


systems. A class diagram in the Unified Modelling Language (UML) is a type of static structure
diagram that describes the structure of a system by showing the system's:

 classes,

 their attributes,

 operations (or methods),


 the relationships among objects.

The figure represents the class diagram of the project, in which we can see the cloud server, data owner, end user, and attacker.

The data owner initially registers an account by filling in the required details; after registration, the owner can log in to the account and then upload data to the cloud server for security. The data owner may upload the data in the form of blocks so that it can be easily encrypted.

The cloud server also registers and logs in to the account by filling in the details.

3.2.2 Use Case Diagram


A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.

Fig: Use Case diagram for Data Owner and Cloud Server

The use case diagram shows that the data owner must register and log in, browse files, split them into blocks, and then encrypt and upload them; the owner may also check for data deduplication and data integrity. If the data is duplicated, the system reports that a duplicate was found.
Fig: Use case diagram for End user and Cloud Server

This is the use case diagram for the end user and cloud server. The end user and the cloud server both have to log in. The end user then requests the desired file. The cloud server views the file requests and responds to them, that is, it grants permission to access the files. The end user can then download the files.
3.2.3 Sequence Diagram

A sequence diagram in Unified Modelling Language (UML) is a kind of interaction diagram


that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

Fig: Sequence diagram

First, the data owner registers and logs in. The data owner uploads the file; the cloud server then divides the uploaded file into blocks, assigns a MAC to each block, and the file is encrypted. The end user requests the file and the server responds to the request, after which the end user can download the file.

3.2.4 Dataflow Diagram:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modelling tools. It is used to model the system components: the system process, the data used by the process, the external entities that interact with the system, and the information flows in the system.
3. The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction, and may be partitioned into levels that represent increasing information flow and functional detail.

Fig: Data Flow Diagram for Level 0

This is the Level 0 DFD. First, the data owner has to register and log in. Then the owner uploads the file, and may update it later if required.
Fig: Data Flow Diagram for Level 1

The data owner has to register and then log in. The data owner uploads the file, which is sent to the cloud server. Whenever the end user wants the file, the user sends a request for it; the cloud server responds to the request and grants permission to access the file. The end user can then download the file.

Flowchart Diagram:

Most flowcharts use only a simplified set of symbols with a single type of relationship, a
solid line with an arrow. Diagrams should begin and ideally end with the terminator symbol.
However, this rule is not as strict as it was in the UML Activity diagram. The terminator is
usually drawn as a rectangle with distinctly round corners. In some diagrams, however, you
may also encounter an ellipse or circle. The terminator where the process flow starts usually
contains the "Start" text and the one where the process ends the "End" text. Some diagrams
have multiple end points and some none.
Fig: Flow chart for Data owner

The above flow chart shows the sequence of steps performed in the project. The data owner and the user have to log in to the cloud to perform the operations.
CHAPTER 4

IMPLEMENTATION

4.1 SOFTWARE ENVIRONMENT

Client server:

Overview:

Among the varied topics in the field of computers, client server is one that has generated more heat than light, and more hype than reality. This technology has acquired a certain critical mass of attention, with dedicated conferences and magazines. Major computer vendors such as IBM and DEC have declared that client server is their main future market. A survey by DBMS magazine revealed that 76% of its readers were actively looking at client server solutions. The client server development tools market grew from $200 million in 1992 to more than $1.2 billion in 1996.

Client server implementations are complex, but the underlying concept is simple and powerful. A client is an application running with local resources but able to request database and related services from a separate remote server. The software mediating this client server interaction is often referred to as middleware.

The typical client is either a PC or a workstation connected through a network to a more powerful PC, workstation, midrange, or mainframe server that is usually capable of handling requests from more than one client. However, in some configurations a server may also act as a client, and a server may need to access other servers in order to process the original client request.

The key client server idea is that the client, as a user, is essentially insulated from the physical location and format of the data needed for the application. With proper middleware, a client input form or report can transparently access and manipulate both the local database on the client machine and remote databases on one or more servers. An added bonus is that client server opens the door to multi-vendor database access, including heterogeneous table joins.

What is Client Server:

Two prominent systems in existence are client server and file server systems, and it is essential to distinguish between them. Both provide shared network access to data, but the comparison ends there! The file server simply provides a remote disk drive that can be accessed by LAN applications on a file-by-file basis. The client server offers full relational database services such as SQL access, record modification, insert, delete with full relational integrity, backup/restore, performance for high volumes of transactions, etc. The client server middleware provides a flexible interface between client and server: who does what, when, and to whom.

Why Client Server:

Client server has evolved to solve a problem that has been around since the earliest days of computing: how best to distribute computing, data generation, and data storage resources in order to obtain efficient, cost-effective departmental and enterprise-wide data processing. During the mainframe era, choices were quite limited. A central machine housed both the CPU and the data (cards, tapes, drums, and later disks). Access to these resources was initially confined to batched runs that produced departmental reports at the appropriate intervals. A strong central information service department ruled the corporation. The role of the rest of the corporation was limited to requesting new or more frequent reports and to providing handwritten forms from which the central data banks were created and updated. The earliest client server solutions therefore could best be characterized as "slave-master".

Front End or User Interface Design:

The entire user interface is planned to be developed in a browser-specific environment with a touch of intranet-based architecture for achieving the distributed concept. The browser-specific components are designed using the HTML standards, and the dynamic parts are designed by concentrating on the constructs of Java Server Pages.

Communication or Database Connectivity Tier:

The communication architecture is designed by concentrating on the standards of Servlets and Enterprise JavaBeans. The database connectivity is established using Java Database Connectivity (JDBC). The standards of the three-tier architecture are given major attention to keep higher cohesion and limited coupling for effective operation.
Features of The Language Used:

In this project, the Java language has been chosen for developing the code.

About Java:

Initially the language was called "Oak", but it was renamed "Java" in 1995. The primary motivation for this language was the need for a platform-independent (i.e., architecture-neutral) language that could be used to create software to be embedded in various consumer electronic devices.

 Java is a programmer’s language.

 Java is cohesive and consistent.

 Except for those constraints imposed by the Internet environment, Java gives the programmer full control.

Finally, Java is to Internet programming what C was to systems programming.

Importance of Java to the Internet:

Java has had a profound effect on the Internet. This is because Java expands the universe of objects that can move about freely in cyberspace. In a network, two categories of objects are transmitted between the server and the personal computer: passive information and dynamic active programs. Dynamic, self-executing programs cause serious problems in the areas of security and portability. But Java addresses those concerns and, by doing so, has opened the door to an exciting new form of program called the applet.

Java can be used to create two types of programs:

Applications and Applets: An application is a program that runs on our computer under the operating system of that computer. It is more or less like one created using C or C++. Java's ability to create applets makes it important. An applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is that it is an intelligent program, not just a media file. It can react to user input and dynamically change.

Features of Java:
Security:

Every time you download a "normal" program, you are risking a viral infection. Prior to Java, most users did not download executable programs frequently, and those who did scanned them for viruses prior to execution. Most users still worried about the possibility of infecting their systems with a virus. In addition, another type of malicious program exists that must be guarded against. This type of program can gather private information, such as credit card numbers, bank account balances, and passwords. Java answers both of these concerns by providing a "firewall" between a network application and your computer.

When you use a Java-compatible Web browser, you can safely download Java applets
without fear of virus infection or malicious intent.

Portability:

For programs to be dynamically downloaded to all the various types of platforms connected
to the Internet, some means of generating portable executable code is needed. As you will
see, the same mechanism that helps ensure security also helps create portability. Indeed,
Java’s solution to these two problems is both elegant and efficient.

The Byte code:

The key that allows Java to solve the security and portability problems is that the output of the Java compiler is byte code. Byte code is a highly optimized set of instructions designed to be executed by the Java run-time system, which is called the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code. Translating a Java program into byte code makes it much easier to run the program in a wide variety of environments. The reason is that once the run-time package exists for a given system, any Java program can run on it.

Although Java was designed for interpretation, there is technically nothing about Java that prevents on-the-fly compilation of byte code into native code. Sun has completed its Just In Time (JIT) compiler for byte code. When the JIT compiler is part of the JVM, it compiles byte code into executable code in real time, on a piece-by-piece, demand basis. It is not possible to compile an entire Java program into executable code all at once, because Java performs various run-time checks that can be done only at run time. The JIT compiles code as it is needed, during execution.
Java Virtual Machine (JVM):

Beyond the language, there is the Java Virtual Machine. The Java Virtual Machine is an important element of the Java technology. The virtual machine can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a class loader is invoked and performs byte code verification, which makes sure that the code generated by the compiler will not corrupt the machine it is loaded on. Byte code verification takes place at the end of the compilation process to make sure that everything is accurate and correct. So byte code verification is integral to the compiling and executing of Java code.

Overall Description

Fig: Development process of a Java program (Java source → Java byte code (.class) → Java VM)

Java programming produces byte codes and executes them. The first box indicates that the Java source code is located in a .java file that is processed with the Java compiler called javac. The Java compiler produces a file called a .class file, which contains the byte code. The .class file is then loaded, across the network or locally on your machine, into the execution environment, the Java Virtual Machine, which interprets and executes the byte code.

Java Architecture:

Java architecture provides a portable, robust, high performing environment for development.
Java provides portability by compiling the byte codes for the Java Virtual Machine, which is
then interpreted on each platform by the run-time environment. Java is a dynamic system,
able to load code when needed from a machine in the same room or across the planet.

Compilation of code:
When you compile the code, the Java compiler creates machine code (called byte code) for a
hypothetical machine called Java Virtual Machine (JVM). The JVM is supposed to execute
the byte code. The JVM is created for overcoming the issue of portability. The code is written
and compiled for one machine and interpreted on all machines. This machine is called Java
Virtual Machine.

Compiling and interpreting Java Source Code:

Fig: Compiling and interpreting Java source code - the platform-independent byte code produced by the Java compiler is interpreted on each platform (PC, Macintosh, SPARC).

During run time, the Java interpreter tricks the byte code file into thinking that it is running on a Java Virtual Machine. In reality, this could be an Intel Pentium running Windows 95, a Sun SPARC station running Solaris, or an Apple Macintosh running its own system; all of them could receive code from any computer through the Internet and run the applets.

Simple:

Java was designed to be easy for the professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java will be even easier, because Java inherits the C/C++ syntax and many of the object-oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.

Object-Oriented:

Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean, usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.

Robust:

The multi-platform environment of the Web places extraordinary demands on a program, because the program must execute reliably in a variety of systems. The ability to create robust programs was given a high priority in the design of Java. Java is a strictly typed language; it checks your code at compile time and run time.

Java virtually eliminates the problems of memory management and deallocation, which are completely automatic. In a well-written Java program, all run-time errors can, and should, be managed by your program.

JAVASCRIPT:

JavaScript is a script-based programming language that was developed by Netscape Communications Corporation. JavaScript was originally called LiveScript and was renamed JavaScript to indicate its relationship with Java. JavaScript supports the development of both client and server components of Web-based applications. On the client side, it can be used to write programs that are executed by a Web browser within the context of a Web page. On the server side, it can be used to write Web server programs that can process information submitted by a Web browser and then update the browser's display accordingly. Even though JavaScript supports both client and server Web programming, we prefer JavaScript for client-side programming since most browsers support it. JavaScript is almost as easy to learn as HTML, and JavaScript statements can be included in HTML documents by enclosing the statements between a pair of scripting tags <SCRIPT> ... </SCRIPT>:

<SCRIPT LANGUAGE = “JavaScript”>


JavaScript statements

</SCRIPT>

Here are a few things we can do with JavaScript:

 Validate the contents of a form and make calculations.
 Add scrolling or changing messages to the browser's status line.
 Animate images or rotate images that change when we move the mouse over them.
 Detect the browser in use and display different content for different browsers.
 Detect installed plug-ins and notify the user if a plug-in is required.

We can do much more with JavaScript, including creating entire applications.

Java Script Vs Java:

JavaScript and Java are entirely different languages. A few of the most glaring differences
are:

 Java applets are generally displayed in a box within the web document; JavaScript can
affect any part of the Web document itself.
 While JavaScript is best suited to simple applications and adding interactive features to
Web pages; Java can be used for incredibly complex applications.

There are many other differences but the important thing to remember is that JavaScript and
Java are separate languages. They are both useful for different things; in fact they can be used
together to combine their advantages.

ADVANTAGES:

 JavaScript can be used for server-side and client-side scripting.

 It is more flexible than VBScript.

 JavaScript is the default scripting language on the client side since all browsers support it.

Hyper Text Markup Language:


Hypertext Markup Language (HTML), the language of the World Wide Web (WWW), allows users to produce Web pages that include text, graphics, and pointers to other Web pages (hyperlinks).

HTML is not a programming language but an application of ISO Standard 8879, SGML (Standard Generalized Markup Language), specialized to hypertext and adapted to the Web. The idea behind hypertext is that instead of reading text in a rigid linear structure, we can easily jump from one point to another. We can navigate through the information based on our interest and preference. A markup language is simply a series of elements, each delimited with special characters, that define how text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized words that link to other documents or some portion of the same document.

HTML can be used to display any type of document on the host computer, which can be
geographically at a different location. It is a versatile language and can be used on any
platform or desktop.

HTML provides tags (special codes) to make the document look attractive. HTML tags are
not case-sensitive. Using graphics, fonts, different sizes, color, etc., can enhance the
presentation of the document. Anything that is not a tag is part of the document itself.

ADVANTAGES:

 An HTML document is small and hence easy to send over the net. It is small because it does not include formatted information.
 HTML is platform independent.
 HTML tags are not case-sensitive.

Java Database Connectivity:

What Is JDBC?

JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for Java Database Connectivity.) It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.
Using JDBC, it is easy to send SQL statements to virtually any relational database. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. The combination of Java and JDBC lets a programmer write an application once and run it anywhere.

What Does JDBC Do?

Simply put, JDBC makes it possible to do three things:

 Establish a connection with a database

 Send SQL statements

 Process the results.
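A minimal sketch of these three steps is shown below; the JDBC URL, credentials, and table are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative JDBC usage: establish a connection, send an SQL statement,
// and process the results. The URL, credentials, and table are placeholders.
public class JdbcExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/metadedup";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT file_name, block_count FROM files WHERE owner = ?")) {
            ps.setString(1, "dataOwner1");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("file_name")
                            + " has " + rs.getInt("block_count") + " blocks");
                }
            }
        }
    }
}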

JDBC versus ODBC and other APIs:

At this point, Microsoft's ODBC (Open Database Connectivity) API is probably the most widely used programming interface for accessing relational databases. It offers the ability to connect to almost all databases on almost all platforms.

So why not just use ODBC from Java? The answer is that you can use ODBC from Java, but
this is best done with the help of JDBC in the form of the JDBC-ODBC Bridge, which we
will cover shortly. The question now becomes "Why do you need JDBC?" There are several
answers to this question:

1. ODBC is not appropriate for direct use from Java because it uses a C interface. Calls from
Java to native C code have a number of drawbacks in the security, implementation,
robustness, and automatic portability of applications.
2. A literal translation of the ODBC C API into a Java API would not be desirable. For
example, Java has no pointers, and ODBC makes copious use of them, including the
notoriously error-prone generic pointer "void *". You can think of JDBC as ODBC
translated into an object-oriented interface that is natural for Java programmers.
3. ODBC is hard to learn. It mixes simple and advanced features together, and it has
complex options even for simple queries. JDBC, on the other hand, was designed to keep
simple things simple while allowing more advanced capabilities where required.
4. A Java API like JDBC is needed in order to enable a "pure Java" solution. When ODBC
is used, the ODBC driver manager and drivers must be manually installed on every client
machine. When the JDBC driver is written completely in Java, however, JDBC code is
automatically installable, portable, and secure on all Java platforms from network
computers to mainframes.

Two-tier and Three-tier Models:

The JDBC API supports both two-tier and three-tier models for database access.

In the two-tier model, a Java applet or application talks directly to the database. This requires
a JDBC driver that can communicate with the particular database management system being
accessed. A user's SQL statements are delivered to the database, and the results of those
statements are sent back to the user.

Fig: Two-tier model - a Java application using JDBC on the client machine communicates directly with the DBMS on the database server via the DBMS-proprietary protocol.

The database may be located on another machine to which the user is connected via a
network. This is referred to as a client/server configuration, with the user's machine as the
client, and the machine housing the database as the server. The network can be an Intranet,
which, for example, connects employees within a corporation, or it can be the Internet.

In the three-tier model, commands are sent to a "middle tier" of services, which then sends SQL statements to the database. The database processes the SQL statements and sends the results back to the middle tier, which then sends them to the user.

Fig: Three-tier model - an HTML browser or Java applet on the client machine (GUI) issues HTTP, RMI, or CORBA calls to an application server (business logic), which uses JDBC to communicate with the DBMS over the DBMS-proprietary protocol.

MIS directors find the three-tier model very attractive because the middle tier makes it possible to maintain control over access
and the kinds of updates that can be made to corporate data. Another advantage is that when there is a middle tier, the user can employ an easy-to-use, higher-level API which is translated by the middle tier into the appropriate low-level calls. Finally, in many cases the three-tier architecture can provide performance advantages. Until now, the middle tier has typically been written in languages such as C or C++, which offer fast performance. However, with the introduction of optimizing compilers that translate Java byte code into efficient machine-specific code, it is becoming practical to implement the middle tier in Java. This is a big plus, making it possible to take advantage of Java's robustness, multithreading, and security features. JDBC is important in allowing database access from a Java middle tier.

JDBC Driver Types:

The JDBC drivers that we are aware of at this time fit into one of four categories:

 JDBC-ODBC bridge plus ODBC driver


 Native-API partly-Java driver
 JDBC-Net pure Java driver
 Native-protocol pure Java driver

JDBC-ODBC Bridge:

If possible, use a Pure Java JDBC driver instead of the Bridge and an ODBC driver.
This completely eliminates the client configuration required by ODBC. It also eliminates the
potential that the Java VM could be corrupted by an error in the native code brought in by the
Bridge (that is, the Bridge native library, the ODBC driver manager library, the ODBC driver
library, and the database client library).

What Is the JDBC- ODBC Bridge? :

The JDBC-ODBC Bridge is a JDBC driver, which implements JDBC operations by


translating them into ODBC operations. To ODBC it appears as a normal application
program. The Bridge implements JDBC for any database for which an ODBC driver is
available. The Bridge is implemented as the sun.jdbc.odbc Java package and contains a native
library used to access ODBC. The Bridge is a joint development of Intersolv and JavaSoft.

Java Server Pages (JSP):

Java Server Pages is a simple yet powerful technology for creating and maintaining dynamic-content web pages. Based on the Java programming language, Java Server Pages offers proven portability, open standards, and a mature re-usable component model. The Java Server Pages architecture enables the separation of content generation from content presentation. This separation not only eases maintenance headaches, it also allows web team members to focus on their areas of expertise. Now, web page designers can concentrate on layout, and web application designers on programming, with minimal concern about impacting each other's work.

Features of JSP:

Portability:

Java Server Pages files can be run on any web server or web-enabled application
server that provides support for them. Dubbed the JSP engine, this support involves
recognition, translation, and management of the Java Server Page lifecycle and its interaction
components.

Components:

It was mentioned earlier that the Java Server Pages architecture can include reusable Java components. The architecture also allows for the embedding of a scripting language directly into the Java Server Pages file. The components currently supported include JavaBeans and Servlets.

Processing:

A Java Server Pages file is essentially an HTML document with JSP scripting or tags. The Java Server Pages file has a .jsp extension, which identifies it to the server as a Java Server Pages file. Before the page is served, the Java Server Pages syntax is parsed and processed into a Servlet on the server side. The Servlet that is generated outputs real content in straight HTML for responding to the client.

Access Models:
A Java Server Pages file may be accessed in at least two different ways. In the first, a client's request comes directly into a Java Server Page. In this scenario, suppose the page accesses reusable JavaBean components that perform particular well-defined computations, like accessing a database. The result of the Bean's computations, called result sets, is stored within the Bean as properties. The page uses such Beans to generate dynamic content and present it back to the client.

In both of the above cases, the page could also contain any valid Java code. Java Server
Pages architecture encourages separation of content from presentation.

Steps in the execution of a JSP Application:

1. The client sends a request to the web server for a JSP file by giving the name of the JSP file within the form tag of an HTML page.
2. This request is transferred to the JavaWebServer. At the server side, the JavaWebServer receives the request, and if it is a request for a JSP file, it gives the request to the JSP engine.
3. The JSP engine is a program that understands the tags of the JSP and converts those tags into a Servlet program, which is stored at the server side. This Servlet is loaded into memory and executed; the result is given back to the JavaWebServer and then transferred back to the client.
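As a simplified illustration of step 3, a JSP fragment containing static HTML plus an expression such as <%= request.getParameter("user") %> is translated by the JSP engine into a servlet along the following lines (the actual generated code is more elaborate).

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Simplified sketch of the kind of servlet a JSP engine generates from a page
// containing static HTML plus an expression; real generated code differs.
public class HelloJspServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<p>Hello, " + request.getParameter("user") + "</p>"); // the <%= %> expression
        out.println("</body></html>");
    }
}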

JDBC connectivity:

The JDBC provides database-independent connectivity between the J2EE platform and a
wide range of tabular data sources. JDBC technology allows an Application Component
Provider to:

 Perform connection and authentication to a database server


 Manage transactions
 Move SQL statements to a database engine for preprocessing and execution
 Execute stored procedures
 Inspect and modify the results from Select statements.

Tomcat 6.0 web server:


Tomcat is an open source web server developed by the Apache Group. Apache Tomcat is the servlet container that is used in the official Reference Implementation for the Java Servlet and Java Server Pages technologies. The Java Servlet and Java Server Pages specifications are developed by Sun under the Java Community Process. Web servers like Apache Tomcat support only web components, while an application server supports web components as well as business components (BEA's WebLogic is one of the popular application servers). To develop a web application with JSP/Servlets, install any web server such as JRun or Tomcat to run your application.

Bibliography:

References for the Project Development were taken from the following
Books and Web Sites.

Oracle

PL/SQL Programming by Scott Urman

SQL complete reference by Livion

JAVA Technologies

JAVA Complete Reference

Java Script Programming by Yehuda Shiran


Mastering JAVA Security

JAVA2 Networking by Pistoria

JAVA Security by Scott Oaks

Head First EJB by Sierra and Bates

J2EE Professional by Shadab siddiqui

JAVA Server Pages by Larne Pekowsky

JAVA Server pages by Nick Todd

HTML

HTML Black Book by Holzner

JDBC

Java Database Programming with JDBC by Patel and Moss.

Software Engineering by Roger Pressman


SAMPLE CODE:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"


"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>Enabling Secure and Space Efficient Metadata Management in Encrypted


Deduplication</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<link href="css/style.css" rel="stylesheet" type="text/css" />

<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />

<script type="text/javascript" src="js/cufon-yui.js"></script>

<script type="text/javascript" src="js/droid_sans_400-droid_sans_700.font.js"></script>

<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>

<script type="text/javascript" src="js/script.js"></script>

<script type="text/javascript" src="js/coin-slider.min.js"></script>

<style type="text/css">

<!--

.style1 {

  font-size: 24px;

  color: #FF0000;

  font-weight: bold;

}

.style3 {

  color: #FF0000;

  font-weight: bold;

}

.style4 {color: #0000FF}

.style5 {

  color: #0000FF;

  font-weight: bold;

  font-style: italic;

}

-->

</style>

</head>

<body>

<div class="main">

  <div class="header">

    <div class="header_resize">

      <div class="menu_nav">

        <ul>

          <li class="active"><a href="index.html"><span>Home Page</span></a></li>

          <li><a href="DataUser.jsp"><span>Data User </span></a></li>

          <li><a href="CloudServer.jsp"><span>Cloud Server </span></a></li>

          <li><a href="EndUser.jsp"><span>End User </span></a></li>


        </ul>

      </div>

      <div class="clr"></div>

      <div class="logo">

        <h1><span class="style1"><a href="index.html" class="style1"></a>Enabling Secure


and Space Efficient Metadata Management in Encrypted Deduplication</span></h1>

      </div>

      <div class="clr"></div>

      <div class="slider">

        <div id="coin-slider"> <a href="#"><img src="images/slide1.jpg" width="960"


height="360" alt="" /> </a> <a href="#"><img src="images/slide2.jpg" width="960"
height="360" alt="" /> </a> <a href="#"><img src="images/slide3.jpg" width="960"
height="360" alt="" /> </a> </div>

        <div class="clr"></div>

      </div>

      <div class="clr"></div>

    </div>

  </div>

  <div class="content">

    <div class="content_resize">

      <div class="mainbar">

        <div class="article">

          <h2><span>WELCOME TO END USER LOGIN </span></h2>

          <p class="infopost">&nbsp; <a href="#" class="com"></a></p>


          <div class="clr"></div>

          <div class="img"><img src="images/img1.jpg" width="620" height="154" alt=""


class="fl" /></div>

          <div class="post_content">

           <form action="EAuthentication.jsp" method="post" id="leavereply">

            <ol>

              <li>

                <label for="name"><strong>Name (required)</strong></label>

                <input id="name" name="userid" class="text" />

              </li>

              <li> <strong>

                <label for="pass">Password (required)</label>

                </strong>

                <input type="password" id="pass" name="pass" class="text" />

              </li>

              <li></li>

              <li><br />

                <a href="ERegister.html">REGISTER</a>

                <input type="submit" name="imageField" id="imageField"  class="LOGIN" />

              </li>

            </ol>

          </form>
            <p class="spec"><a href="#" class="rm">..</a></p>

          </div>

          <div class="clr"></div>

        </div>

      </div>

      <div class="sidebar">

        <div class="searchform">

          <form id="formsearch" name="formsearch" method="post" action="#">

            <span>

            <input name="editbox_search" class="editbox_search" id="editbox_search"


maxlength="80" value="Search our ste:" type="text" />

            </span>

            <input name="button_search" src="images/search.gif" class="button_search"


type="image" />

          </form>

        </div>

        <div class="clr"></div>

        <div class="gadget">

          <h2 class="star"><span>Home </span> Menu</h2>

          <div class="clr"></div>

          <ul class="sb_menu">

            <li><a href="index.html">Home</a></li>

            <li><a href="DataUser.jsp">Data User </a></li>


            <li><a href="CloudServer.jsp">Cloud Server </a></li>

            <li><a href="EndUser.jsp">End User </a></li>

          </ul>

        </div>

      </div>

      <div class="clr"></div>

    </div>

  </div>

<div class="fbg"></div>

  <div class="footer">

</html>
4.2 TESTING

The following are the Testing Methodologies:

o Unit Testing.

o Integration Testing.

o User Acceptance Testing.

o Output Testing.

o Validation Testing.

Unit Testing

Unit testing focuses verification effort on the smallest unit of software design, that is, the
module. Unit testing exercises specific paths in a module’s control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit; hence the name unit
testing.

During this testing, each module is tested individually and the module
interfaces are verified for consistency with the design specification. All important processing
paths are tested for the expected results. All error-handling paths are also tested.
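
As a hypothetical example of such a module-level test, written here with JUnit 4, the tiny module under test is included inline only to keep the sketch self-contained:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical module-level test. In the real project each module lives in its
// own source file; the Splitter class below exists only to make this runnable.
public class SplitterTest {

    // Minimal module under test: computes how many fixed-size chunks a buffer yields.
    static class Splitter {
        static int countChunks(byte[] data, int chunkSize) {
            if (chunkSize <= 0) {
                throw new IllegalArgumentException("chunk size must be positive");
            }
            return (data.length + chunkSize - 1) / chunkSize;   // ceiling division
        }
    }

    @Test
    public void splitsDataIntoExpectedNumberOfChunks() {
        assertEquals(3, Splitter.countChunks(new byte[10], 4)); // 4 + 4 + 2 bytes -> 3 chunks
    }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsNonPositiveChunkSize() {
        Splitter.countChunks(new byte[10], 0);                  // exercises the error-handling path
    }
}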

Integration Testing

Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a set of high
order tests is conducted. The main objective in this testing process is to take unit-tested
modules and build a program structure that has been dictated by design.

The following are the types of Integration Testing:

1. Top-Down Integration


This method is an incremental approach to the construction of the program structure.
Modules are integrated by moving downward through the control hierarchy, beginning with
the main program module. The modules subordinate to the main program module are
incorporated into the structure in either a depth-first or breadth-first manner.

In this method, the software is tested from main module and individual stubs
are replaced when the test proceeds downwards.

2. Bottom-up Integration

This method begins the construction and testing with the modules at the lowest level
in the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs is
eliminated. The bottom-up integration strategy may be implemented with the following steps:

 The low-level modules are combined into clusters that perform a specific
software sub-function.

 A driver (i.e., the control program for testing) is written to coordinate test case input
and output (a minimal driver sketch appears at the end of this subsection).

 The cluster is tested.

 Drivers are removed and clusters are combined moving upward in the program
structure

The bottom-up approach tests each module individually; each module is then
integrated with a main module and tested for functionality.
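
A minimal, hypothetical driver sketch is shown below. The low-level behaviour it exercises (content-based key derivation via SHA-256) is written inline so the sketch is self-contained; it is not necessarily the exact scheme used by the project code.

import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical bottom-up test driver: it coordinates test-case input and output
// for a low-level module before that module is wired to higher-level modules.
public class KeyDerivationDriver {

    static byte[] deriveKey(byte[] chunk) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(chunk);   // key = hash of chunk content
    }

    public static void main(String[] args) throws Exception {
        byte[] chunkA = "identical chunk".getBytes("UTF-8");
        byte[] chunkB = "identical chunk".getBytes("UTF-8");

        // Expected result: identical chunk contents must derive identical keys.
        boolean sameKey = Arrays.equals(deriveKey(chunkA), deriveKey(chunkB));
        System.out.println(sameKey ? "PASS: keys match" : "FAIL: keys differ");
    }
}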

User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch with
the prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.

Output Testing
After performing the validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration are
tested by asking the users about the format they require. Hence the output format is
considered in two ways: on screen and in printed format.

Validation Checking

Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its
size. The text fields are alphanumeric in some tables and alphabetic in others. An incorrect
entry always flashes an error message.

Numeric Field:

The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message. The individual modules are checked for accuracy and for what they
have to perform. Each module is subjected to a test run along with sample data. The
individually tested modules are integrated into a single system. Testing involves executing
the program with real data, and the existence of any program defect is inferred from the
output. The testing should be planned so that all the requirements are individually tested.
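
A minimal sketch of the text-field and numeric-field checks described above is given below; the helper class and method names are hypothetical:

// Hypothetical field-validation helpers matching the checks described above.
public class FieldValidator {

    // Text field: length must not exceed the field size; content is restricted to
    // alphabetic or alphanumeric characters depending on the table.
    public static boolean isValidTextField(String value, int maxSize, boolean alphabeticOnly) {
        if (value == null || value.length() > maxSize) {
            return false;
        }
        return alphabeticOnly ? value.matches("[A-Za-z]+") : value.matches("[A-Za-z0-9]+");
    }

    // Numeric field: only the digits 0 to 9 are accepted; any other character
    // should cause the application to flash an error message.
    public static boolean isValidNumericField(String value) {
        return value != null && value.matches("[0-9]+");
    }
}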

A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.

Preparation of Test Data

The above testing is done by taking various kinds of test data. Preparation of test
data plays a vital role in system testing. After preparing the test data, the system under
study is tested using that test data. While testing the system with the test data, errors are again
uncovered and corrected using the above testing steps, and the corrections are also noted for
future use.

Using Live Test Data:


Live test data are those that are actually extracted from organization files. After a
system is partially constructed, programmers or analysts often ask users to key in a set of data
from their normal activities. Then, the systems person uses this data as a way to partially test
the system. In other instances, programmers or analysts extract a set of live data from the files
and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing.


And, although it is realistic data that will show how the system will perform for the typical
processing requirement, assuming that the live data entered are in fact typical, such data
generally will not test all combinations or formats that can enter the system. This bias toward
typical values then does not provide a true systems test and in fact ignores the cases most
likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be generated to
test all combinations of formats and values. In other words, the artificial data, which can
quickly be prepared by a data-generating utility program in the information systems
department, make possible the testing of all logic and control paths through the program.

The most effective test programs use artificial test data generated by persons other
than those who wrote the programs. Often, an independent team of testers formulates a
testing plan, using the systems specifications.
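
For instance, a small data-generating utility along the lines described above could produce artificial values that cover valid, boundary, and deliberately malformed formats; the field layout and sizes used here are hypothetical:

import java.util.Random;

// Hypothetical test-data generator: produces artificial values that cover valid,
// boundary, and deliberately malformed formats so that all logic and control
// paths can be exercised. The field sizes used here are illustrative only.
public class TestDataGenerator {
    private static final Random RANDOM = new Random(42);   // fixed seed for repeatable runs

    static String randomUserId(int length) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(RANDOM.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("valid:     " + randomUserId(8));
        System.out.println("boundary:  " + randomUserId(80));          // maximum field size
        System.out.println("malformed: " + randomUserId(8) + "@@##");  // characters a field should reject
    }
}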

The package “Enabling Secure and Space Efficient Metadata Management in Encrypted
Deduplication” has satisfied all the requirements specified in the software requirement
specification and was accepted.

USER TRAINING

Whenever a new system is developed, user training is required to educate them about
the working of the system so that it can be put to efficient use by those for whom the system
has been primarily designed. For this purpose the normal working of the project was
demonstrated to the prospective users. Its working is easily understandable and since the
expected users are people who have good knowledge of computers, the use of this system is
very easy.
CHAPTER 5

RESULT ANALYSIS

5.1 OUTPUT-SCREENSHOTS

Screen No 2: Here the data user needs to log in by entering a name and password.

Screen No 3: Here the menu is displayed to upload files, check data integrity,
deduplication, etc.

Screen No 4: This is the cloud server page; all the data uploaded by the owner is stored in
the cloud server.

Screen No 5: This is the end user page; the end user can request a file and view the file
responses.

Screen No 6: Here, in this page, the file is divided into blocks in encrypted form.

Screen No 7: This page shows the confirmation that the data was uploaded successfully.

Screen No 8: This page shows that encrypted deduplication is detected if the same file is
uploaded again.

Screen No 9: Here, in this page, the end user can download the file by entering the file name
and secret key.

Screen No 10: In this page the user can decrypt and download the file requested by the end
user.

Screen No 11: Here the file contents to be downloaded are displayed, and by clicking the
download button the file is downloaded.
CHAPTER 6
CONCLUSION

6.1 CONCLUSION

We present Metadedup, which exploits the power of indirection to realize deduplication of
metadata. It significantly mitigates the metadata storage overhead in encrypted
deduplication, while preserving confidentiality guarantees for both data and metadata. We
extend Metadedup for secure and fault-tolerant storage in a multi-server architecture; in
particular, we propose a distributed key management approach, which preserves security
guarantees even when a number of servers are compromised. We extensively evaluate
Metadedup from the performance and storage efficiency aspects. We show that Metadedup
significantly reduces the storage space of metadata, while incurring limited performance
overhead compared to the network speed.

6.2 FUTURE ENHANCEMENT


REFERENCES
[1] Boost C++ libraries. https://www.boost.org/.
[2] LevelDB. https://github.com/google/leveldb.
[3] OpenSSL. https://www.openssl.org.
[4] FSL traces and snapshots public archive. http://tracer.filesystems.org/, 2014.
[5] F. Armknecht, J.-M. Bohli, G. O. Karame, and F. Youssef. Transparent data deduplication in the cloud. In Proc. of ACM CCS, 2015.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. In Proc. of ACM CCS, 2007.
[7] M. Bellare, S. Keelveedhi, and T. Ristenpart. DupLESS: Server-aided encryption for deduplicated storage. In Proc. of USENIX Security, 2013.
[8] M. Bellare, S. Keelveedhi, and T. Ristenpart. Message-locked encryption and secure deduplication. In Proc. of EUROCRYPT, 2013.
[9] D. Bhagwat, K. Eshghi, D. D. E. Long, and M. Lillibridge. Extreme binning: Scalable, parallel deduplication for chunk-based file backup. In Proc. of IEEE MASCOTS, 2009.
[10] J. Black. Compare-by-hash: A reasoned analysis. In Proc. of USENIX ATC, 2006.
[11] D. R. Bobbarjung, S. Jagannathan, and C. Dubnicki. Improving duplicate elimination in storage systems. ACM Transactions on Storage, 2(4):424–448, 2006.
[12] A. Z. Broder. On the resemblance and containment of documents. In Proc. of SEQUENCES, 1997.
[13] D. Dobre, P. Viotti, and M. Vukolić. Hybris: Robust hybrid cloud storage. In Proc. of ACM SoCC, 2014.
[14] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proc. of IEEE ICDCS, 2002.
[15] Y. Duan. Distributed key generation for encrypted deduplication: Achieving the strongest privacy. In Proc. of ACM CCSW, 2014.
[16] K. Eshghi and H. K. Tang. A framework for analyzing and improving content-based chunking algorithms. Technical report, HPL-2005-30R1, 2005.
[17] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg. Proofs of ownership in remote storage systems. In Proc. of ACM CCS, 2011.
[18] D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security & Privacy, 8(6):40–47, 2010.
[19] M. Islam, M. Kuzu, and M. Kantarcioglu. Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. In Proc. of NDSS, 2012.
[20] A. Juels and B. S. Kaliski. PORs: Proofs of retrievability for large files. In Proc. of ACM CCS, 2007.
[21] E. Kruus, C. Ungureanu, and C. Dubnicki. Bimodal content defined chunking for backup streams. In Proc. of USENIX FAST, 2010.
[22] J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou. Secure deduplication with efficient and reliable convergent key management. IEEE Transactions on Parallel and Distributed Systems, 25(6):1615–1625, 2014.
[23] J. Li, P. P. C. Lee, Y. Ren, and X. Zhang. Metadedup: Deduplicating metadata in encrypted deduplication via indirection. In Proc. of IEEE MSST, 2019.
[24] J. Li, Z. Yang, Y. Ren, P. P. C. Lee, and X. Zhang. Balancing storage efficiency and data confidentiality with tunable encrypted deduplication. In Proc. of EuroSys, 2020.
[25] M. Li, C. Qin, and P. P. C. Lee. CDStore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal. In Proc. of USENIX ATC, 2015.
