1.1 INTRODUCTION
individual files. Since the key recipes contain sensitive key information, they need to
be managed separately from file recipes, encrypted by the master keys of file owners,
and individually stored for different file owners. Such high metadata storage overhead
can negate the storage effectiveness of encrypted deduplication in real deployments.
1.2 MOTIVATION
To defend against the offline brute-force attack, DupLESS implements server-aided
MLE, which introduces a global secret and protects the key generation process
against public access. Server-aided MLE leverages a dedicated key manager to
maintain a global secret that is protected from an adversary. It derives the MLE key of
each chunk from both the global secret and the cryptographic hash (a.k.a.
fingerprint) of the chunk, such that an adversary cannot feasibly derive the MLE keys
of any sampled chunks without knowing the global secret. Thus, if the global secret is
secure, server-aided MLE is robust against the offline brute-force attack and achieves
security for both predictable and unpredictable chunks; otherwise, if the global secret
is compromised, it preserves the same security guarantees for unpredictable chunks as
in conventional MLE. DupLESS further implements two mechanisms to strengthen
security: (i) the oblivious pseudorandom function (OPRF) protocol, which allows the
client to submit the fingerprint of each chunk (to the key manager) in a blinded manner,
such that the key manager can successfully generate MLE keys without learning the
actual fingerprints; and (ii) the rate-limiting mechanism, which proactively controls the
key generation rate of the key manager, so as to defend against the online brute-force
attack from compromised clients that try to issue too many key generation requests.
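As a hedged illustration of the key derivation described above, server-aided MLE can be sketched as an HMAC of the chunk fingerprint under the key manager's global secret. The secret value is made up, and the OPRF blinding step (which keeps the fingerprint hidden from the key manager) is omitted for brevity:

```python
import hashlib
import hmac

# Hypothetical global secret held only by the key manager. (Assumption:
# in DupLESS the client blinds the fingerprint via an OPRF so the key
# manager never sees it; that blinding step is omitted here.)
GLOBAL_SECRET = b"key-manager-global-secret"

def fingerprint(chunk: bytes) -> bytes:
    """Cryptographic hash (a.k.a. fingerprint) of a chunk."""
    return hashlib.sha256(chunk).digest()

def server_aided_mle_key(fp: bytes) -> bytes:
    """Key manager derives the MLE key from the global secret and the
    fingerprint; without GLOBAL_SECRET, an adversary cannot derive the
    keys of guessed chunks offline."""
    return hmac.new(GLOBAL_SECRET, fp, hashlib.sha256).digest()

# Identical chunks map to identical keys, so deduplication still works.
k1 = server_aided_mle_key(fingerprint(b"chunk A"))
k2 = server_aided_mle_key(fingerprint(b"chunk A"))
assert k1 == k2 and len(k1) == 32
```

Because the derivation is deterministic in the fingerprint, duplicate chunks still receive identical keys and ciphertexts, which is what preserves deduplication.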
Based on the above analysis, we show via an example how the metadata storage
overhead becomes problematic in encrypted deduplication. Suppose that the size of
deduplication metadata is 30 bytes [39], the size of key metadata is 32 bytes (e.g., for
AES-256 encryption keys), and the chunk size is 8 KB [39]. Then f = 30 bytes / 8 KB ≈
0.0037 and k = 32 bytes / 8 KB ≈ 0.0039. If the deduplication factor (i.e., L/P) is 50×
[39] and L = 50 TB, then plain deduplication and encrypted deduplication incur
191.25 GB and 391.25 GB of metadata, or equivalently 18.67% and 38.21% additional
storage for 1 TB of physical data, respectively.
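The arithmetic above can be verified with a short script. The accounting is an assumption that reproduces the stated figures: 1 TB = 1024 GB, plain deduplication metadata counted as recipes over the logical data plus a fingerprint index over the physical data, and encrypted deduplication adding key recipes over the logical data:

```python
# Checks the example numbers above (assumptions: 1 TB = 1024 GB; plain
# metadata = recipes over logical data + index over physical data).
DEDUP_META = 30            # bytes of deduplication metadata per chunk [39]
KEY_META = 32              # bytes of key metadata per chunk (AES-256 key)
CHUNK = 8 * 1024           # chunk size: 8 KB [39]

f = DEDUP_META / CHUNK     # ~0.0037
k = KEY_META / CHUNK       # ~0.0039

L = 50 * 1024              # logical data: 50 TB, in GB
P = L / 50                 # physical data after deduplication: 1 TB

plain = f * (L + P)        # plain deduplication metadata (GB)
encrypted = plain + k * L  # plus key recipes for encrypted deduplication

print(plain, encrypted)    # 191.25 391.25 (GB), i.e. ~18.7% / ~38.2% of P
```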
The existing system stores only a single copy of duplicate chunks, while referencing
all duplicate chunks to the physical copy, which reduces primary storage by 50% and
backup storage by 98%. We have therefore proposed encrypted deduplication, which adds
an encryption layer to deduplication such that each chunk, before being written to
deduplicated storage, is encrypted via symmetric-key encryption with a key derived from
the chunk content.
Chunk-based deduplication is widely used in modern primary and backup storage
systems to achieve high storage savings. It stores only a single physical copy of duplicate
chunks, while referencing all duplicate chunks to the physical copy by small-size
references. Prior studies show that deduplication can effectively reduce the storage
space of primary storage by 50% and that of backup storage by up to 98%. To provide
confidentiality guarantees, encrypted deduplication adds an encryption layer to
deduplication, such that each chunk, before being written to deduplicated storage, is
deterministically encrypted via symmetric-key encryption with a key derived from the
chunk content (e.g., the key is set to be the cryptographic hash of the chunk content).
This ensures that duplicate chunks remain identical even after encryption, and hence
we can still apply deduplication to the encrypted chunks for storage savings. Many
studies have designed various encrypted deduplication schemes to efficiently manage
outsourced data in cloud storage. In addition to storing non-duplicate data, a
deduplicated storage system needs to keep deduplication metadata.
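The content-derived key idea can be sketched as follows. A toy deterministic cipher built from a SHA-256 keystream stands in for real symmetric encryption (an assumption for illustration only, not the scheme's actual cipher):

```python
import hashlib
from itertools import count

def mle_key(chunk: bytes) -> bytes:
    # Convergent encryption: the key is the hash of the chunk content.
    return hashlib.sha256(chunk).digest()

def det_encrypt(key: bytes, chunk: bytes) -> bytes:
    # Toy deterministic cipher: XOR with a SHA-256 keystream.
    # (Assumption: a stand-in for AES in a deterministic mode.)
    stream = b""
    for ctr in count():
        if len(stream) >= len(chunk):
            break
        stream += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
    return bytes(a ^ b for a, b in zip(chunk, stream))

store = {}  # fingerprint of ciphertext -> ciphertext (the dedup index)

def write(chunk: bytes) -> None:
    ct = det_encrypt(mle_key(chunk), chunk)
    store[hashlib.sha256(ct).digest()] = ct  # duplicates collapse here

write(b"duplicate chunk")
write(b"duplicate chunk")  # same content -> same ciphertext -> deduplicated
write(b"unique chunk")
assert len(store) == 2
```

Since the key and the cipher are both deterministic functions of the content, two users writing the same chunk produce byte-identical ciphertexts, and the store keeps only one copy.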
Disadvantages:
• The existing system is not designed for an organization that outsources the storage of
users’ data to a remote shared storage system.
• Efficiency on cloud data is reduced due to the high storage overhead of metadata.
1.5 PROPOSED SYSTEM
Finally, the system conducts trace-driven simulation on two real-world datasets.
We show that Metadedup achieves up to 93.94% of metadata storage savings in
encrypted deduplication. We also show that Metadedup maintains the storage load
balance among all servers.
Advantages:
• The system ensures that our Metadedup design is compatible with existing
countermeasures that address different threats against encrypted deduplication
systems.
• A malicious client may abuse client-side deduplication to learn whether other users
have already stored certain files, and Metadedup can defeat this side-channel leakage
by adopting server-side deduplication on cross-user data. A malicious server
may modify or even delete stored files to destroy the availability of outsourced files,
and Metadedup is compatible with the availability countermeasure that disperses
data across servers via deduplication-aware secret sharing.
CHAPTER 2
LITERATURE SURVEY
Nowadays, many customers and enterprises back up their data to cloud storage that
performs deduplication to save storage space and network bandwidth. Hence, how to
perform secure deduplication becomes a critical challenge for cloud storage. According
to our analysis, the state-of-the-art secure deduplication methods are not suitable for
cross-user fine-grained data deduplication. They either suffer from brute-force attacks
that can recover files falling into a known set, or incur large computation (time) overheads.
Moreover, existing approaches of convergent key management incur large space
overheads because of the huge number of chunks shared among users. Our observation
that cross-user redundant data mainly come from duplicate files motivates us to
propose an efficient secure deduplication scheme, SecDep. SecDep employs
User-Aware Convergent Encryption (UACE) and Multi-Level Key management (MLK)
approaches.
of the art on deduplication help identify and understand the most important design
considerations for data deduplication systems. In addition, we discuss the main
applications and industry trend of data deduplication, and provide a list of the publicly
available sources for deduplication research and studies. Finally, we outline the open
problems and future research directions facing deduplication-based storage systems.
4. J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou. Secure deduplication with
efficient and reliable convergent key management. IEEE Transactions on Parallel
Distributed Systems, 25(6):1615–1625, 2014.
Data deduplication is a technique for eliminating duplicate copies of data, and has been
widely used in cloud storage to reduce storage space and upload bandwidth. Promising
as it is, an arising challenge is to perform secure deduplication in cloud storage.
Although convergent encryption has been extensively adopted for secure deduplication,
a critical issue of making convergent encryption practical is to efficiently and reliably
manage a huge number of convergent keys. This paper makes the first attempt to
formally address the problem of achieving efficient and reliable key management in
secure deduplication. We first introduce a baseline approach in which each user holds
an independent master key for encrypting the convergent keys and outsourcing them to
the cloud. However, such a baseline key management scheme generates an enormous
number of keys with the increasing number of users and requires users to dedicatedly
protect the master keys. To this end, we propose Dekey, a new construction in which
users do not need to manage any keys on their own but instead securely distribute the
convergent key shares across multiple servers. Security analysis demonstrates that
Dekey is secure in terms of the definitions specified in the proposed security model. As
a proof of concept, we implement Dekey using the Ramp secret sharing scheme and
demonstrate that Dekey incurs limited overhead in realistic environments.
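Dekey's idea of distributing convergent key shares across servers can be illustrated with a much simpler n-of-n XOR secret sharing (an assumption for brevity; Dekey itself uses the Ramp secret sharing scheme, which additionally tolerates missing shares):

```python
import hashlib
import secrets

def split_key(key: bytes, n: int) -> list[bytes]:
    """n-of-n XOR secret sharing: all n shares are needed to rebuild the
    key, and any n-1 of them reveal nothing about it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]

def combine(shares: list[bytes]) -> bytes:
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

# A convergent key, split across (say) four key servers.
convergent_key = hashlib.sha256(b"chunk content").digest()
shares = split_key(convergent_key, 4)
assert combine(shares) == convergent_key
```

The point of the construction is that the user no longer stores or protects a master key; compromising any single server reveals nothing about the convergent keys.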
Large-scale storage systems often attempt to achieve two seemingly conflicting goals:
the systems need to reduce the copies of redundant data to save space, a process called
deduplication; and users demand encryption of their data to ensure privacy.
Conventional encryption makes deduplication on ciphertexts ineffective, as it destroys
data redundancy. A line of work, originating from Convergent Encryption and evolving
into Message-Locked Encryption, strives to solve this problem. The latest work,
DupLESS, proposes a server-aided architecture that provides the strongest privacy. The
DupLESS architecture relies on a key server to help the clients generate encryption
keys that result in convergent ciphertexts. In this paper, we first provide a rigorous proof
of security, in the random oracle model, for the DupLESS architecture, which is lacking
in the original paper. Our proof shows that using an additional secret, other than the data
itself, for generating encryption keys achieves the best possible security under the current
deduplication paradigm. We then introduce a distributed protocol that eliminates the
need for a key server and allows less managed systems such as P2P systems to enjoy
the high security level. Implementation and evaluation show that the scheme is both
robust and practical.
2.2 SOFTWARE AND HARDWARE REQUIREMENTS
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Monitor - SVGA
CHAPTER 3
METHODOLOGY AND DESIGN ANALYSIS
Here the data is divided into chunks; the chunks are highly redundant in the real world,
so they need to be deduplicated. Every chunk is assigned a secret key, and the dedup
algorithm deduplicates the chunks. The file recipes and key recipes are used for the
reconstruction of the backup files.
Metadedup is designed for an organization that outsources the storage of users’ data to
a remote shared storage system. It focuses on the storage of backup workloads, which
are known to have high content similarity. It applies deduplication to remove content
redundancies of both data (i.e., the file data from users’ backup workloads) and
metadata (i.e., deduplication metadata and key metadata), so as to improve the overall
storage efficiency. Metadedup builds on the client-server model. A client is a software
interface for users to process backup files. It partitions a backup file into multiple
chunks, encrypts each chunk, and uploads the encrypted chunks to a remote server that
employs encrypted deduplication. The server performs chunk-based deduplication, and
only stores the non-duplicate chunks whose content is unique with respect to existing chunks. It
also keeps the file recipe and key recipe of each backup file (note that the key recipe is
further encrypted by the master key of the client that owns the backup file) for the
reconstruction of the backup file. Here, we assume that the communication channels
are carefully protected (e.g., via SSL/TLS), so as to address network eavesdropping. To
this end, Metadedup aims for the following goals:
➢ Low storage overhead of metadata: It suppresses the storage space of both file
recipes and key recipes, while incurring only small storage overhead to the
fingerprint index as we also apply deduplication to metadata.
➢ Security for data and metadata: It preserves the security guarantees of underlying
encrypted deduplication for both data and metadata storage.
➢ Limited performance overhead: It adds small performance overhead on writing
(restoring) files to (from) deduplicated storage. In the following, we define the
threat model of Metadedup, and present its metadata deduplication design in detail.
We also present the security analysis of Metadedup on the protection of metadata
chunks in Section 1 of the digital supplementary file.
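The client write path implied by these goals might be sketched as follows. All names, the fixed-size chunking, and the key derivation are hypothetical simplifications; the actual encryption and the metadata-chunk handling are elided:

```python
import hashlib

def chunk_file(data: bytes, size: int = 8 * 1024) -> list[bytes]:
    # Fixed-size chunking for simplicity (real systems often use
    # content-defined chunking instead).
    return [data[i:i + size] for i in range(0, len(data), size)]

def write_backup(data: bytes):
    """Hypothetical client write path: each chunk gets an MLE key, the
    file recipe records fingerprints for reconstruction, and the key
    recipe records the per-chunk keys (to be encrypted under the
    client's master key before upload; encryption itself is elided)."""
    file_recipe, key_recipe, uploads = [], [], []
    for chunk in chunk_file(data):
        key = hashlib.sha256(chunk).digest()       # MLE key
        fp = hashlib.sha256(key + chunk).digest()  # chunk fingerprint
        file_recipe.append(fp)
        key_recipe.append(key)
        uploads.append((fp, chunk))                # ciphertext elided
    return file_recipe, key_recipe, uploads

fr, kr, up = write_backup(b"x" * 20000)
assert len(fr) == len(kr) == len(up) == 3  # 20000 B at 8 KB -> 3 chunks
```

The recipes grow linearly with the logical data, which is exactly why Metadedup also deduplicates the metadata itself.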
3.2 ALGORITHMS
Algorithm 2: Restore operation
Client input: client’s master key key
Server: Receive name
Server: Send fRecipe, {[mChunk]mkey} and [kRecipe]key
Client: Receive {[dChunk]dkey}
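A minimal sketch of the client side of Algorithm 2 is shown below, using a toy in-memory server and XOR cipher (both assumptions for illustration; the real system uses proper symmetric encryption and networked servers):

```python
# Toy cipher and server for the sketch (assumptions; not the real system).
def xor(key: bytes, data: bytes) -> bytes:
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

class Server:
    def __init__(self):
        self.recipes = {}  # name -> (fRecipe, kRecipe encrypted by master key)
        self.chunks = {}   # fingerprint -> encrypted data chunk

def restore(name: str, server: Server, master_key: bytes) -> bytes:
    """Client side of Algorithm 2: receive fRecipe and the encrypted
    kRecipe, decrypt the chunk keys with the master key, then fetch and
    decrypt each data chunk."""
    f_recipe, enc_keys = server.recipes[name]
    data = b""
    for fp, enc_key in zip(f_recipe, enc_keys):
        dkey = xor(master_key, enc_key)       # recover per-chunk key
        data += xor(dkey, server.chunks[fp])  # decrypt the data chunk
    return data

# Round-trip check with one stored chunk.
srv = Server()
mkey, dkey, chunk = b"mk", b"ck", b"hello"
srv.chunks["fp1"] = xor(dkey, chunk)
srv.recipes["f"] = (["fp1"], [xor(mkey, dkey)])
assert restore("f", srv, mkey) == b"hello"
```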
3.3 MODULES:
1. Data Owner
2. Cloud Server
3. End User
4. Data Encryption and Decryption
5. Attacker
1. Data Owner
In this module, the data owner uploads their data to the cloud server. For security
purposes, the data owner encrypts the data file’s blocks and then stores them in the
cloud. The data owner can check the duplication of the file’s blocks on the
corresponding cloud server. The data owner is capable of manipulating the encrypted
data file’s blocks, can check the cloud data as well as the duplication of specific file’s
blocks, and can also create remote users with respect to registered cloud servers. The
data owner also checks the data integrity proof to detect which block has been modified
by the attacker.
2. Cloud Server
The cloud service provider manages a cloud to provide data storage service. Data
owners encrypt their data file’s blocks and store them in the cloud for sharing with
remote users. To access the shared data file’s blocks, data consumers download the
encrypted data file’s blocks of their interest from the cloud and then decrypt them.
3. End User
In this module, the remote user logs in by using his username and password. He then
requests the secret key of the required file’s blocks from the cloud servers, and gets the
secret key. After getting the secret key, he tries to download the file’s blocks by entering
the file’s block name and the secret key on the cloud server.
4. Data Encryption and Decryption
All the legal users in the system can freely query any encrypted data of interest. Upon
receiving the data from the server, the user runs the decryption algorithm Decrypt to
decrypt the ciphertext by using its secret keys from different users. Only if the attributes
the user possesses satisfy the access structure defined in the ciphertext CT can the user
get the content key.
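The access check described here can be sketched as a set-containment test. This is an assumption that flattens the access structure to a plain attribute set; real attribute-based encryption policies are richer and are enforced cryptographically rather than by a boolean check:

```python
def can_decrypt(user_attributes: set, access_structure: set) -> bool:
    # The user obtains the content key only if his attributes satisfy
    # the access structure embedded in the ciphertext CT.
    return access_structure.issubset(user_attributes)

ct_policy = {"registered", "data-consumer"}  # hypothetical CT policy
assert can_decrypt({"registered", "data-consumer", "end-user"}, ct_policy)
assert not can_decrypt({"registered"}, ct_policy)
```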
5. Attacker
The user who attacks or modifies the block content is called an attacker. The attacker
may be a user who tries to access the file contents with a wrong secret key from the
cloud server.
3.4 SYSTEM DESIGN
INPUT DESIGN:
Input design plays a vital role in the life cycle of software development, and it requires
very careful attention from developers. The goal of input design is to feed data to the
application as accurately as possible, so inputs are supposed to be designed effectively
so that the errors occurring while feeding are minimized. According to software
engineering concepts, the input forms or screens are designed to provide validation
control over the input limit, range and other related validations.
This system has input screens in almost all the modules. Error messages are developed
to alert the user whenever he commits some mistakes and to guide him in the right way
so that invalid entries are not made. Let us look deeper into this under module design.
Input design is the process of converting the user-created input into a computer-based
format. The goal of the input design is to make the data entry logical and free from
errors. Errors in the input are controlled by the input design. The application has
been developed in a user-friendly manner. The forms have been designed in such a way
that during processing the cursor is placed in the position where data must be entered.
The user is also provided with an option to select an appropriate input from various
alternatives related to the field in certain cases.
Validations are required for each data entry. Whenever a user enters erroneous
data, an error message is displayed, and the user can move on to the subsequent pages
only after completing all the entries in the current page.
OUTPUT DESIGN:
The output from the computer is mainly required to create an efficient method of
communication within the company, primarily between the project leader and his team
members, in other words, the administrator and the clients. The output is a system
which allows the project leader to manage his clients in terms of creating new
clients and assigning new projects to them, maintaining a record of the project validity
and providing folder-level access to each client on the user side depending on the
projects allotted to him. After completion of a project, a new project may be assigned
to the client. User authentication procedures are maintained at the initial stages itself.
A new user may be created by the administrator himself, or a user can register himself
as a new user, but the task of assigning projects and validating a new user rests with the
administrator only.
The application starts running when it is executed for the first time. The server has to
be started, and then Internet Explorer is used as the browser. The project runs on the
local area network, so the server machine serves as the administrator while the
other connected systems act as the clients. The developed system is highly user-friendly
and can be easily understood by anyone using it, even for the first time.
UML stands for Unified Modelling Language. UML is a standardized general-purpose
modelling language in the field of object-oriented software engineering. The standard
is managed, and was created, by the Object Management Group.
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.
The UML represents a collection of best engineering practices that have proven
successful in the modelling of large and complex systems.
The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express
the design of software projects.
GOALS: The Primary goals in the design of the UML are as follows:
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
The UML Class diagram is a graphical notation used to construct and visualize object-
oriented systems. A class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the
system's:
• classes,
• their attributes,
• operations (or methods),
• the relationships among objects.
Figure 3.2: Class diagram
The figure represents the class diagram of the project, in which we can see the cloud
server, data owner, end user, and attacker.
The data owner initially registers an account by filling in the details; after completing
registration he can easily log in to the account, and after login the owner may upload
the data to the cloud server for security. The data owner may upload the data in the
form of blocks so that he can easily encrypt the data.
The cloud server also registers and logs in to the account by filling in the details.
A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.
Figure 3.3: Use Case diagram for Data Owner and Cloud Server
The use case diagram represents that the data owner must register and log in, browse
files, split them into blocks, and then encrypt and upload them; he may check for data
deduplication and data integrity. If the data is duplicate, then the system displays that
a duplication was found.
Figure 3.4: Use case diagram for End user and Cloud Server
This is the use case diagram for the end user and cloud server. The end user and cloud
server have to log in. The end user then requests the file which he wants. The cloud
server views the file requests and responds to the requests; that is, the cloud server
gives permission to access the files. Then the end user is able to download the files.
Figure 3.5: Sequence Diagram
First the data owner registers and logs in. The data owner uploads the file; the
cloud server then divides the uploaded file into blocks, gives each block a MAC, and
encrypts the file. The end user requests the file, and the server responds to the file
request. The end user is then able to download the file.
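The per-block MAC step in the sequence above can be sketched as follows (HMAC-SHA-256, the server-side key, and the block size are assumptions for illustration):

```python
import hashlib
import hmac

# Hypothetical server-side MAC key and block size (assumptions).
SERVER_MAC_KEY = b"server-mac-key"

def mac_blocks(data: bytes, size: int = 1024) -> list[tuple[bytes, bytes]]:
    """Split the upload into blocks and attach an HMAC tag to each."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(b, hmac.new(SERVER_MAC_KEY, b, hashlib.sha256).digest())
            for b in blocks]

def verify(block: bytes, tag: bytes) -> bool:
    """Integrity check: recompute the tag and compare in constant time."""
    expected = hmac.new(SERVER_MAC_KEY, block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

tagged = mac_blocks(b"A" * 3000)
assert all(verify(b, t) for b, t in tagged)
b0, t0 = tagged[0]
assert not verify(b0 + b"tamper", t0)  # modification is detected
```

This is what lets the data owner's integrity check pinpoint which block an attacker has modified.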
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can
be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by this system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used
to model the system components. These components are the system process, the
data used by the process, the external entities that interact with the system and the
information flows in the system.
3. DFD shows how the information moves through the system and how it is modified
by a series of transformations. It is a graphical technique that depicts information
flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. DFD may be
partitioned into levels that represent increasing information flow and functional
detail.
Figure 3.6: Data Flow Diagram for Level 0
This is the Level 0 DFD. First the data owner has to register and log in. Then he uploads
the file. If the data owner wants to update the file, then he is able to update the
file.
Figure 3.7: Data Flow Diagram for Level 1
The data owner has to register. After registration he has to log in. The data owner
uploads the file, and the file is sent to the cloud server. Whenever the user wants the
file, the end user has to send a request for the file; the cloud server then responds to the
file request. The cloud server gives permission to access the file, and the end user is
then able to download the file.
Flowcharts are nothing but the graphical representation of data or an algorithm
for a better visual understanding of the code. A flowchart displays step-by-step solutions
to a problem, algorithm, or process. It is a pictorial way of representing steps that is
preferred by most beginner-level programmers to understand algorithms of computer
science, and thus it contributes to troubleshooting issues in the algorithm. A flowchart
is a picture of boxes that indicates the process flow in a sequential manner. Since a
flowchart is a pictorial representation of a process or algorithm, it is easy to interpret
and understand the process. To draw a flowchart, certain rules need to be followed,
which are widely accepted by professionals all over the world.
Use of a flowchart
Types of Flowcharts
1. Process flowchart: This type of flowchart shows all the activities that are
involved in making a product. It basically provides a pathway to analyse the
product to be built. A process flowchart is most commonly used in process
engineering to illustrate the relation between the major as well as minor
components present in the product. It is used in business product modelling to
help employees understand the project requirements and gain some insight
into the project.
2. Data flowchart: As the name suggests, the data flowchart is used to analyse the
data, specifically it helps in analysing the structural details related to the project.
Using this flowchart, one can easily understand the data inflow and outflow from
the system. It is most commonly used to manage data or to analyse information to
and from the system.
3. Business Process Modelling Diagram: Using this flowchart or diagram, one can
analytically represent the business process; it helps simplify the concepts needed
to understand business activities and the flow of information. This flowchart
illustrates the business process model graphically, which paves the way for
process improvement.
Most flowcharts use only a simplified set of symbols with a single type of relationship,
a solid line with an arrow. Diagrams should begin and ideally end with the terminator
symbol. However, this rule is not as strict as it was in the UML Activity diagram. The
terminator is usually drawn as a rectangle with distinctly round corners. In some
diagrams, however, you may also encounter an ellipse or circle. The terminator where
the process flow starts usually contains the "Start" text and the one where the process
ends the "End" text. Some diagrams have multiple end points and some none.
Advantages of Flowchart
Figure 3.8: Flow chart for Data owner
The above flow chart shows the sequence of steps performed in the project. The data
owner and user have to log in to the cloud to perform the operations.
3.4.4 ER Diagram
ER Diagram stands for Entity Relationship Diagram; also known as an ERD, it is a
diagram that displays the relationships of entity sets stored in a database. In other
words, ER diagrams help to explain the logical structure of databases. ER diagrams are
created based on three basic concepts: entities, attributes and relationships.
ER Diagrams contain different symbols that use rectangles to represent entities, ovals
to define attributes and diamond shapes to represent relationships.
ER Model
The ER Model (Entity Relationship Model) is a high-level conceptual data model
diagram. The ER model helps to systematically analyze data requirements to produce a
well-designed database. The ER Model represents real-world entities and the
relationships between them. Creating an ER Model in a DBMS is considered a best
practice before implementing your database.
Figure 3.9: E-R Diagram
CHAPTER 4
IMPLEMENTATION
4.1 INTRODUCTION
If we choose cloud computing, a cloud vendor is responsible for the hardware
purchase and maintenance. Vendors also provide a wide variety of software and
platforms as a service. We can take any required services on rent, and the cloud
computing services are charged based on usage.
The cloud environment provides an easily accessible online portal that makes it handy
for the user to manage the compute, storage, network, and application resources.
• Security: Many cloud vendors offer a broad set of policies, technologies, and
controls that strengthen our data security.
• Public Cloud: The cloud resources that are owned and operated by a third-party
cloud service provider are termed public clouds. It delivers computing resources
such as servers, software, and storage over the internet.
• Private Cloud: The cloud computing resources that are exclusively used by a
single business or organization are termed a private cloud. A private cloud may
be physically located in the company’s on-site datacentre or hosted by a third-party
service provider.
• Hybrid Cloud: It is the combination of public and private clouds, bound
together by technology that allows data and applications to be shared between them.
Hybrid cloud provides flexibility and more deployment options to the business.
4.2 SOFTWARE ENVIRONMENT
Client server:
Overview:
Among the varied topics in existence in the field of computers, client server is one
which has generated more heat than light, and also more hype than reality. This
technology has acquired a certain critical mass of attention with its dedicated
conferences and magazines. Major computer vendors such as IBM and DEC have
declared that client server is their main future market. A survey by DBMS magazine
revealed that 76% of its readers were actively looking at the client server solution.
Client server development tools grew from $200 million in 1992 to more than $1.2
billion in 1996.
Client server implementations are complex, but the underlying concept is simple and
powerful. A client is an application running with local resources but able to request
database and related services from a separate remote server. The software mediating
this client server interaction is often referred to as MIDDLEWARE.
The typical client, either a PC or a workstation, is connected through a network to a
more powerful PC, workstation, midrange or mainframe server, usually capable of
handling requests from more than one client. However, in some configurations a server
may also act as a client. A server may need to access other servers in order to process
the original client request.
The key client server idea is that the client, as a user, is essentially insulated from the
physical location and formats of the data needed for their application. With the proper
middleware, a client input form or report can transparently access and manipulate both
local databases on the client machine and remote databases on one or more servers. An
added bonus is that client server opens the door to multi-vendor database access,
including heterogeneous table joins.
Two prominent systems in existence are client server and file server systems. It is
essential to distinguish between the two. Both provide shared network access to data,
but the comparison ends there! The file server simply provides a remote disk drive that
can be accessed by LAN applications on a file-by-file basis. The client server offers
full relational database services such as SQL access, record modifying, insert, and
delete with full relational integrity, backup/restore, and performance for a high volume
of transactions, etc. The client server middleware provides a flexible interface between
client and server: who does what, when and to whom.
Client server has evolved to solve a problem that has been around since the earliest days
of computing: how best to distribute your computing, data generation and data storage
resources in order to obtain efficient, cost-effective departmental and enterprise-wide
data processing. During the mainframe era, choices were quite limited. A central
machine housed both the CPU and the data (cards, tapes, drums and later disks). Access
to these resources was initially confined to batched runs that produced departmental
reports at the appropriate intervals. A strong central information service department
ruled the corporation. The role of the rest of the corporation was limited to requesting
new or more frequent reports and to providing handwritten forms from which the
central data banks were created and updated. The earliest client server solutions
therefore could best be characterized as “SLAVE-MASTER”.
About Java:
Initially the language was called “Oak”, but it was renamed “Java” in 1995. The
primary motivation for this language was the need for a platform-independent (i.e.,
architecture-neutral) language that could be used to create software to be embedded in
various consumer electronic devices.
➢ Except for those constraints imposed by the Internet environment, Java gives the
programmer full control.
Java has had a profound effect on the Internet. This is because Java expands the
universe of objects that can move about freely in cyberspace. In a network, two
categories of objects are transmitted between the server and the personal computer:
passive information and dynamic, active programs. Dynamic, self-executing programs
cause serious problems in the areas of security and portability. But Java addresses
those concerns and, by doing so, has opened the door to an exciting new form of
program called the applet.
Applications and Applets: An application is a program that runs on our computer under the operating system of that computer. It is more or less like one created using C or C++. Java's ability to create Applets makes it important. An Applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. The difference is that it is an intelligent program, not just a media file: it can react to user input and change dynamically.
Features of Java:
Security:
Every time you download a “normal” program, you risk a viral infection. Prior to Java, most users did not download executable programs frequently, and those who did scanned them for viruses before execution. Even so, most users worried about the possibility of infecting their systems with a virus. In addition, another type of malicious program must be guarded against: one that gathers private information such as credit card numbers, bank account balances, and passwords. Java answers both of these concerns by providing a “firewall” between a network application and your computer. When you use a Java-compatible web browser, you can safely download Java applets without fear of virus infection or malicious intent.
Portability:
The key that allows Java to solve both the security and the portability problems is that the output of the Java compiler is byte code. Byte code is a highly optimized set of instructions designed to be executed by the Java run-time system, which is called the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code. Translating a Java program into byte code makes it much easier to run the program in a wide variety of environments: once the run-time package exists for a given system, any Java program can run on it.
Although Java was designed for interpretation, there is technically nothing about Java that prevents on-the-fly compilation of byte code into native code. Sun completed its Just-In-Time (JIT) compiler for byte code. When the JIT compiler is part of the JVM, it compiles byte code into executable code in real time, on a piece-by-piece, demand basis. It is not possible to compile an entire Java program into executable code all at once, because Java performs various checks that can be done only at run time. Instead, the JIT compiles code as it is needed, during execution.
Java Virtual Machine (JVM):
Beyond the language, there is the Java Virtual Machine. The Java Virtual Machine is an important element of the Java technology; it can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a class loader is invoked and performs byte code verification, which makes sure that code generated by the compiler will not corrupt the machine it is loaded on. Byte code verification takes place at the end of the compilation process to make sure that everything is accurate and correct. Byte code verification is therefore integral to the compiling and executing of Java code.
Overall Description
Picture showing the development process of a JAVA program: the .java source file, the javac compiler, the .class byte code file, and the Java Virtual Machine.
The picture above shows how Java produces byte codes and executes them. The Java source code is located in a .java file that is processed with a Java compiler called javac. The Java compiler produces a file called a .class file, which contains the byte code. The .class file is then loaded, across the network or locally on your machine, into the execution environment, the Java Virtual Machine, which interprets and executes the byte code.
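This compile-then-interpret cycle can be seen with a minimal program (the file and class name below are our own illustration, not taken from the report):

```java
// Save as HelloJava.java, then compile and run with the standard tool chain:
//   javac HelloJava.java   -> produces HelloJava.class (the byte code)
//   java HelloJava         -> the JVM loads and interprets the byte code
public class HelloJava {

    // Kept as a separate method so the program's behaviour is easy to check.
    static String greeting() {
        return "Hello from the JVM";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

The same HelloJava.class file runs unchanged on any machine that has a JVM, which is the portability argument made above.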
Java Architecture:
Compilation of code:
When you compile the code, the Java compiler creates machine code (called byte code) for a hypothetical machine called the Java Virtual Machine (JVM), and the JVM then executes that byte code. The JVM was created to overcome the issue of portability: the code is written and compiled for one machine and interpreted on all machines.
Picture showing platform independence: the Java compiler turns the source code into platform-independent byte code, which separate Java interpreters then execute on a PC, a Macintosh, and a SPARC workstation.
During run time, the Java interpreter makes the byte code file behave as if it were running on a Java Virtual Machine. In reality, the underlying machine could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris, or an Apple Macintosh running its own system; all of them could receive code from any computer through the Internet and run the applets.
Simple:
Java was designed to be easy for the professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java will be even easier, because Java inherits the C/C++ syntax and many of the object-oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.
Object-Oriented:
Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean, usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.
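This trade-off can be sketched briefly (the class and method names are ours, for illustration only): the primitive int stays a high-performance non-object, while the Integer wrapper turns the same value into a full object when one is needed:

```java
public class PrimitiveDemo {

    // Primitive int: a high-performance non-object; no allocation involved.
    static int sumPrimitive(int a, int b) {
        return a + b;
    }

    // Integer wrapper: the same arithmetic on objects, used when an object
    // is required (e.g. in collections); boxing carries extra cost.
    static Integer sumBoxed(Integer a, Integer b) {
        return a + b; // values are unboxed, added, and re-boxed
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(2, 3)); // prints 5
        System.out.println(sumBoxed(2, 3));     // prints 5
    }
}
```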
Robust:
Java virtually eliminates the problems of memory management and deallocation, which is completely automatic. In a well-written Java program, all run-time errors can, and should, be managed by your program.
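As a small hypothetical illustration of managing a run-time error in the program itself, the sketch below catches an out-of-bounds array access that C or C++ could let corrupt memory:

```java
public class RobustDemo {

    // Returns data[index], or a fallback value when the index is invalid;
    // the out-of-bounds access surfaces as a catchable run-time exception.
    static int safeGet(int[] data, int index, int fallback) {
        try {
            return data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback; // the run-time error is managed by the program
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(safeGet(data, 1, -1)); // prints 20
        System.out.println(safeGet(data, 9, -1)); // prints -1
    }
}
```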
JAVASCRIPT:
JavaScript code is placed in an HTML document between <SCRIPT> and </SCRIPT> tags:
<SCRIPT>
JavaScript statements
</SCRIPT>
➢ Validate the contents of a form and make calculations.
➢ Add scrolling or changing messages to the Browser’s status line.
➢ Animate images or rotate images that change when we move the mouse over
them.
➢ Detect the browser in use and display different content for different browsers.
➢ Detect installed plug-ins and notify the user if a plug-in is required.
JavaScript and Java are entirely different languages. A few of the most glaring
differences are:
➢ Java applets are generally displayed in a box within the web document; JavaScript
can affect any part of the Web document itself.
➢ While JavaScript is best suited to simple applications and adding interactive
features to Web pages; Java can be used for incredibly complex applications.
There are many other differences, but the important thing to remember is that JavaScript and Java are separate languages. They are both useful for different things; in fact, they can be used together to combine their advantages.
ADVANTAGES:
➢ JavaScript is the default scripting language at the client side, since all browsers support it.
HTML:
Hypertext Markup Language (HTML), the language of the World Wide Web (WWW), allows users to produce web pages that include text, graphics and pointers to other web pages (hyperlinks).
The concept of Hypertext was adapted to the Web: instead of reading text in a rigid linear structure, we can easily jump from one point to another and navigate through the information based on our interest and preference. A markup language is simply a series of elements, each delimited with special characters, that define how the text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized words that lead to other documents or to some portion of the same document.
HTML can be used to display any type of document on the host computer, which can
be geographically at a different location. It is a versatile language and can be used on
any platform or desktop.
HTML provides tags (special codes) to make the document look attractive. HTML tags
are not case-sensitive. Using graphics, fonts, different sizes, color, etc., can enhance the
presentation of the document. Anything that is not a tag is part of the document itself.
ADVANTAGES:
➢ An HTML document is small and hence easy to send over the net. It is small because
it does not include formatting information.
➢ HTML is platform independent.
➢ HTML tags are not case-sensitive.
What Is JDBC?
JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for Java Database Connectivity.) It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.
Using JDBC, it is easy to send SQL statements to virtually any relational database. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. The combination of Java and JDBC lets a programmer write a program once and run it anywhere.
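A minimal sketch of this flow is below; the JDBC URL, the credentials, and the users table are hypothetical placeholders, and the connection attempt is guarded so the sketch still runs where no database driver is installed:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSketch {

    // The SQL text sent through JDBC; plain string construction.
    static String selectAll(String table) {
        return "SELECT * FROM " + table;
    }

    public static void main(String[] args) {
        // Hypothetical connection details -- substitute your own database's.
        String url = "jdbc:mysql://localhost:3306/appdb";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(selectAll("users"))) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // first column of each row
            }
        } catch (Exception e) {
            // No driver or database available here; the call sequence above
            // (connect, create statement, execute query) is the point.
            System.out.println("no database: " + e.getMessage());
        }
    }
}
```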
4.3 CODE:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css">
<!--
.style1 {
font-size: 24px;
color: #FF0000;
font-weight: bold;
}
.style3 {
color: #FF0000;
font-weight: bold;
}
.style5 {
color: #0000FF;
font-weight: bold;
font-style: italic;
}
-->
</style>
</head>
<body>
<div class="main">
<div class="header">
<div class="header_resize">
<div class="menu_nav">
<ul>
<li><a href="EndUser.jsp"><span>End User </span></a></li>
</ul>
</div>
<div class="clr"></div>
<div class="logo">
</div>
<div class="clr"></div>
<div class="slider">
<div class="clr"></div>
</div>
<div class="clr"></div>
</div>
</div>
<div class="content">
<div class="content_resize">
<div class="mainbar">
<div class="article">
<div class="clr"></div>
<div class="post_content">
<form>
<ol>
<li>
</li>
<li> <strong>
</strong>
<label for="email"></label>
</li><li></li>
<li></li>
<li><br />
<a href="ERegister.html">REGISTER</a>
<input type="submit" name="imageField" id="imageField" class="LOGIN"
/>
</li>
</ol>
</form>
</div>
<div class="clr"></div>
</div>
</div>
<div class="sidebar">
<div class="searchform">
<form>
<span>
</span>
</form>
</div>
<div class="clr"></div>
<div class="gadget">
<div class="clr"></div>
<ul class="sb_menu">
<li><a href="index.html">Home</a></li>
</ul>
</div>
</div>
<div class="clr"></div>
</div>
</div>
<div class="fbg"></div>
<div class="footer">
</div>
</div>
</body>
</html>
4.4 TESTING
• Unit Testing.
• Integration Testing.
• User Acceptance Testing.
• Output Testing.
• Validation Testing.
Unit Testing
Unit testing focuses verification effort on the smallest unit of software design, the module. Unit testing exercises specific paths in a module's control structure to ensure complete coverage and maximum error detection. This test focuses on each module individually, ensuring that it functions properly as a unit; hence the name Unit Testing.
During this testing, each module is tested individually and the module interfaces are verified for consistency with the design specification. All important processing paths are tested for the expected results, and all error-handling paths are also tested.
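As a sketch of one such unit test (the grade module and its bounds are our own example, not taken from the project), each path of the module's control structure, including the error-handling path, is exercised individually:

```java
public class GradeModule {

    // Module under test: classifies a mark, with an explicit error-handling path.
    static String grade(int mark) {
        if (mark < 0 || mark > 100) {
            throw new IllegalArgumentException("mark out of range");
        }
        return mark >= 50 ? "PASS" : "FAIL";
    }

    public static void main(String[] args) {
        // Exercise each path in the module's control structure individually.
        System.out.println(grade(75)); // prints PASS
        System.out.println(grade(30)); // prints FAIL
        try {
            grade(150); // drives the error-handling path
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```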
Integration Testing
Integration testing addresses the issues associated with the dual problems of verification and program construction. After the software has been integrated, a set of high-order tests is conducted. The main objective of this testing process is to take unit-tested modules and build a program structure that has been dictated by the design.
1. Top-down Integration
In this method, the software is tested from the main module downwards, and individual stubs are replaced as the test proceeds.
2. Bottom-up Integration
This method begins the construction and testing with the modules at the lowest
level in the program structure. Since the modules are integrated from the bottom up,
processing required for modules subordinate to a given level is always available and
the need for stubs is eliminated. The bottom-up integration strategy may be
implemented with the following steps:
• The low-level modules are combined into clusters that perform a specific software sub-function.
• A driver (i.e., a control program for testing) is written to coordinate test-case input and output.
• Drivers are removed and clusters are combined, moving upward in the program structure.
• The bottom-up approach tests each module individually; each module is then integrated with a main module and tested for functionality.
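The driver role described in these steps can be sketched as follows; the two low-level modules and their names are hypothetical stand-ins for a real cluster:

```java
public class BottomUpDriver {

    // Two low-level modules forming a cluster (hypothetical names).
    static int parseAmount(String s) {
        return Integer.parseInt(s.trim());
    }

    static int applyTax(int amount) {
        return amount + amount / 10; // add 10% tax
    }

    // Driver: the control program that feeds test-case input to the cluster
    // and collects its output, standing in for the not-yet-built upper modules.
    static int driver(String input) {
        return applyTax(parseAmount(input));
    }

    public static void main(String[] args) {
        System.out.println(driver(" 100 ")); // prints 110
    }
}
```

Once the upper modules exist, the driver is removed and the cluster is wired into the real program structure, as the steps above describe.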
User Acceptance Testing
User acceptance of a system is the key factor for the success of any system.
The system under consideration is tested for user acceptance by constantly keeping in
touch with the prospective system users at the time of developing and making changes
wherever required. The system developed provides a friendly user interface that can
easily be understood even by a person who is new to the system.
Output Testing
After performing validation testing, the next step is output testing of the proposed system, since no system can be useful if it does not produce the required output in the specified format. The outputs generated or displayed by the system under consideration are tested by asking the users about the format they require. The output format is considered in two ways: one is on screen and the other is the printed format.
Validation Checking
Text Field:
The text field can contain only a number of characters less than or equal to its size. The text fields are alphanumeric in some tables and alphabetic in others. An incorrect entry always flashes an error message.
Numeric Field:
The numeric field can contain only numbers from 0 to 9; an entry of any other character flashes an error message. The individual modules are checked for accuracy against what each has to perform. Each module is subjected to a test run along with sample data. The individually tested modules are then integrated into a single system. Testing involves executing the program with real data, and the existence of any program defect is inferred from the output. The testing should be planned so that all the requirements are individually tested.
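The two field rules above can be sketched as validation helpers (the method names and the size limit are our own choices, for illustration):

```java
public class FieldValidator {

    // Text field: at most `size` characters, letters and digits only.
    static boolean isValidText(String s, int size) {
        return s.length() <= size && !s.isEmpty()
                && s.chars().allMatch(Character::isLetterOrDigit);
    }

    // Numeric field: digits 0 to 9 only, and at least one digit.
    static boolean isNumericField(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isValidText("user01", 10));     // prints true
        System.out.println(isValidText("bad entry!", 10)); // prints false
        System.out.println(isNumericField("12345"));       // prints true
        System.out.println(isNumericField("12a45"));       // prints false
    }
}
```

In the real forms, a false result would trigger the error message described above rather than a printed boolean.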
A successful test is one that brings out the defects for inappropriate data and produces output revealing the errors in the system.
The above testing is done by taking various kinds of test data. Preparation of test data plays a vital role in system testing. After preparing the test data, the system under study is tested using it. While testing the system with this test data, errors are again uncovered and corrected by using the above testing steps, and the corrections are also noted for future use.
Live test data are those that are actually extracted from organization files. After
a system is partially constructed, programmers or analysts often ask users to key in a
set of data from their normal activities. Then, the systems person uses this data as a way
to partially test the system. In other instances, programmers or analysts extract a set of
live data from the files and have them entered themselves.
Artificial test data are created solely for test purposes, since they can be
generated to test all combinations of formats and values. In other words, the artificial
data, which can quickly be prepared by a data generating utility program in the
information systems department, make possible the testing of all logic and control paths
through the program.
The most effective test programs use artificial test data generated by persons
other than those who wrote the programs. Often, an independent team of testers
formulates a testing plan, using the systems specifications.
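A data-generating utility of the kind mentioned above can be sketched as follows (the field values are invented for illustration); it enumerates every combination of the given field values so that all logic and control paths can be driven systematically:

```java
import java.util.ArrayList;
import java.util.List;

public class TestDataGenerator {

    // Generates every combination of the two fields' values as CSV rows.
    static List<String> combinations(String[] fieldA, String[] fieldB) {
        List<String> rows = new ArrayList<>();
        for (String a : fieldA) {
            for (String b : fieldB) {
                rows.add(a + "," + b);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = combinations(new String[]{"valid", "empty"},
                                         new String[]{"0", "9999"});
        System.out.println(rows.size()); // prints 4 (all combinations)
    }
}
```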
The package “Virtual Private Network” has satisfied all the requirements
specified as per software requirement specification and was accepted.
USER TRAINING
4.5 TEST CASES
Table 4.5.1 Test Cases
CHAPTER 5
RESULT ANALYSIS
5.1 OUTPUT-SCREENSHOTS
Figure 5.2: Here the data user needs to log in by entering a name and password.
Figure 5.3: Here the menu will be displayed, to upload data, to check data integrity, deduplication, etc.
Figure 5.4: This is the cloud server page; all the data uploaded by the owner will be stored in the cloud server.
Figure 5.5: This is the end user page; the end user can request a file and can view the file responses.
Figure 5.6: Here, in this page, the file is divided into blocks in encrypted form.
Figure 5.7: This page shows the confirmation that the data was uploaded successfully.
Figure 5.8: This page shows that encrypted deduplication is found if we upload the same file.
Figure 5.9: Here the end user can download the file by entering the file name and the secret key.
Figure 5.10: In this page the user can decrypt and download the file requested by the end user.
Figure 5.11: Here the file contents to be downloaded are displayed, and by clicking the download button the file is downloaded.
CHAPTER 6
CONCLUSION
6.1 CONCLUSION
The main concern with the cloud is security. This security issue has been solved to some extent in this project; as an enhancement, the integration of various cryptographic algorithms would provide a better way to keep our data encrypted. The main issue is that when complexity increases, performance is disturbed. The various enciphering techniques and their algorithms can also be updated, transformed, and used in decentralized platforms such as blockchain, rather than only in centralized platforms such as the cloud. In our project, the main drawback is that the end user can attack the files. This may be changed in the future by using different techniques.
REFERENCES
[1] Boost c++ libraries. https://www.boost.org/.
[2] Leveldb. https://github.com/google/leveldb.
[3] Openssl. https://www.openssl.org.
[4] FSL traces and snapshots public archive. http://tracer.filesystems.org/, 2014.
[5] F. Armknecht, J.-M. Bohli, G. O. Karame, and F. Youssef. Transparent data
deduplication in the cloud. In Proc. of ACM CCS, 2015.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D.
Song. Provable data possession at untrusted stores. In Proc. of ACM CCS, 2007.
[7] M. Bellare, S. Keelveedhi, and T. Ristenpart. DupLESS: Server-aided encryption
for deduplicated storage. In Proc. of USENIX Security, 2013.
[8] M. Bellare, S. Keelveedhi, and T. Ristenpart. Message-locked encryption and secure
deduplication. In Proc. of EUROCRYPT, 2013.
[9] D. Bhagwat, K. Eshghi, D. D. E. Long, and M. Lillibridge. Extreme binning:
Scalable, parallel deduplication for chunk-based file backup. In Proc. of IEEE
MASCOTS, 2009.
[10] J. Black. Compare-by-hash: A reasoned analysis. In Proc. of USENIX ATC, 2006.
[11] D. R. Bobbarjung, S. Jagannathan, and C. Dubnicki. Improving duplicate
elimination in storage systems. ACM Transactions on Storage, 2(4):424–448, 2006.
[12] A. Z. Broder. On the resemblance and containment of documents. In Proc. of
SEQUENCES, 1997.
[13] D. Dobre, P. Viotti, and M. Vukolić. Hybris: Robust hybrid cloud storage. In Proc.
of ACM SoCC, 2014.
[14] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming
space from duplicate files in a serverless distributed file system. In Proc. of IEEE
ICDCS, 2002.