Professional Documents
Culture Documents
Data Leakage Detection 123
Data Leakage Detection 123
Data Leakage Detection 123
Content
Abstract
Introduction
Existing System
Proposed System
Problem Definition
Algorithm
Hardware & software
Modules
System Architecture
Disadvantages & Advantages
Applications
Future Modifications
Abstract
We study the following problem: A data distributor has given sensitive
data to a set of supposedly trusted agents (third parties). Some of the data are
leaked and found in an unauthorized place (e.g., on the web or somebody’s
laptop). The distributor must assess the likelihood that the leaked data came
from one or more agents, as opposed to having been independently gathered
by other means.
We propose data allocation strategies (across the agents) that improve
the probability of identifying leakages. These methods do not rely on
alterations of the released data (e.g., watermarks). In some cases, we can also
inject “realistic but fake” data records to further improve our chances of
detecting leakage and identifying the guilty party.
Introduction:
Leakage is a budding security threat to organizations, particularly
when data leakage is carried out by trusted agents. In this paper, we
present unobtrusive techniques for detecting data leakage and
assessing the “guilt” of agents. Water marking is the long-established
technique used for data leakage detection which involves some
modification to the original data.
To overcome the disadvantages of using watermark, data allocation
strategies are used to improve the feasibility of detecting guilty agent.
Distributor ”intelligently” allocates data based on sample request and
explicit request using allocation strategies in order to better the
effectiveness in detecting guilty agent.
Fake objects are designed to look like real objects, and are distributed
to agents together with requested data. Fake objects encrypted with a
private key are designed to look like real objects, and are distributed to
agents together with requested data. By this way we can identify, the
guilty agent who leaked the data by decrypting his fake object.
Aim of the Project
The aim of the project is to overcome data allocation problem and to send
secured data for third party agent. Our goal is detect when the distributor’s sensitive
data has been leaked by agents, and if possible to identify the agent that leaked the
data. We develop unobtrusive techniques for detecting leakage of a set of objects or
records.
Watermarks can be very useful in some cases, but again, involve some
modification of the original data. Furthermore, watermarks can sometimes be
destroyed if the data recipient is malicious.
E.g. A hospital may give patient records to researchers who will devise new
treatments. Similarly, a company may have partnerships with other companies
that require sharing customer data.
Another enterprise may outsource its data processing, so data must be given
to various other companies. We call the owner of the data the distributor and
the supposedly trusted third parties the agents.
Disadvantage Existing System
Cannot detect leaked data
Cannot detect source of Leaked data
Not secure
Can lead to huge losses
PROPOSED SYSTEM:
Our goal is to detect when the distributor’s sensitive data has been leaked by agents,
and if possible to identify the agent that leaked the data. Perturbation is a very useful
technique where the data is modified and made “less sensitive” before being handed to
agents. we develop unobtrusive techniques for detecting leakage of a set of objects or
records.
In this section we develop a model for assessing the “guilt” of agents. We also present
algorithms for distributing objects to agents, in a way that improves our chances of
identifying a leaker.
Finally, we also consider the option of adding “fake” objects to the distributed set.
Such objects do not correspond to real entities but appear realistic to the agents.
In a sense, the fake objects acts as a type of watermark for the entire set, without
modifying any individual members. If it turns out an agent was given one or more fake
objects that were leaked, then the distributor can be more confident that agent was
guilty.
Advantage Proposed System:-
We can provide security to our data during its distribution or
transmission and even we can detect if that gets leaked
we have presented implement a variety of data distribution strategies
that can improve the distributor’s chances of identifying a leaker
Quick response time
Customized processing
Small memory factor
Highly secure
Replication in Heterogenic Database
Easy updating.
The Distributor’s Objective (1/2)
e s t S (leaked)
e qu U1
R
R1
R1
ue st
R2 Req R3
U2
Patient Req
ues
Request R3 t
U3
Re
q
R4 Pr{G1|S}>>Pr{G2|S}
ue
st
U4 Pr{G1|S}>> Pr{G4|S}
Distribution Strategies
Sample Data Requests Explicit Data Requests
The distributor must provide
• The distributor has the freedom
agents with the data they
to select the data items to
request
provide the agents with
General Idea:
• General Idea:
Add fake data to the
– Provide agents with as much
distributed ones to minimize
disjoint sets of data as
overlap of distributed data
possible
Problem: Agents can collude and
• Problem: There are cases where
identify fake data
the distributed data must
NOT COVERED in this talk
overlap E.g., |Ri|+…+|Rn|>|T|
Fake Objects
add fake objects to the distributed data in order to
improve his effectiveness In detecting guilty agents.
Operating System :
Windows95/98/2000/XP/7
Application Server :
Tomcat5.0/6.X
Front End : HTML, Java, Jsp
Scripts : JavaScript.
Database : MySQL 5.0
Database Connectivity : JDBC.
MODULES:
1. Data Allocation Module:
The main focus of our project is the data allocation problem as how can
the distributor “intelligently” give data to agents in order to improve the chances
of detecting a guilty agent , Admin can send the files to the authenticated user,
users can edit their account details etc. Agent views the secret key details through
mail. In order to increase the chances of detecting agents that leak data.
4. Data Distributor:
A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data is leaked and found in an
unauthorized place (e.g., on the web or somebody’s laptop). The distributor
must assess the likelihood that the leaked data came from one or more agents,
as opposed to having been independently gathered by other means . Admin can
able to view the which file is leaking and fake user’s details also.
Flowchart: Start
User Login
Admin Client
Authentication
Stop
0-Level DFD
Distributor
Data
User/ Leakage Admin
Client
Detection
Database
1-Level DFD
Third Party
Client Request data Distributor
(Implicit or Explicit) (send Data)
Distribution
Category and
Fake Objects
Gather Leak
Find out Data leakage Data
guilt agent detection algorithm
Advantages :
Advantages Of Proposed System:
Quick response time
Customized processing
Small memory factor
Highly secure
Replication in Heterogenic Database
Easy updating
Applications
Secure file transfers
Image Transmissions security
Video transfer security
Conclusions
Thank You