Data Leakage Detection


Data Leakage Detection

Content
Abstract
Introduction
Existing System
Proposed System
Problem Definition
Algorithm
Hardware & software
Modules
System Architecture
Disadvantages & Advantages
Applications
Future Modifications
Abstract
We study the following problem: A data distributor has given sensitive
data to a set of supposedly trusted agents (third parties). Some of the data are
leaked and found in an unauthorized place (e.g., on the web or somebody’s
laptop). The distributor must assess the likelihood that the leaked data came
from one or more agents, as opposed to having been independently gathered
by other means.
We propose data allocation strategies (across the agents) that improve
the probability of identifying leakages. These methods do not rely on
alterations of the released data (e.g., watermarks). In some cases, we can also
inject “realistic but fake” data records to further improve our chances of
detecting leakage and identifying the guilty party.
Introduction:
 Data leakage is a growing security threat to organizations, particularly
when the leakage is carried out by trusted agents. In this paper, we
present unobtrusive techniques for detecting data leakage and
assessing the “guilt” of agents. Watermarking is the long-established
technique for data leakage detection, but it involves some
modification of the original data.
 To overcome the disadvantages of watermarking, data allocation
strategies are used to improve the feasibility of detecting a guilty agent.
The distributor “intelligently” allocates data based on sample requests and
explicit requests, using allocation strategies to improve the
effectiveness of detecting a guilty agent.
 Fake objects are designed to look like real objects, and are distributed
to agents together with the requested data. Because each fake object is
encrypted with a private key, the distributor can identify the guilty
agent who leaked the data by decrypting his fake object.
Aim of the Project

The aim of the project is to address the data allocation problem and to send
data securely to third-party agents. Our goal is to detect when the distributor’s sensitive
data has been leaked by agents and, if possible, to identify the agent that leaked the
data. We develop unobtrusive techniques for detecting leakage of a set of objects or
records.

Objectives of the project:


• To secure our data from a third party.
• To detect when the distributor’s sensitive data has been leaked by agents, and if
possible to identify the agent that leaked the data.
Scope of the Project:
Perturbation is a very useful technique where the data is modified and made “less
sensitive” before being handed to agents. We develop unobtrusive techniques for
detecting leakage of a set of objects or records.
We develop a model for assessing the “guilt” of agents. We also present
algorithms for distributing objects to agents, in a way that improves our chances of
identifying a leaker. Finally, we also consider the option of adding “fake” objects to
the distributed set. Such objects do not correspond to real entities but appear
realistic to the agents. In a sense, the fake objects act as a type of watermark for the
entire set, without modifying any individual members. If it turns out an agent was
given one or more fake objects that were leaked, then the distributor can be more
confident that the agent was guilty.
Problem Definition :
 Our goal is to detect when the distributor’s sensitive data has been leaked
by agents and if possible to identify the agent that leaked the data.
 Perturbation is a very useful technique where the data is modified and
made “less sensitive” before being handed to agents.
 We develop unobtrusive techniques for detecting leakage of a set of
objects or records.
 In this section we develop a model for assessing the “guilt” of agents. We
also present algorithms for distributing objects to agents, in a way that
improves our chances of identifying a leaker.
 Finally, we also consider the option of adding “fake” objects to the
distributed set.
 Such objects do not correspond to real entities but appear realistic to the
agents.
 In a sense, the fake objects act as a type of watermark for the entire set,
without modifying any individual members.
 If it turns out an agent was given one or more fake objects that were
leaked, then the distributor can be more confident that the agent was guilty.
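As a rough illustration of such a guilt model, the sketch below scores each agent by assuming a leaked object was obtained independently with probability p, and otherwise leaked by one of the agents holding it, with the blame shared equally among them. The function name, the default for p, and the toy data are assumptions, not the exact estimator:

```python
def guilt_probability(leaked, agent_sets, agent, p=0.5):
    """Pr{G_agent | S}: probability the agent is guilty given the leaked set S.

    Simplified model: each leaked object t was obtained independently with
    probability p; otherwise it was leaked by one of the |V_t| agents that
    hold it, each equally likely, independently across objects."""
    prob_innocent = 1.0
    for t in leaked:
        holders = [a for a, objs in agent_sets.items() if t in objs]
        if agent in holders:
            # chance this particular agent did NOT leak object t
            prob_innocent *= 1.0 - (1.0 - p) / len(holders)
    return 1.0 - prob_innocent

# An object held by only one agent weighs much more heavily against that
# agent than an object every agent holds.
agents = {"U1": {"r1", "r2"}, "U2": {"r2", "r3"}}
print(guilt_probability({"r1", "r2"}, agents, "U1"))  # higher: r1 is U1's alone
print(guilt_probability({"r1", "r2"}, agents, "U2"))  # lower: only r2 is shared
```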
EXISTING SYSTEM:
 Traditionally, leakage detection is handled by watermarking, e.g., a unique
code is embedded in each distributed copy. If that copy is later discovered in
the hands of an unauthorized party, the leaker can be identified.

 Watermarks can be very useful in some cases, but again, involve some
modification of the original data. Furthermore, watermarks can sometimes be
destroyed if the data recipient is malicious.

 E.g., a hospital may give patient records to researchers who will devise new
treatments. Similarly, a company may have partnerships with other companies
that require sharing customer data.

 Another enterprise may outsource its data processing, so data must be given
to various other companies. We call the owner of the data the distributor and
the supposedly trusted third parties the agents.
Disadvantages of the Existing System
 May fail to detect leaked data
 May fail to identify the source of leaked data
 Not fully secure
 Can lead to huge losses
PROPOSED SYSTEM:
 Our goal is to detect when the distributor’s sensitive data has been leaked by agents,
and if possible to identify the agent that leaked the data. Perturbation is a very useful
technique where the data is modified and made “less sensitive” before being handed to
agents. We develop unobtrusive techniques for detecting leakage of a set of objects or
records.
 In this section we develop a model for assessing the “guilt” of agents. We also present
algorithms for distributing objects to agents, in a way that improves our chances of
identifying a leaker.
 Finally, we also consider the option of adding “fake” objects to the distributed set.
Such objects do not correspond to real entities but appear realistic to the agents.
 In a sense, the fake objects act as a type of watermark for the entire set, without
modifying any individual members. If it turns out an agent was given one or more fake
objects that were leaked, then the distributor can be more confident that the agent was
guilty.
Advantages of the Proposed System:
 We can provide security to our data during its distribution or
transmission, and we can detect if it gets leaked
 We implement a variety of data distribution strategies
that can improve the distributor’s chances of identifying a leaker
 Quick response time
 Customized processing
 Small memory factor
 Highly secure
 Replication in heterogeneous databases
 Easy updating.
The Distributor’s Objective (1/2)
[Diagram: agents U1–U4 send data requests R1–R4 to the distributor, who holds the patient data; a set S of objects is later found leaked. Comparing guilt probabilities singles out the likely leaker, e.g. Pr{G1|S} >> Pr{G2|S} and Pr{G1|S} >> Pr{G4|S}.]
Distribution Strategies
Explicit Data Requests:
 The distributor must provide agents with the data they request
 General idea: add fake data to the distributed data to minimize the overlap of distributed data
 Problem: agents can collude and identify fake data (NOT COVERED in this talk)
Sample Data Requests:
 The distributor has the freedom to select the data items to provide the agents with
 General idea: provide agents with sets of data that are as disjoint as possible
 Problem: there are cases where the distributed data must overlap, e.g., |R1|+…+|Rn|>|T|
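A minimal sketch of a sample-request allocation in this spirit: each requested object is picked greedily as the one currently held by the fewest agents, which keeps the distributed sets as disjoint as the request sizes allow. The function name and the greedy tie-breaking are assumptions, not the paper's exact algorithm:

```python
def allocate_samples(objects, requests):
    """Greedy sample-request allocation.

    objects:  list of all objects T the distributor holds.
    requests: {agent: m} where m is how many objects the agent asked for.
    Returns {agent: set of allocated objects}, minimizing sharing greedily."""
    counts = {t: 0 for t in objects}            # how many agents hold each object
    alloc = {agent: set() for agent in requests}
    for agent, m in requests.items():
        for _ in range(m):
            # pick the least-shared object this agent does not already hold
            t = min((o for o in objects if o not in alloc[agent]),
                    key=lambda o: counts[o])
            alloc[agent].add(t)
            counts[t] += 1
    return alloc

# When |R1| + |R2| <= |T|, the greedy choice yields fully disjoint sets.
print(allocate_samples(["t1", "t2", "t3", "t4"], {"U1": 2, "U2": 2}))
```

When the requested sizes exceed |T|, the same rule simply spreads the forced overlap as evenly as possible.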
Fake Objects
 The distributor may add fake objects to the distributed data in order to
improve his effectiveness in detecting guilty agents.
 Each agent gets a unique fake object.
 This helps identify the guilty agents.


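One way to sketch this scheme: derive each agent's unique fake record from a keyed hash of the agent id (standing in for the "encrypted with a private key" step), so a fake record found in a leaked set points straight back to its agent. The secret, the record format, and the helper names are all illustrative assumptions:

```python
import hashlib

SECRET = b"distributor-key"  # assumed secret known only to the distributor

def add_fake_object(agent_id, records):
    """Append one fake record, unique to this agent, to the data sent out.
    The tag is a keyed hash of the agent id, so only the distributor can
    recompute which agent a fake record belongs to."""
    tag = hashlib.sha256(SECRET + agent_id.encode()).hexdigest()[:12]
    fake = {"name": "John Doe", "id": tag}   # realistic-looking dummy record
    return records + [fake]

def identify_leakers(leaked, agent_ids):
    """Return agents whose unique fake record appears in the leaked set."""
    tags = {r.get("id") for r in leaked}
    return [a for a in agent_ids
            if hashlib.sha256(SECRET + a.encode()).hexdigest()[:12] in tags]

sent = add_fake_object("U1", [{"name": "Alice", "id": "0001"}])
print(identify_leakers(sent, ["U1", "U2"]))   # U1's fake object is present
```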
Algorithm:
 The AES algorithm is based on permutations and substitutions. Permutations are
rearrangements of data, and substitutions replace one unit of data with another. AES performs
permutations and substitutions using several different techniques. The four operations Sub
Bytes, Shift Rows, Mix Columns, and Add Round Key are called inside a loop that executes
Nr − 1 times, where Nr is the number of rounds for the given key size; the final round omits
Mix Columns. The number of rounds used by the encryption algorithm is 10, 12, or 14,
depending on whether the seed key size is 128, 192, or 256 bits.
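The round structure described above can be shown as a skeleton. The four operations are deliberately stubbed out as identity functions, so this shows only the control flow of AES, not a working cipher:

```python
AES_ROUNDS = {128: 10, 192: 12, 256: 14}   # Nr depends on the key size in bits

def sub_bytes(state):         return state  # stub: byte-wise S-box substitution
def shift_rows(state):        return state  # stub: cyclic row permutation
def mix_columns(state):       return state  # stub: column mixing over GF(2^8)
def add_round_key(state, k):  return state  # stub: XOR state with round key

def aes_encrypt_block(state, round_keys, key_bits=128):
    """Control-flow skeleton of one AES block encryption."""
    nr = AES_ROUNDS[key_bits]
    assert len(round_keys) == nr + 1        # one key per round plus the initial key
    state = add_round_key(state, round_keys[0])
    for rnd in range(1, nr):                # Nr - 1 full rounds
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[rnd])
    state = sub_bytes(state)                # final round omits MixColumns
    state = shift_rows(state)
    return add_round_key(state, round_keys[nr])
```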
 Allocation for Explicit Data Requests: In this type of request, the agent sends a request
together with an appropriate condition. After processing the request against the
data, the distributor gives the matching data to the agent, adding a fake object
in an encrypted format.
 Allocation for Sample Data Requests: In this type of request, the agent’s request carries
no condition. The agent sends the request without a condition and receives data
according to his query. The distributor can then assess the likelihood that leaked
data came from one or more agents, as opposed to having been independently
gathered by other means. We develop a model for assessing the “guilt” of agents.
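The two request types can be sketched as a single dispatch, assuming a hypothetical serve_request helper; the fake-object insertion step is handled separately and omitted here:

```python
import random

def serve_request(table, condition=None, sample_size=None, seed=0):
    """Serve one agent request against the distributor's table.

    Explicit request: a condition is supplied, return every matching record.
    Sample request: no condition, return a random sample of the asked size."""
    if condition is not None:              # explicit data request
        return [r for r in table if condition(r)]
    rng = random.Random(seed)              # sample data request
    return rng.sample(table, sample_size)

table = [1, 2, 3, 4, 5, 6]
print(serve_request(table, condition=lambda r: r % 2 == 0))  # explicit
print(serve_request(table, sample_size=3))                   # sample
```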
HARDWARE & SOFTWARE REQUIREMENTS
H/W System Configuration:
Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB

S/W System Configuration:
Operating System : Windows 95/98/2000/XP/7
Application Server : Tomcat 5.0/6.x
Front End : HTML, Java, JSP
Scripts : JavaScript
Database : MySQL 5.0
Database Connectivity : JDBC
MODULES:
1. Data Allocation Module:
The main focus of our project is the data allocation problem: how can
the distributor “intelligently” give data to agents in order to improve the chances
of detecting a guilty agent? The admin can send files to authenticated users, and
users can edit their account details. The agent views the secret-key details through
mail, in order to increase the chances of detecting agents that leak data.

2. Fake Object Module:


The distributor creates and adds fake objects to the data that he
distributes to agents. Fake objects are objects generated by the distributor in order
to increase the chances of detecting agents that leak data. The distributor may be
able to add fake objects to the distributed data in order to improve his
effectiveness in detecting guilty agents. Our use of fake objects is inspired by the
use of “trace” records in mailing lists. If the wrong secret key is given when
downloading a file, a duplicate (fake) file is opened, and the fake object’s details
are displayed and also sent by mail.
3. Optimization Module:
The distributor’s data allocation to agents has one constraint and
one objective. The distributor’s constraint is to satisfy the agents’ requests, by
providing them with the number of objects they request or with all available
objects that satisfy their conditions. His objective is to be able to detect an agent
who leaks any portion of his data. Users can lock and unlock files for security.

4. Data Distributor:
A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data is leaked and found in an
unauthorized place (e.g., on the web or somebody’s laptop). The distributor
must assess the likelihood that the leaked data came from one or more agents,
as opposed to having been independently gathered by other means. The admin can
view which file is being leaked and also the fake objects’ details.
Flowchart:
 Start → User Login → Authentication (as Admin or Client).
 Admin: 1. Add data, 2. Update data, 3. Search data.
 Client: Request data (1. Explicit data, 2. Implicit data) and receive the data from the admin.
 When data is leaked: detection using unique fake objects and minimum sent data.
 Find the guilty agents on the basis of probability → Stop.
0-Level DFD
[Diagram: the User/Client and the Admin (distributor) interact with the Data Leakage Detection system, which is backed by the Database.]
1-Level DFD
[Diagram: the client requests data (implicit or explicit) from the distributor, who sends the data after assigning a distribution category and inserting fake objects. Leaked data gathered from a third party is fed to the data leakage detection algorithm, which finds the guilty agent.]
Advantages of the Proposed System:
Quick response time
Customized processing
Small memory factor
Highly secure
Replication in heterogeneous databases
Easy updating
Applications
 Secure file transfers
 Image transmission security
 Video transfer security
Conclusions
Thank You
