Data Leakage Detection
Under the guidance of Mr. P. Vijay Varma, Assistant Professor, Department of Computer Science
ABSTRACT
1. A data distributor has given sensitive data to a set of supposedly trusted agents (third parties).
2. Some of the data are leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop).
3. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages.
4. These methods do not rely on alterations of the released data (e.g., watermarks).
5. In some cases, we can also inject realistic but fake data records to further improve our chances of detecting leakage and identifying the guilty party.
AGENDA
1. Introduction
2. Proposed System
3. Perturbation
4. Existing System
5. Allocation Strategies
6. Problem Setup and Notation
7. Data Allocation Problem
8. Fake Objects
INTRODUCTION:
1. In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. We call the owner of the data the distributor and the supposedly trusted third parties the agents.
2. Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.
PROPOSED SYSTEM:
1. Using the technique of perturbation, the data is made less sensitive before the agents handle it.
2. In this section we develop a model for assessing the guilt of agents.
3. We also present algorithms for distributing objects to agents, in a way that improves our chances of identifying a leaker.
4. Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.
PERTURBATION:
1. Perturbation is a very useful technique in which the data is modified and made less sensitive before being handed over to the supposedly trusted third party.
2. For example, one can add random noise to a set of attributes, or replace exact values with ranges (see the sketch below).
3. In some cases, however, it is important not to alter the original distributor's data.
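A minimal Python sketch of these two perturbation styles, assuming hypothetical records with name, salary, and age fields (the field names and noise scale are illustrative, not from the paper):

    import random

    def perturb_salary(records, noise_scale=1000.0):
        # Add zero-mean Gaussian noise to a numeric attribute.
        return [dict(r, salary=r["salary"] + random.gauss(0, noise_scale))
                for r in records]

    def generalize_age(records, bucket=10):
        # Replace an exact value with the range (bucket) that contains it.
        out = []
        for r in records:
            lo = (r["age"] // bucket) * bucket
            out.append(dict(r, age=f"{lo}-{lo + bucket - 1}"))
        return out

    records = [{"name": "A", "salary": 52000, "age": 34}]
    print(perturb_salary(records))   # salary shifted by random noise
    print(generalize_age(records))   # age 34 becomes the range "30-39"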
PERTURBATION GRAPH: (figure not reproduced)
EXISTING SYSTEM:
1. Traditionally, leakage detection is handled by watermarking; e.g., a unique code is embedded in each distributed copy.
2. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified.
3. Watermarks can be very useful in some cases, but again involve some modification of the original data.
RELATED WORK
The main idea is to generate a watermark W(x, y) using a secret key chosen by the sender, such that W(x, y) is indistinguishable from random noise for any entity that does not know the key (i.e., the recipients). The sender adds the watermark W(x, y) to the information object (image) I(x, y) before sharing it with the recipient(s). It is then hard for any recipient to guess the watermark W(x, y) (and subtract it from the transformed image I′(x, y)); the sender, on the other hand, can easily extract and verify a watermark because it knows the key.
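A minimal sketch of this scheme in Python, assuming a grayscale image stored as a NumPy array and simple correlation-based detection (the strength and threshold values are illustrative):

    import numpy as np

    def make_watermark(shape, key, strength=10.0):
        # The watermark is pseudo-random noise derived from the secret key;
        # without the key it is indistinguishable from random noise.
        rng = np.random.default_rng(key)
        return strength * rng.standard_normal(shape)

    def embed(image, key):
        return image + make_watermark(image.shape, key)   # I'(x, y) = I + W

    def verify(image, key, threshold=0.5):
        # The sender regenerates W from the key and correlates it with the
        # received image; the score is ~1 if W was embedded, ~0 otherwise.
        w = make_watermark(image.shape, key)
        centered = image - image.mean()
        return np.sum(centered * w) / np.sum(w * w) > threshold

    I = np.random.default_rng(0).random((128, 128)) * 255  # stand-in image I(x, y)
    marked = embed(I, key=1234)
    print(verify(marked, key=1234))   # True: the sender knows the key
    print(verify(marked, key=9999))   # False: a wrong key behaves like noise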
ALLOCATION STRATEGIES
1. Explicit Data Requests
Fake objects are allocated to agents with individual budgets b1, . . . , bn, subject to a total distributor budget B:
1: R ← ∅ (agents that can receive fake objects)
2: for i = 1, . . . , n do
3:   if bi > 0 then
4:     R ← R ∪ {i}
5:     Fi ← ∅
6: while B > 0 do
7:   i ← SELECTAGENT(R, R1, . . . , Rn)
8:   f ← CREATEFAKEOBJECT(Ri, Fi, condi)
9:   Ri ← Ri ∪ {f}
10:  Fi ← Fi ∪ {f}
11:  bi ← bi − 1
12:  if bi = 0 then
13:    R ← R \ {i}
14:  B ← B − 1
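A Python sketch of this loop, using the random SELECTAGENT variant (an optimized variant would instead pick the agent whose new fake object most improves the objective); create_fake stands in for CREATEFAKEOBJECT:

    import random

    def allocate_explicit(R, b, B, create_fake):
        # R: agent -> set of real objects already allocated (explicit requests)
        # b: agent -> fake-object budget; B: distributor's total fake budget
        F = {i: set() for i in R}                 # fake objects per agent
        eligible = [i for i in R if b[i] > 0]     # agents that may receive fakes
        while B > 0 and eligible:
            i = random.choice(eligible)           # SELECTAGENT (random variant)
            f = create_fake(R[i], F[i])           # CREATEFAKEOBJECT(Ri, Fi, condi)
            R[i].add(f)
            F[i].add(f)
            b[i] -= 1
            if b[i] == 0:
                eligible.remove(i)
            B -= 1
        return R, F

    ids = iter(range(10**6))
    fake = lambda Ri, Fi: f"fake-{next(ids)}"     # trivially unique fakes
    R = {"U1": {"t1", "t2"}, "U2": {"t2", "t3"}}
    print(allocate_explicit(R, {"U1": 1, "U2": 1}, B=2, create_fake=fake))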
2.Sample Data Requests
With sample data requests, agents are not interested in particular objects; each agent Ui simply requests some number mi of objects from T. The allocation begins as follows (the full loop appears in the sketch below):
1: a ← 0|T| (a vector of |T| zeros; a[k] counts the agents who have received object tk)
2: R1 ← ∅, . . . , Rn ← ∅
3: remaining ← Σi mi
4: while remaining > 0 do
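A Python sketch of the full sample-allocation loop, with one plausible greedy SELECTOBJECT choice: give each agent an object currently held by the fewest agents, so agents' sets overlap as little as possible (assumes each request m[i] is at most |T|):

    def allocate_sample(T, m):
        # T: list of objects; m: agent -> requested sample size
        a = {t: 0 for t in T}                     # a[t]: agents holding t
        R = {i: set() for i in m}
        remaining = sum(m.values())
        while remaining > 0:
            for i in m:
                if len(R[i]) < m[i]:
                    # SELECTOBJECT: rarest object this agent does not yet hold
                    t = min((x for x in T if x not in R[i]), key=lambda x: a[x])
                    R[i].add(t)
                    a[t] += 1
                    remaining -= 1
        return R

    print(allocate_sample(["t1", "t2", "t3", "t4"], {"U1": 3, "U2": 2}))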
RELATED WORK
1. The guilt detection approach we present is related to the data provenance problem.
2. As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects.
1. The main focus of this paper is the data allocation problem: how can the distributor intelligently give data to agents in order to improve the chances of detecting a guilty agent?
2. The two types of requests we handle are sample and explicit. Fake objects are objects generated by the distributor that are not in the set T. These objects are designed to look like real objects and are distributed to agents together with T objects, in order to increase the chances of detecting agents that leak data.
FAKE OBJECTS
1. The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. However, fake objects may impact the correctness of what agents do, so they may not always be allowable. The idea of perturbing data to detect leakage is not new.
2. However, in most cases, individual objects are perturbed, e.g., by adding random noise to sensitive salaries or adding a watermark to an image. In our case, we perturb the set of distributor objects by adding fake elements. In some applications, fake objects may cause fewer problems than perturbing real objects.
3. The distributor creates and adds fake objects to the data that he distributes to agents. We let Fi ⊆ Ri be the subset of fake objects that agent Ui receives.
The distributor can also use the function CREATEFAKEOBJECT() when it wants to send the same fake object to a set of agents. In this case, the function arguments are the union of the Ri and Fi tables, respectively, and the intersection of the conditions condi. We do not deal with the implementation of CREATEFAKEOBJECT() here; a hypothetical sketch follows.
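A hypothetical sketch of such a function; the record schema and the condition format are invented for illustration only:

    import random

    def create_fake_object(Ri, Fi, cond):
        # Synthesize a record that satisfies the agent's condition cond,
        # looks like real data, and collides with nothing the agent already
        # holds (real objects Ri or previously created fakes Fi).
        used = {r["id"] for r in Ri} | {r["id"] for r in Fi}
        while True:
            fake = {"id": random.randrange(10**6),
                    "state": random.choice(cond["states"]),         # satisfies condi
                    "salary": round(random.gauss(55000, 8000), 2)}  # plausible value
            if fake["id"] not in used:
                return fake

    # Same fake for several agents: pass the union of their tables and the
    # intersection of their conditions, as described above.
    R1, F1 = [{"id": 1, "state": "CA", "salary": 50000.0}], []
    R2, F2 = [{"id": 2, "state": "CA", "salary": 61000.0}], []
    print(create_fake_object(R1 + R2, F1 + F2, {"states": ["CA"]}))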
LIMITATIONS
As noted earlier, watermarking embeds a unique code in each distributed copy, so that a copy later discovered in the hands of an unauthorized party can be traced back to the leaker. However, watermarks involve some modification of the original data, and they can sometimes be destroyed if the data recipient is malicious.
IMPACT OF PROBABILITY P
In our first scenario, T contains 16 objects: all of them are given to agent U1 and only eight are given to a second agent U2. We calculate the probabilities Pr{G1|S} and Pr{G2|S} for p in the range [0, 1] and we present the results in Fig. 1a. The dashed line shows Pr{G1|S} and the solid line shows Pr{G2|S}.
In this section, we again study two agents, one receiving all of the T = S data and the second receiving a varying fraction of the data. Fig. 1b shows the probability of guilt for both agents as a function of the fraction of the objects owned by U2, i.e., as a function of |R2 ∩ S|/|S|. In this case, p has a low value of 0.2, and U1 continues to have all 16 S objects. Note that in our previous scenario, U2 has 50 percent of the S objects.
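Under the model's independence assumption, each object t in S was guessed independently with probability p, and otherwise leaked by one of the |Vt| agents that received it, each with equal likelihood. This gives Pr{Gi|S} = 1 − Π over t in S ∩ Ri of (1 − (1 − p)/|Vt|). A small Python sketch that reproduces the flavor of the first scenario:

    def guilt_probability(i, S, R, p):
        # Pr{Gi | S}: agent i is innocent only if it leaked none of the
        # objects it shares with the leaked set S.
        prob_innocent = 1.0
        for t in S & R[i]:
            Vt = sum(1 for Rj in R.values() if t in Rj)   # agents holding t
            prob_innocent *= 1 - (1 - p) / Vt
        return 1 - prob_innocent

    T = set(range(16))                                 # the 16-object scenario
    R = {"U1": set(range(16)), "U2": set(range(8))}    # U1 has all, U2 has eight
    for p in (0.1, 0.2, 0.5, 0.9):
        print(p, guilt_probability("U1", T, R, p), guilt_probability("U2", T, R, p))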
OPTIMIZATION:
In the optimization module, the distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.
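One way to make this objective concrete is the sum-objective from the underlying paper: the total normalized overlap Σi (1/|Ri|) Σ over j ≠ i of |Ri ∩ Rj|, which the distributor tries to minimize. A small sketch:

    def sum_objective(R):
        # Lower total overlap makes agents' sets more distinguishable, so a
        # leaked set S points more clearly at a single agent.
        return sum(len(R[i] & R[j]) / len(R[i])
                   for i in R for j in R if i != j)

    print(sum_objective({"U1": {1, 2, 3}, "U2": {3, 4}}))  # 1/3 + 1/2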
CONCLUSION:
1. In a perfect world, there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty.
2. However, in many cases, we must indeed work with agents that may not be 100 percent trusted, and we may not be certain whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks. In spite of these difficulties, we have shown that it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be guessed by other means. Our model is relatively simple, but we believe that it captures the essential trade-offs.
REFERENCES
[1] R. Agrawal and J. Kiernan, "Watermarking Relational Databases," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), 2002.
[6] S. Czerwinski, R. Fromm, and T. Hodes, "Digital Music Distribution and Audio Watermarking," http://www.scientificcommons.org/43025658, 2007.
[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, "An Improved Algorithm to Watermark Numeric Relational Data," Information Security Applications, pp. 138-149, Springer, 2006.
[8] F. Hartung and B. Girod, "Watermarking of Uncompressed and Compressed Video," Signal Processing, vol. 66, no. 3, pp. 283-301, 1998.
THANK YOU