
Hiding Frequent Patterns in the Updated Database

A. SABARI VASAN
R. VIGNESH KUMAR
ABSTRACT
 Sensitive frequent pattern hiding is an important issue in privacy-preserving data mining.

 In this era of information explosion and rapid development of the Internet, the data stored in a database is usually updated continuously.

 Existing frequent pattern hiding algorithms are becoming inadequate because they were originally designed for static databases.

 To solve this problem, this paper proposes an incremental mechanism and a data structure for hiding sensitive frequent patterns in an incremental environment.

 In this mechanism, the transaction data and the sensitive patterns are stored in two types of trees. The proposed algorithm efficiently finds related transactions through the links between these two types of trees.
EXISTING SYSTEM
 Frequent pattern mining has long been an important technique in data mining and knowledge discovery.

 Recently, with growing awareness of privacy protection, researchers have also recognized that privacy is an important issue in data mining.

 Most database owners are unwilling to share their data with others, since sensitive information could be disclosed to other people.

 Existing frequent pattern hiding algorithms are becoming inadequate because they were originally designed for static databases.
PROPOSED SYSTEM
 To minimize side effects, we design a data structure, the Sensitive Pattern Indexed Transaction Forest (SPITF), to store all transactions.

 SPITF allows us to modify transactions in the whole dataset without accessing the database.

 Because the sensitive indexer in SPITF efficiently retrieves the sensitive transactions of each sensitive pattern, we do not miss any opportunity to choose the optimal transactions.

 Sensitive frequent pattern hiding is an important issue in privacy-preserving data mining.
MODULES
Generation of Template Table
Maximum SPC Generation
Sanitization Framework
Hiding FP in Transactions
MODULE DESCRIPTION

Generation of Template Table


In this framework, we apply a template-based algorithm.

Each item in a sensitive pattern can potentially become the Victim used to hide that pattern. The
UCP is the union of all sensitive patterns that can be sanitized by this template.

The SPC is the number of sensitive patterns covered by this template. To apply the template, we search for all
transactions containing the UCP and remove the corresponding Victim from a sufficient number of them.

The MC (Max Count) is the maximum among the counts of the sensitive patterns covered by a
template.
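The definitions above can be made concrete with a short Python sketch. It rests on assumptions the slides do not state: a template pairs one victim item with a set of sensitive patterns that all contain it; a pattern's "count" is the number of supporting transactions that must lose the victim for its support to fall below an integer minimum support count; and the template table enumerates every non-empty subset of covering patterns per victim. `hiding_count`, `Template`, and `generate_templates` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def hiding_count(support: int, min_count: int) -> int:
    """Assumed 'count' of a sensitive pattern: how many supporting
    transactions must lose the victim so that the pattern's support
    drops strictly below min_count (i.e. the pattern becomes infrequent)."""
    return max(0, support - min_count + 1)


@dataclass
class Template:
    victim: str              # item removed to sanitize the covered patterns
    patterns: List[Itemset]  # covered sensitive patterns (all contain victim)

    @property
    def ucp(self) -> Itemset:
        # UCP: union of all sensitive patterns covered by this template
        return frozenset().union(*self.patterns)

    @property
    def spc(self) -> int:
        # SPC: number of sensitive patterns covered by this template
        return len(self.patterns)

    def mc(self, supports: Dict[Itemset, int], min_count: int) -> int:
        # MC: the largest hiding count among the covered patterns
        return max(hiding_count(supports[p], min_count) for p in self.patterns)


def generate_templates(sensitive: List[Itemset]) -> List[Template]:
    """Hypothetical template-table builder: for every candidate victim,
    one template per non-empty subset of the sensitive patterns that
    contain that victim."""
    templates: List[Template] = []
    for victim in sorted(frozenset().union(*sensitive)):
        covering = [p for p in sensitive if victim in p]
        for r in range(1, len(covering) + 1):
            for combo in combinations(covering, r):
                templates.append(Template(victim, list(combo)))
    return templates
```

Because every transaction that contains the UCP also contains each covered pattern, removing the victim from MC such transactions lowers every covered pattern's support by at least its own count.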
Maximum SPC Generation


First, we select the template with the maximum SPC, so that a single template hides as many sensitive
patterns as possible.

If more than one template has the same SPC, select the template with the smallest MC.

If there are still templates with the same SPC and MC, choose the template whose victim has
the highest support.

This strategy reduces the side effect of unintentionally hiding non-sensitive information at the same time.

Finally, if there is still a tie, we will choose a template randomly.
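The tie-breaking order above maps naturally onto a lexicographic sort key. The following sketch assumes each candidate template has already been scored with its SPC, MC, and victim support; `Candidate`, `rank_key`, and `select_template` are hypothetical names introduced only for illustration.

```python
import random
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str            # hypothetical label for the template
    spc: int             # number of sensitive patterns the template covers
    mc: int              # max count among the covered patterns
    victim_support: int  # support of the template's victim item


def rank_key(c: Candidate):
    # Larger is better: max SPC, then smallest MC (negated), then highest victim support
    return (c.spc, -c.mc, c.victim_support)


def select_template(cands: List[Candidate], rng: Optional[random.Random] = None) -> Candidate:
    """Apply the three tie-breakers, then pick randomly among remaining ties.
    Raises ValueError if cands is empty (max() of an empty sequence)."""
    rng = rng or random.Random()
    best = max(rank_key(c) for c in cands)
    tied = [c for c in cands if rank_key(c) == best]
    return rng.choice(tied)
```

Negating MC lets a single `max` prefer the smallest MC while still preferring the largest SPC and the highest victim support; passing a seeded `random.Random` makes the final random tie-break reproducible.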
Sanitization Framework


This section presents the overall framework of our sanitization process.

We use the Sensitive Pattern Indexed Transaction Forest (SPITF) to store all transactions. SPITF allows us to
modify transactions in the whole dataset without accessing the database.

SPITF has two major components, the sensitive indexer and the info holder, which store the
sensitive part and the non-sensitive part of each transaction, respectively.

First, the support of each item in the dataset and in the sensitive pattern table is calculated, and
items are sorted in descending order of their supports.
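A minimal sketch of this first step, assuming transactions are Python sets of item names: it counts item supports over the dataset and splits each transaction into a support-ordered sensitive part (the data the sensitive indexer would organize) and a non-sensitive remainder (the info holder's share). The tree structures themselves are omitted, the alphabetical tie-break is an assumption, and `item_supports` and `split_transactions` are hypothetical names.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple


def item_supports(transactions: List[Set[str]]) -> Counter:
    # Support count of an item = number of transactions that contain it
    return Counter(item for t in transactions for item in t)


def split_transactions(
    transactions: List[Set[str]], sensitive_patterns: List[FrozenSet[str]]
) -> Tuple[Counter, Dict[int, List[str]], Dict[int, Set[str]]]:
    supports = item_supports(transactions)
    sensitive_items = set().union(*sensitive_patterns)
    sensitive_part: Dict[int, List[str]] = {}  # what the sensitive indexer would hold
    info_holder: Dict[int, Set[str]] = {}      # non-sensitive remainder per transaction
    for tid, t in enumerate(transactions):
        # Sensitive items in descending support; ties broken alphabetically (assumption)
        sensitive_part[tid] = sorted(t & sensitive_items, key=lambda i: (-supports[i], i))
        info_holder[tid] = t - sensitive_items
    return supports, sensitive_part, info_holder
```

Fixing one item order up front means that every later insertion into, or lookup in, the indexer can walk the items in the same sequence.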
Hiding FP in Transactions


When a template is selected, the sanitization algorithm needs to retrieve, from the SPITF, MC transactions
that contain the UCP of the selected template.

First, the items in the given UCP are sorted in descending order of their support.

Second, following this item order, we traverse the sensitive indexer, as in the insertion step, to locate
the node that stores the record of the UCP.

Once the specified transactions have been identified, the next step is to remove the victim from these
transactions.
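The retrieval-and-removal steps can be sketched as follows. Here the sensitive indexer tree is replaced by a flat dictionary keyed by the support-ordered UCP, which preserves the point that insertion and lookup share one item ordering. Taking the first MC matching transactions is also an assumption, since the slides do not say how the optimal transactions are chosen. `ordered_key`, `build_index`, and `hide_with_template` are hypothetical names.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Key = Tuple[str, ...]


def ordered_key(itemset: FrozenSet[str], supports: Counter) -> Key:
    # Items in descending support; alphabetical tie-break is an assumption
    return tuple(sorted(itemset, key=lambda i: (-supports[i], i)))


def build_index(transactions: List[Set[str]], ucps: List[FrozenSet[str]],
                supports: Counter) -> Dict[Key, List[int]]:
    # Flat stand-in for the sensitive indexer: ordered UCP -> ids of transactions containing it
    return {ordered_key(u, supports): [tid for tid, t in enumerate(transactions) if u <= t]
            for u in ucps}


def hide_with_template(transactions: List[Set[str]], index: Dict[Key, List[int]],
                       supports: Counter, ucp: FrozenSet[str], victim: str, mc: int) -> List[int]:
    key = ordered_key(ucp, supports)       # 1. sort the UCP items by support
    candidates = index.get(key, [])        # 2. locate the record of the UCP
    chosen = candidates[:mc]               #    take MC of them (selection rule assumed)
    for tid in chosen:
        transactions[tid].discard(victim)  # 3. remove the victim from each chosen transaction
    # Chosen transactions no longer contain any indexed UCP that includes the victim
    chosen_set = set(chosen)
    for k in list(index):
        if victim in k:
            index[k] = [tid for tid in index[k] if tid not in chosen_set]
    return chosen
```

The `supports` counter is treated as fixed after the index is built so that keys stay stable after victims are removed. If fewer than MC transactions contain the UCP, this template alone cannot hide every pattern it covers.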
SYSTEM ARCHITECTURE
DATA FLOW DIAGRAM

FLOW – 1:
[Diagram components: User; Mining Process; Database with Non-sensitive Data]

FLOW – 2:
[Diagram components: User; Admin; Updated database; Frequent Items; Template table; FP Maximum Count]

FLOW – 3:
[Diagram components: Updated database; Frequent Items; Template table; FP Maximum Count; Hiding Process; Database with Non-sensitive Data]
Screen Shots
 frmMyHome.aspx – Add Transaction (Incremental/Updated Database)
 frmUDBTransaction.aspx – The Updated Database
 frmFP.aspx – Frequent Item List and Template Table
 frmHidingFP.aspx – Hiding the frequent patterns found in the template table and displaying the remaining itemsets
SYSTEM CONFIGURATION
HARDWARE REQUIREMENT

Processor : Intel Pentium IV
Clock speed : 1.8 GHz
RAM : 256 MB
HDD : 80 GB

SOFTWARE REQUIREMENT

Core Language : C#.NET
Database : SQL Server 2000
IDE : Microsoft Visual Studio 2008
Operating System : Windows XP
CONCLUSION
 We designed a mechanism which can hide sensitive frequent patterns in the
incremental environment.

 We applied the template-based concept to control the supports of sensitive patterns when the database is updated.

 The experimental results verified that SPITF hides the sensitive information with fewer database modifications and fewer side effects than two comparative algorithms.

 SPITF also handles the incremental database efficiently.
THANK YOU