Hiding Frequent Patterns in The Updated Database: A. Sabari Vasan R. Vignesh Kumar

ABSTRACT
● Sensitive frequent pattern hiding is an important issue in privacy-preserving data mining.
● In this era of information explosion and rapid development of the Internet, the data stored in a database is usually updated continuously.
● Existing frequent pattern hiding algorithms are becoming inadequate because they were originally designed for static databases.
● To solve this problem, we propose an incremental mechanism and design a data structure to hide sensitive frequent patterns in the incremental environment.
● In this mechanism, the transaction data and the sensitive patterns are stored in two types of trees. The proposed algorithm can efficiently find related transactions through the links between these two types of trees.
EXISTING SYSTEM
● Frequent pattern mining has been an important technology in data mining and knowledge discovery for a long time.
● Recently, with rising awareness of privacy protection, researchers have also noticed that privacy is an important issue in data mining.
● Most database owners are unwilling to share their information with others, since some sensitive information could be disclosed by other people.
● Existing frequent pattern hiding algorithms are becoming inadequate because they were originally designed for static databases.
PROPOSED SYSTEM
● To minimize side effects, we design a data structure, the Sensitive Pattern Indexed Transaction Forest (SPITF), to store all transactions.
● SPITF allows us to modify transactions in the whole dataset without accessing the database.
● Because the sensitive indexer in SPITF provides an efficient way to retrieve the sensitive transactions of a sensitive pattern, we do not lose any chance to choose the optimal transactions.
● Sensitive frequent pattern hiding is an important issue in privacy-preserving data mining.
MODULES
Generation of Template Table
Maximum SPC Generation
Sanitization Framework
Hiding FP in Transaction
MODULE DESCRIPTION
Generation of Template Table
● In this framework, we apply a template-based algorithm.
● Each item in a sensitive pattern can serve as the Victim, the item removed to hide that pattern. The UCP is the union of all sensitive patterns that can be sanitized by this template.
● The SPC is the number of sensitive patterns covered by this template. We need to find all transactions containing the UCP and remove the corresponding Victim from a sufficient number of them.
● The MC (Max Count) is the maximum among the counts of the sensitive patterns covered by a template.
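As a rough illustration of the fields above, the following Python sketch builds a simplified template table. It assumes one template per candidate victim item, covering every sensitive pattern that contains that item; the function name `build_template_table`, the dictionary layout, and the meaning of `counts` (taken here as the number of supporting transactions that must be modified to hide a pattern) are assumptions for illustration and are not taken from the slides.

```python
from collections import defaultdict


def build_template_table(sensitive_patterns, counts):
    """Build one simplified template per candidate victim item.

    sensitive_patterns: list of frozensets of items.
    counts: hypothetical per-pattern count (assumed: number of supporting
            transactions that must be modified to hide the pattern).
    """
    # Group the indices of sensitive patterns by each item they contain.
    by_victim = defaultdict(list)
    for idx, pattern in enumerate(sensitive_patterns):
        for item in pattern:
            by_victim[item].append(idx)

    table = []
    for victim, covered in sorted(by_victim.items()):
        # UCP: union of all sensitive patterns covered by this template.
        ucp = frozenset().union(*(sensitive_patterns[i] for i in covered))
        table.append({
            "victim": victim,
            "ucp": ucp,
            "spc": len(covered),                    # patterns covered
            "mc": max(counts[i] for i in covered),  # Max Count
        })
    return table


patterns = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
counts = [2, 3, 1]
template_table = build_template_table(patterns, counts)
```

In this toy input, the template with victim `b` covers the patterns {a, b} and {b, c}, so a transaction containing its UCP contains both patterns, and removing `b` from it lowers the support of both at once.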
Maximum SPC Generation
● First, we select the template with the maximum SPC, so that one template hides as many sensitive patterns as possible.
● If more than one template has the same SPC, we select the template with the smallest MC.
● If there are still templates with the same SPC and MC, we choose the template whose victim has the highest support.
● This strategy reduces the side effect of hiding non-sensitive information along with the sensitive patterns.
● Finally, if there is still a tie, we choose a template at random.
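The tie-breaking order described above can be sketched in Python as follows. The template dictionaries, the `support` mapping, and the function name `select_template` are hypothetical and exist only for this example.

```python
import random


def select_template(templates, support, rng=random):
    """Pick a template: max SPC, then min MC, then max victim support,
    then a random choice among any templates that are still tied."""
    def rank(t):
        # Smaller tuples are better: negate the values we want to maximize.
        return (-t["spc"], t["mc"], -support[t["victim"]])

    best = min(rank(t) for t in templates)
    tied = [t for t in templates if rank(t) == best]
    return rng.choice(tied)


templates = [
    {"victim": "a", "spc": 1, "mc": 2},
    {"victim": "b", "spc": 2, "mc": 3},
    {"victim": "c", "spc": 2, "mc": 3},
]
support = {"a": 5, "b": 4, "c": 6}
chosen = select_template(templates, support)
```

Here the templates for `b` and `c` tie on SPC and MC, so the higher support of `c` decides the choice; the random step only matters when the victim supports are equal as well.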
Sanitization Framework
● This section presents the whole framework of our sanitization process.
● We use the Sensitive Pattern Indexed Transaction Forest (SPITF) to store all transactions. SPITF allows us to modify transactions in the whole dataset without accessing the database.
● SPITF has two major components, the sensitive indexer and the info holder, which store the sensitive part and the non-sensitive part of each transaction, respectively.
● First, the support of each item in the dataset and in the sensitive pattern table is calculated, and the items are sorted in descending order of support.
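The preprocessing step can be illustrated with a small Python sketch that counts item supports, orders each transaction by descending support, and splits it into a sensitive part and a non-sensitive part. This is a simplification under stated assumptions: it uses dataset support only, breaks support ties alphabetically, and returns plain lists rather than inserting into the actual tree structures of SPITF. The names `item_supports` and `split_transaction` are hypothetical.

```python
from collections import Counter


def item_supports(transactions):
    """Count, for each item, the number of transactions that contain it."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def split_transaction(transaction, sensitive_items, supports):
    """Order items by descending support, then split them into the part
    kept by the sensitive indexer and the part kept by the info holder."""
    # Ties in support are broken alphabetically (an assumption made
    # here only to keep the ordering deterministic).
    ordered = sorted(set(transaction), key=lambda i: (-supports[i], i))
    sensitive_part = [i for i in ordered if i in sensitive_items]
    info_part = [i for i in ordered if i not in sensitive_items]
    return sensitive_part, info_part


transactions = [["a", "b", "c"], ["b", "c", "e"], ["b", "d"], ["c", "e"]]
sensitive_patterns = [{"b", "c"}]
supports = item_supports(transactions)
sensitive_items = set().union(*sensitive_patterns)
parts = [split_transaction(t, sensitive_items, supports) for t in transactions]
```

Using one fixed item order for every transaction is what lets transactions with the same sensitive items share a path in a prefix-tree style index.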
Hiding FP in Transaction
● When a template is selected, the sanitization algorithm needs to retrieve MC transactions that contain the UCP of the selected template from the SPITF.
● First, the items in the given UCP are sorted in descending order of support.
● Second, following this item sequence, we traverse the sensitive indexer, as in the insertion step, to locate the node that stores the record of the UCP.
● Once the specified transactions have been identified, the next step is to remove the victim from these transactions.
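The retrieval and removal steps can be sketched in Python as below. To stay short, the sketch replaces the sensitive indexer tree with a flat dictionary that maps each sorted sensitive part to the ids of the transactions holding it, so the lookup is a scan rather than a tree traversal. The function name `sanitize`, the dictionary index, and the rule of picking the lowest transaction ids are all assumptions for illustration; the slides do not specify how the transactions to modify are chosen.

```python
def sanitize(transactions, index, ucp, victim, mc, supports):
    """Remove `victim` from up to `mc` transactions that contain `ucp`.

    transactions: dict mapping transaction id -> set of items.
    index: flat stand-in for the sensitive indexer, mapping a tuple of
           sensitive items (sorted by descending support) -> list of ids.
    """
    # Step 1: sort the UCP items by descending support. In a tree this
    # order drives the traversal; here it only fixes the check order.
    ordered_ucp = sorted(ucp, key=lambda i: (-supports[i], i))

    # Step 2: locate the transactions whose sensitive part holds the UCP.
    matched = []
    for key, tids in index.items():
        if all(item in key for item in ordered_ucp):
            matched.extend(tids)

    # Step 3: remove the victim from at most `mc` matching transactions
    # (lowest ids first, an arbitrary choice for this sketch).
    chosen = sorted(matched)[:mc]
    for tid in chosen:
        transactions[tid].discard(victim)
    return chosen


transactions = {0: {"a", "b", "c"}, 1: {"b", "c", "e"}, 2: {"b", "d"}, 3: {"c", "e"}}
index = {("b", "c"): [0, 1], ("b",): [2], ("c",): [3]}
supports = {"a": 1, "b": 3, "c": 3, "d": 1, "e": 2}
modified = sanitize(transactions, index, ucp={"b", "c"}, victim="b", mc=1, supports=supports)
```

A fuller version would also move each modified transaction to its new position in the index after the victim is removed; that bookkeeping is omitted here.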
SYSTEM ARCHITECTURE
DATA FLOW DIAGRAM
FLOW – 1: [Diagram: User, Mining Process, Database with Non-sensitive Data]
FLOW – 2: [Diagram: User, Admin, Updated database, Frequent Items, Template table, FP Maximum Count]
FLOW – 3: [Diagram: User, Admin, Updated database, Frequent Items, Template table, FP Maximum Count, Hiding Process, Database with Non-sensitive Data]
SCREEN SHOTS
frmMyHome.aspx – Add Transaction (Incremental/Updated Database)
frmUDBTransaction.aspx – The Updated Database
frmFP.aspx – Frequent Item List and Template Table
frmHidingFP.aspx – Hiding the frequent patterns found in the template table and displaying the remaining itemsets
SYSTEM CONFIGURATION
HARDWARE REQUIREMENT
Processor : Intel Pentium IV
Clock speed : 1.8 GHz
RAM : 256 MB
HDD : 80 GB
SOFTWARE REQUIREMENT
Core Language : C#.Net
Database : SQL Server 2000
IDE : Microsoft Visual Studio 2008
Operating System : Windows XP
CONCLUSION
● We designed a mechanism that hides sensitive frequent patterns in the incremental environment.
● We applied the template-based concept to control the supports of sensitive patterns when the database is updated.
● The experimental results verified that SPITF can hide the sensitive information with fewer database modifications and fewer side effects than the two approaches it was compared with.
● SPITF also handles the incremental database efficiently.
THANK YOU