Privacy Preserving Heuristic Approach For Association Rule Mining in Distributed Database
Abstract:
Association rule mining is a powerful data mining model used for finding hidden patterns in large databases. A challenge of data mining is to secure the confidentiality of sensitive patterns when releasing a database to third parties. Privacy preservation in this paper means hiding association rules. An association rule hiding algorithm sanitizes the database so that certain sensitive association rules cannot be discovered through association rule mining techniques. Various approaches are described in this paper, but the proposed work uses the heuristic approach with the data distortion technique. The proposed algorithm is an extension of the MDSRRC algorithm, which hides rules with multiple R.H.S. items. In the proposed work, the MDSRRC algorithm operates on a distributed database. We will show experimental results comparing the MDSRRC algorithm on a single database with the MDSRRC algorithm on a distributed database.

... is some technique to deal with security. Privacy preserving in data mining is one of the techniques that deal with the security of the knowledge extracted by data mining techniques. There are various data mining tasks:

Classification
Clustering
Association Rule Mining
Sequential Pattern Mining
Regression
accurate models over aggregate data, while protecting privacy at the level of individual records.

The main purpose of privacy preserving data mining is to design competent frameworks and algorithms that can extract relevant knowledge from a large amount of data without revealing any sensitive information [9]. It protects sensitive information by providing a sanitized version of the original database on the internet, or by using a process in such a way that private data and private knowledge remain private even after the mining process. It is due to PPDM that the benefits of data mining can be enjoyed without compromising the privacy of the concerned individuals.

III. ASSOCIATION RULE MINING

Association rule mining is one of the tasks of data mining and an important field within privacy preserving data mining. R. Agrawal first proposed the basic concept of association rule mining. An association rule expresses an IF-THEN relationship among different data items. The following example shows the concept: "If a customer buys a laptop, then he/she is 85% likely to also purchase anti-virus." The example indicates that a laptop is related to anti-virus, because customers who buy a laptop usually also buy anti-virus. Association rules are used for market basket analysis.

Let $I = \{I_1, I_2, \ldots, I_n\}$ be a set of items. Let D be a database of transactions, where each transaction T is a set of items such that $T \subseteq I$. Every transaction is associated with an identifier, called TID. A transaction T contains A if and only if $A \subseteq T$. An association rule has the form $A \rightarrow B$, where $A \subset I$, $B \subset I$ and $A \cap B = \emptyset$. Every association rule must satisfy two constraints: support and confidence.

The support of rule $A \rightarrow B$ is the fraction of transactions in the database that contain $A \cup B$, where $|A \cup B|$ denotes the support count of $A \cup B$. The support of rule $A \rightarrow B$ can be calculated using formula (1).

$$\text{support}(A \rightarrow B) = \frac{|A \cup B|}{|D|} \quad (1)$$

The confidence of rule $A \rightarrow B$ is the fraction of transactions containing A that also contain B. The confidence of rule $A \rightarrow B$ can be calculated using formula (2).

$$\text{confidence}(A \rightarrow B) = \frac{|A \cup B|}{|A|} \quad (2)$$

IV. ASSOCIATION RULE HIDING

Association rule hiding is one technique of PPDM (Privacy Preserving Data Mining). The aim of association rule hiding is to sanitize the original database so that the following conditions hold:

(1) The sanitized database does not reveal any sensitive rules.

(2) All non-sensitive rules can still be mined from the sanitized database.

(3) The sanitized database does not introduce any new rules that are not present in the original database D.

Association rule hiding depends on the support and confidence of the rule. There are two ways to hide a rule: (i) decrease its support below a certain threshold; (ii) decrease its confidence below a certain threshold.

[Figure: association rule hiding framework. Recoverable labels: Original database; Frequent item mining algorithms; Generation of association rules; Association rules; Association rule hiding.]
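The support and confidence measures defined in this section, and the idea of hiding a rule by pushing one of them below a threshold, can be illustrated with a small sketch. The toy transactions, the function names `support_count`, `support` and `confidence`, and the choice of which transaction to modify are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Set

def support_count(db: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(db: List[Set[str]], a: Set[str], b: Set[str]) -> float:
    """support(A -> B) = |A u B| / |D|."""
    return support_count(db, a | b) / len(db)

def confidence(db: List[Set[str]], a: Set[str], b: Set[str]) -> float:
    """confidence(A -> B) = |A u B| / |A| (0.0 if A never occurs)."""
    denom = support_count(db, a)
    return support_count(db, a | b) / denom if denom else 0.0

# Hypothetical toy database of five transactions.
db = [
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"antivirus"},
    {"laptop", "antivirus"},
]
A, B = {"laptop"}, {"antivirus"}
s = support(db, A, B)      # 3 of 5 transactions contain both items
c = confidence(db, A, B)   # 3 of the 4 laptop transactions contain antivirus

# Hiding by distortion: delete the R.H.S. item from one supporting transaction.
sanitized = [set(t) for t in db]
sanitized[0].discard("antivirus")
s2 = support(sanitized, A, B)
c2 = confidence(sanitized, A, B)
print(s, c, s2, c2)
```

In this toy case a single deletion lowers the confidence of laptop -> antivirus from 0.75 to 0.5, so the rule would no longer be mined at, say, a 0.7 confidence threshold.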
Lost rule: a non-sensitive rule in the original database that can be mined by mining algorithms, but can no longer be mined from the modified database after the hiding algorithm has been applied.

False rule: a sensitive rule that is not hidden by the hiding algorithm and can still be mined from the modified database.

Ghost rule: a rule that is not present in the original database but is generated by the hiding algorithm.

Association rule hiding techniques can be classified into heuristic based approaches, border based approaches, exact approaches, reconstruction based approaches and cryptography based approaches.

(1) Heuristic based approaches:

Heuristic based approaches aim at taking locally best decisions with respect to hiding the sensitive knowledge, which, however, are not necessarily globally best. As a result, heuristics fail to provide guarantees about the quality of the identified hiding solution. These approaches are fast, efficient and scalable algorithms that select a set of transactions from the database to sanitize in order to hide the association rules [1].

... there are 0's and 1's that must be hidden during blocking, because the 0's or 1's are replaced with "?". In some applications, where publishing wrong data is not acceptable, unknown values may be inserted to blur the rules, so that the support of certain items goes down to a certain level and the rule mining algorithm is not able to mine the sensitive rules [3].

Advantages: It maintains the database by blocking the original values instead of inserting false values.

Disadvantages: It is difficult to reproduce the original database, and it has various side effects such as lost rules, ghost rules and false rules.

(2) Border Based Approaches:

Border based approaches hide the sensitive association rules by modifying the border in the lattice of the frequent and infrequent item sets of the original database [3]. The border separates the frequent item sets from the infrequent item sets. The first frequent item set hiding methodology based on the notion of the border [1] maintains the quality of the database by greedily selecting the modifications with minimal side effects.

Advantages: It maintains the database quality by selecting the modifications with minimal side effects.
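The three side effects named in this section (lost, false and ghost rules) can be counted by comparing the rule set mined from the original database with the rule set mined from the sanitized one. The sketch below assumes rules are represented as plain strings and that both rule sets have already been mined; the function name `side_effects` and the sample rules are hypothetical.

```python
from typing import Dict, Set

def side_effects(original_rules: Set[str],
                 sanitized_rules: Set[str],
                 sensitive_rules: Set[str]) -> Dict[str, Set[str]]:
    """Classify side effects using the definitions given in this section."""
    non_sensitive = original_rules - sensitive_rules
    return {
        # Non-sensitive rules that can no longer be mined.
        "lost": non_sensitive - sanitized_rules,
        # Sensitive rules that are still minable after hiding.
        "false": sensitive_rules & sanitized_rules,
        # Rules that only appear after sanitization.
        "ghost": sanitized_rules - original_rules,
    }

# Hypothetical rule sets for illustration.
original = {"a->b", "a->c", "b->c"}
sensitive = {"a->b"}
sanitized = {"a->b", "b->c", "c->d"}
report = side_effects(original, sanitized, sensitive)
print(report)
```

A hiding algorithm with no side effects would return three empty sets for this report.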
Disadvantages: The approach has high complexity due to the binary integer programming.

(4) Reconstruction Based Approaches:

Reconstruction based approaches provide privacy for the database by using sensitive characteristics from the original database. They produce fewer side effects in the database. In [5], an FP-tree based algorithm is defined which reconstructs the original database and efficiently generates a number of secure databases.

Advantages: Reconstruction based approaches provide privacy for the database with fewer side effects than heuristic based approaches.

Disadvantages: The number of transactions in the new database is restricted.

The following papers were reviewed for the literature survey, including their advantages and limitations.

Dharmendra Thakur and Prof. Hitesh Gupta [9] present various association rule hiding algorithms. They work on heuristic based algorithms because this approach is an efficient and fast way to hide sensitive knowledge. The paper uses a support based algorithm and a confidence based algorithm.

Pallavi Dubey [11] addresses the problem of huge data storage and its management. She works on increasing the throughput of output data over a distributed database. She concludes that the response time can be improved through the number of processors in a distributed environment.
Vikram Garg, Anju Singh and Divakar Singh [12] describe data mining as extracting hidden predictive information from a data warehouse, which should be done without revealing sensitive information. They present all PPDM (privacy-preserving data mining) techniques and recent research areas dealing with association rule hiding, and give a comparative analysis of all association rule hiding approaches.

J. Arokia Renjit and Dr. K. L. Shunmuganathan [16] note that data mining algorithms are mostly centralized. Using a centralized data mining algorithm to discover useful patterns in distributed databases is not always feasible, because merging data sets from different sites incurs huge network communication costs. They use the FDM algorithm for geographically distributed data sets and an effective pruning technique to minimize the number of messages for support counting. They conclude that mining association rules in distributed databases in this way reduces communication and improves efficiency and scalability compared with a sequential algorithm in a distributed database.

... hide the sensitive association rules with multiple items in the consequent (R.H.S.) and antecedent (L.H.S.). They overcome the problem of the DSRRC algorithm. My proposed work is related to this paper. They also show that the MDSRRC algorithm increases the efficiency of the DSRRC algorithm.

Step: 2 - The algorithm finds IS = {is0, is1, ..., isk}, k <= n, by arranging those items in decreasing order of their counts.

Step: 3 - Calculate the sensitivity of each transaction.

Step: 4 - The transactions which support is0 are sorted in decreasing order of their sensitivity.

Step: 5 - From the sorted transaction with the highest sensitivity, delete item is0.

Step: 6 - Update the support and confidence after deleting the item from the transaction. Also update the database.

The following figure shows the proposed flow of the work.

[Figure: proposed flow of work. Recoverable labels: Database (D); Apply association rule; Divide database into distributed databases, i.e. (D1, D2, D3); Sorting items according to sensitivity; Rule hiding of all divided DB; Update transaction DB; Combine result; Combine with the remaining ... using synchronization (using message passing).]
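The MDSRRC steps listed above (Steps 2 to 6) can be sketched for a single database as follows. This is a minimal illustration and not the authors' implementation: the text does not define sensitivity, so the sketch assumes that an item's sensitivity is the number of still-visible sensitive rules it appears in and that a transaction's sensitivity is the sum of its items' sensitivities. The names `hide_rules`, `is_hidden`, `mst` (minimum support threshold) and `mct` (minimum confidence threshold) are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (L.H.S., R.H.S.)

def count(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    return sum(1 for t in db if itemset <= t)

def is_hidden(db: List[Set[str]], rule: Rule, mst: float, mct: float) -> bool:
    """A rule is hidden once its support or confidence drops below threshold."""
    lhs, rhs = rule
    n_both, n_lhs = count(db, lhs | rhs), count(db, lhs)
    supp = n_both / len(db)
    conf = n_both / n_lhs if n_lhs else 0.0
    return supp < mst or conf < mct

def hide_rules(db: List[Set[str]], sensitive: List[Rule],
               mst: float, mct: float) -> List[Set[str]]:
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    while True:
        active = [r for r in sensitive if not is_hidden(db, r, mst, mct)]
        if not active:
            return db
        # Step 2: pick is0, the R.H.S. item with the highest count.
        rhs_counts = Counter(i for _, rhs in active for i in rhs)
        is0 = max(sorted(rhs_counts), key=rhs_counts.get)
        # Step 3: transaction sensitivity (assumed definition, see lead-in).
        item_sens = Counter(i for lhs, rhs in active for i in lhs | rhs)
        def sensitivity(t: Set[str]) -> int:
            return sum(item_sens[i] for i in t)
        # Step 4: transactions that support is0, ranked by sensitivity.
        candidates = [t for t in db if is0 in t]
        if not candidates:  # guard for degenerate thresholds such as 0
            return db
        # Step 5: delete is0 from the most sensitive transaction.
        max(candidates, key=sensitivity).discard(is0)
        # Step 6: support and confidence are recomputed on the next pass.

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
rule: Rule = (frozenset({"a"}), frozenset({"b"}))
sanitized = hide_rules(db, [rule], mst=0.4, mct=0.7)
print(sanitized)
```

Each pass removes one occurrence of the selected item, so the loop terminates; the sketch stops as soon as every sensitive rule falls below one of the two thresholds, which keeps the number of modifications small.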
Step: 3 - Apply association rule mining and generate the association rules of all divided databases.

Step: 4 - Suppose database D1 is selected; then combine the remaining two databases D2 and D3.

Step: 6 - Combine all the results, that is, the generated transaction.

Step: 12 - Update the support and confidence after deleting the item from the transaction. Also update the database.

VII. CONCLUSION

The MDSRRC algorithm hides sensitive association rules with few modifications to the database, which maintains data quality and reduces the side effects on the database. I will try to extend the MDSRRC algorithm to a distributed database and to make association rules more secure. I will also try to increase the efficiency of the MDSRRC algorithm and reduce its side effects by minimizing the modifications to the database, and will use a distributed database to hide the association rules. Comparison of DSRRC, MDSRRC in single database, and my ...

VIII. FUTURE WORK

Implementation of the proposed work is still under progress and is left as future work.

... & Hide Multiple R.H.S. Items," International Journal of ...

[5] Y. Guo, "Reconstruction-Based Association Rule Hiding," in Proc. of SIGMOD 2007 Ph.D. Workshop on Innovative Database Research 2007 (IDAR 2007), June 2007.

[6] Tamir Tassa, "Secure Mining of Association Rules in Horizontally Distributed Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 4, April 2014.

[7] N V Muthu Lakshmi and Dr. K Sandhya Rani, "An improved algorithm for hiding sensitive association rules using exact approach," IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555, Vol. 2, No. 1, 2012.

[8] Duc H. Tran, Wee Keong Ng and Wei Zha, "CRYPPAR: An Efficient Framework for Privacy Preserving Association Rule Mining over Vertically Partitioned Data," 978-1-4244-4547-9/09/$26.00 © 2009
[10] Nikunj H. Domadiya, Udai Pratap Rao, "Hiding Sensitive Association Rules to Maintain Privacy and Data Quality in Database," 3rd IEEE International Advance Computing Conference, IEEE, 2013.

[11] Pallavi Dubey, "Association Rule Mining on Distributed Data," International Journal of Scientific & Engineering Research, Volume 3, Issue 1, January 2012.

[16] J. Arokia Renjit, Dr. K. L. Shunmuganathan, "Mining the data from distributed database using an improved mining algorithm," International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010.