Privacy Preserving Heuristic Approach For Association Rule Mining in Distributed Database


IEEE Sponsored 2nd International Conference on Innovations in Information Embedded and Communication Systems ICIIECS'15

Bhoomika R Mistry
Computer Science and Engineering
Parul Institute of Technology, Vadodara
Gujarat, INDIA-391760
bhumikamistry92@gmail.com

Amish Desai
Computer Science and Engineering
Asst. Prof., Parul Institute of Technology, Vadodara
Gujarat, INDIA-391760
desaiamish1986@gmail.com

Abstract:
Association rule mining is a powerful data mining model used for finding hidden patterns in large databases. A challenge of data mining is to secure the confidentiality of sensitive patterns when releasing a database to third parties. In this paper, privacy preservation takes the form of association rule hiding. An association rule hiding algorithm sanitizes the database such that certain sensitive association rules cannot be discovered through association rule mining techniques. Various approaches are described in this paper, but the heuristic approach with the data distortion technique is the one used. The proposed algorithm is an extension of the MDSRRC algorithm, which hides rules with multiple R.H.S. items. In the proposed work, the MDSRRC algorithm operates on a distributed database. We will show experimental results comparing the MDSRRC algorithm on a single database with the MDSRRC algorithm on a distributed database.

Keywords: Data mining, privacy preserving, association rule hiding, sensitivity

I. INTRODUCTION

Today a large amount of data is produced every day from different sources. This data is stored in different databases, on storage devices, in the form of raw data. Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Examples of applications where data mining techniques are used are direct mail marketing, bioinformatics, credit card fraud detection, text analysis and market basket analysis. When knowledge is extracted from raw data, some technique is needed to deal with security. Privacy preserving data mining is one of the techniques that deal with the security of the knowledge extracted by data mining techniques. There are various data mining tasks:

Classification
Clustering
Association Rule Mining
Sequential Pattern Mining
Regression

II. PRIVACY PRESERVING DATA MINING

Data mining is the process of gathering information about user-specific data on the internet, and is also called knowledge discovery. The problem with data mining output is that it also discloses some information that is considered private and personal. Effortless access to such personal data causes a peril to individual privacy [9]. Recent research in the area of privacy preserving data mining has devoted considerable effort to determining a trade-off between privacy and the need for knowledge discovery, which is necessary in order to improve decision-making processes and other human activities. PPDM copes with the problem of learning

978-1-4799-6818-3/15/$31.00 © 2015 IEEE



accurate models over aggregate data, while protecting privacy at the level of individual records.

The main purpose of privacy preserving data mining is to design competent frameworks and algorithms that can extract relevant knowledge from a large amount of data without revealing any sensitive information [9]. It protects sensitive information by publishing a sanitized version of the original database on the internet, or by using a process in which private data and private knowledge remain private even after the mining process. It is due to PPDM that the benefits of data mining can be enjoyed without compromising the privacy of the individuals concerned.

III. ASSOCIATION RULE MINING

Association rule mining is one of the tasks of data mining and an important field within privacy preserving data mining. R. Agrawal first proposed the basic concept of association rule mining. An association rule expresses an IF-THEN relationship among different data items. The following example shows the concept: "If a customer buys a laptop, then he/she is 85% likely to also purchase anti-virus." The example indicates that the laptop is related to the anti-virus, because customers who buy a laptop usually also buy anti-virus. Association rules are used for market basket analysis.

Let $I = \{I_1, I_2, \ldots, I_n\}$ be a set of items. Let $D$ be a database of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. Every transaction is associated with an identifier, called TID. A transaction $T$ contains $A$ if and only if $A \subseteq T$. An association rule is an implication of the form $A \rightarrow B$, where $A \subset I$, $B \subset I$ and $A \cap B = \emptyset$. Every association rule must satisfy two constraints: support and confidence.

The support of rule $A \rightarrow B$ is the fraction of transactions in the database that contain $A \cup B$. It can be calculated using formula (1):

$$\mathrm{support}(A \rightarrow B) = \frac{|A \cup B|}{|D|} \qquad (1)$$

where $|A \cup B|$ is the support count of $A \cup B$ and $|D|$ is the total number of transactions in the transaction database.

The confidence of rule $A \rightarrow B$ is the fraction of the transactions containing $A$ that also contain $B$. It can be calculated using formula (2):

$$\mathrm{confidence}(A \rightarrow B) = \frac{|A \cup B|}{|A|} \qquad (2)$$

IV. ASSOCIATION RULE HIDING

Association rule hiding is one technique of PPDM (Privacy Preserving Data Mining). The aim of association rule hiding is to sanitize the original database so that the following conditions hold:

(1) The sanitized database does not reveal any sensitive rules.

(2) All non-sensitive rules can still be mined from the sanitized database.

(3) The sanitized database does not introduce any new rules that are not present in the original database D.

Association rule hiding depends on the support and confidence of the rule. There are two ways to hide a rule: (i) decrease its support below a certain threshold; (ii) decrease its confidence below a certain threshold.

Figure 1 - Basic Architecture of Rule Hiding (block diagram with the labels: original database; frequent item mining; generation of association rules; association rule hiding algorithms; association rules)

Figure 1 shows the framework of association rule hiding. Modifying the database in this way may cause some side effects, namely lost rules, false rules and ghost rules.
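The support and confidence measures of formulas (1) and (2), and the threshold test behind the two ways of hiding a rule, can be illustrated with a minimal Python sketch. The function names, the toy transactions and the threshold values are assumptions made for illustration only; they do not come from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support_count(itemset: Iterable[str], db: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t)


def support(lhs: Iterable[str], rhs: Iterable[str], db: List[Transaction]) -> float:
    """Formula (1): |A u B| / |D|."""
    return support_count(set(lhs) | set(rhs), db) / len(db)


def confidence(lhs: Iterable[str], rhs: Iterable[str], db: List[Transaction]) -> float:
    """Formula (2): |A u B| / |A| (0.0 when no transaction contains A)."""
    lhs_count = support_count(lhs, db)
    if lhs_count == 0:
        return 0.0
    return support_count(set(lhs) | set(rhs), db) / lhs_count


def is_hidden(lhs, rhs, db, min_support: float, min_confidence: float) -> bool:
    """A rule counts as hidden once support OR confidence is below its threshold."""
    return (support(lhs, rhs, db) < min_support
            or confidence(lhs, rhs, db) < min_confidence)


# Hypothetical toy database, not taken from the paper.
toy_db = [frozenset(t) for t in (
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"laptop", "mouse"},
    {"mouse"},
    {"laptop", "antivirus"},
)]

rule_support = support({"laptop"}, {"antivirus"}, toy_db)        # 3 of 5 transactions
rule_confidence = confidence({"laptop"}, {"antivirus"}, toy_db)  # 3 of 4 laptop buyers
print(rule_support, rule_confidence)
print(is_hidden({"laptop"}, {"antivirus"}, toy_db, 0.5, 0.8))
```

With the assumed thresholds (minimum support 0.5, minimum confidence 0.8) the rule laptop -> anti-virus already counts as hidden, because its confidence of 0.75 is below the confidence threshold even though its support is not below the support threshold.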

Lost rule: a non-sensitive rule that can be mined from the original database by mining algorithms, but can no longer be mined from the modified database after the hiding algorithm has been applied.

False rule: a sensitive rule that the hiding algorithm fails to hide, so that it can still be mined from the modified database by a mining algorithm.

Ghost rule: a rule that is not present in the original database but is generated as a result of the hiding algorithm.

Association rule hiding techniques can be classified into heuristic based approaches, border based approaches, exact approaches, reconstruction based approaches and cryptography based approaches.

(1) Heuristic based approaches:

Heuristic based approaches aim at taking locally best decisions with respect to hiding the sensitive knowledge, which, however, are not necessarily globally best. As a result, heuristics fail to provide guarantees for the identified hiding solution. These approaches are fast, efficient and scalable algorithms that select a set of transactions from the database and sanitize them in order to hide the association rules [1].

Two types of heuristic approach are used: (i) Data distortion. (ii) Data blocking.

(i) Data distortion: This is done by altering an old attribute value to a new value. It changes 1's to 0's or vice versa in selected transactions to increase or decrease the support or confidence of a sensitive rule [1,2]. A heuristic approach cannot guarantee an optimum solution, because of the side effects on non-sensitive rules. The two basic techniques of data distortion are reducing the confidence of the rule and reducing the support of the rule. example explain........ Finding an optimum solution is an NP-hard problem. The proposed algorithm uses the heuristic based data distortion technique.

Advantages: It is efficient, fast and scalable.

Disadvantages: It is difficult for this approach to handle changes in the database.

(ii) Data blocking: This technique does not insert false values to reduce the support or confidence of the sensitive rule. In the database, the 0's and 1's that must be hidden during blocking are replaced with "?". In applications where publishing wrong data is not acceptable, unknown values may be inserted instead to blur the rules, so that the support of certain items goes down to a certain level and the rule mining algorithm is not able to mine the sensitive rules [3].

Advantages: It maintains the truthfulness of the database, because the original values are blocked instead of being replaced by false values.

Disadvantages: It is difficult to reproduce the original database, and it has various side effects such as lost rules, ghost rules and false rules.

(2) Border Based Approaches:

Border based approaches hide the sensitive association rules by modifying the border in the lattice of the frequent and infrequent itemsets of the original database [3]. The border separates the frequent itemsets from the infrequent itemsets. The first frequent itemset hiding methodology based on the notion of the border [1] maintains the quality of the database by greedily selecting the modifications with minimal side effects.

Advantages: It maintains the database quality by selecting the modifications with minimal side effects.

Disadvantages: The border is not easily identified, and the approach is more difficult to understand than the heuristic approaches.

(3) Exact Approaches:

Exact approaches are also called non-heuristic algorithms. The hiding problem is formulated as a constraint satisfaction problem (CSP) and solved using binary integer programming (BIP). This provides an optimal solution satisfying all constraints. The exact approach was first used for hiding rules in [4], where it provides an optimal solution to the rule hiding problem. In [7], sensitive rules are hidden without any side effects by formulating a constraint satisfaction problem with the concepts of positive and negative border sets, and by adopting a divide and conquer technique on the constraints.

Advantages: It guarantees an optimal solution without any side effects.

Disadvantages: The approach has high complexity due to the binary integer programming.

(4) Reconstruction Based Approaches:

Reconstruction based approaches provide privacy for the database by using the sensitive characteristics extracted from the original database. They produce fewer side effects in the database. In [5], an FP-tree based algorithm is defined which reconstructs the original database and efficiently generates a number of secure databases.

Advantages: Reconstruction based approaches preserve the privacy of the database with fewer side effects than heuristic based approaches.

Disadvantages: The number of transactions in the new database is restricted.

(5) Cryptographic Based Approaches:

Cryptographic based approaches are used in multi-party computation, in which the data is distributed across different locations. The owners of the databases want to share their data and, at the same time, want privacy at their end. Cryptographic based approaches can be classified in two ways: horizontally partitioned distributed data and vertically partitioned distributed data.

In horizontally partitioned distributed data, different rows are placed in different tables that are distributed across different locations. In vertically partitioned data, some columns are kept in one table and the remaining columns in another table. In [6], the mining of association rules over a horizontally partitioned distributed database is defined. In [8], association rules are mined efficiently over vertically partitioned data; the authors introduce a partial topology to lower the communication cost as much as possible.

Advantages: It provides security in multi-party computation over the partitioned database.

Disadvantages: It does not provide security for the output of a computation. Communication and computation costs must be kept low.

V. RELATED WORK

The following papers are referred to as the literature survey. Their advantages and limitations are included.

Dharmendra Thakur and Prof. Hitesh Gupta [9] present various association rule hiding algorithms. They work on heuristic based algorithms to hide the sensitive knowledge, because this approach is efficient and fast. Their paper uses a support based algorithm and a confidence based algorithm.

Pallavi Dubey [11] addresses huge storage and its management. She works on increasing the throughput of output data over a distributed database, and concludes that response time can be improved through the number of processors in the distributed environment.

R. Natarajan, Dr. R. Sugumar, M. Mahendran and K. Anbazhagan [15] use privacy preserving data mining to provide confidentiality and to improve performance when the database stores and retrieves a huge amount of data. They present ISL (Increase Support of L.H.S.) and DSR (Decrease Support of R.H.S.). The DSR algorithm hides useful association rules in transaction data with binary attributes. In the ISL algorithm, the confidence of a rule is decreased by increasing the support value of the L.H.S. of the rule. They conclude that using these two algorithms the efficiency increases by only 19%.

Chirag N. Modi, Udai Pratap Rao and Dhiren R. Patel [2] proposed a heuristic algorithm named DSRRC, which hides rules only up to a certain level. They find the heuristic algorithm preferable to the other hiding algorithms. The DSRRC algorithm hides rules that contain a single item on the R.H.S. of the rule.

Komal Shah, Amit Thakkar and Amit Ganatra [1] proposed two algorithms, ADSRRC and RRLR, for hiding sensitive association rules. These two algorithms overcome the limitations of the DSRRC algorithm. ADSRRC overcomes the limitation of multiple sorting in the database, and it selects the transactions to be modified based on different criteria than the DSRRC algorithm. RRLR overcomes the limitation of hiding rules having multiple R.H.S. items. They also reduce the side effects and the complexity.
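The ISL and DSR strategies surveyed above from [15] lower the confidence of a rule in two different ways, which the following minimal Python sketch illustrates on a toy database. The function names, the choice of which transaction to modify (simply the first eligible one) and the data are assumptions for illustration; the published algorithms use their own selection criteria.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support_count(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str], db: List[Transaction]) -> float:
    """|A u B| / |A|, or 0.0 when no transaction contains A."""
    lhs_count = support_count(lhs, db)
    return support_count(lhs | rhs, db) / lhs_count if lhs_count else 0.0


def isl_step(lhs: FrozenSet[str], rhs: FrozenSet[str], db: List[Transaction]) -> bool:
    """ISL-style step: add the L.H.S. items to one transaction that neither
    contains the whole L.H.S. nor the whole R.H.S., so only |A| grows."""
    for i, t in enumerate(db):
        if not lhs <= t and not rhs <= t:
            db[i] = t | lhs
            return True
    return False


def dsr_step(lhs: FrozenSet[str], rhs: FrozenSet[str], db: List[Transaction]) -> bool:
    """DSR-style step: delete one R.H.S. item from one transaction that
    supports the whole rule, so |A u B| shrinks while |A| stays the same."""
    for i, t in enumerate(db):
        if (lhs | rhs) <= t:
            db[i] = t - {min(rhs)}  # min() only makes the choice deterministic
            return True
    return False


# Hypothetical toy data: rule a -> b starts with confidence 3/3.
rule_lhs, rule_rhs = frozenset({"a"}), frozenset({"b"})
original = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"b"},
)]

db_isl = list(original)
isl_step(rule_lhs, rule_rhs, db_isl)
db_dsr = list(original)
dsr_step(rule_lhs, rule_rhs, db_dsr)

print(confidence(rule_lhs, rule_rhs, original))
print(confidence(rule_lhs, rule_rhs, db_isl))
print(confidence(rule_lhs, rule_rhs, db_dsr))
```

In this sketch the ISL step inserts an item and the DSR step deletes one, so ISL leaves the support of the rule itself unchanged while DSR lowers both support and confidence.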

Vikram Garg, Anju Singh and Divakar Singh [12] note that data mining extracts hidden predictive information from the data warehouse, which should be done without revealing sensitive information. They present all PPDM (privacy-preserving data mining) techniques and the recent research areas that deal with association rule hiding, and give a comparative analysis of all association rule hiding approaches.

J. Arokia Renjit and Dr. K. L. Shunmuganathan [16] observe that data mining algorithms are mostly centralized. Using a centralized data mining algorithm to discover useful patterns in distributed databases is not always feasible, because merging data sets from different sites incurs huge network communication costs. They use the FDM algorithm for geographically distributed data sets, with an effective pruning technique to minimize the messages exchanged for the support counts. They conclude that mining association rules in distributed databases in this way gives reduced communication, better efficiency and better scalability than applying a sequential algorithm in a distributed database.

Nikunj H. Domadiya and Udai Pratap Rao [10] proposed the MDSRRC algorithm to hide sensitive association rules with multiple items in the consequent (R.H.S.) and antecedent (L.H.S.). It overcomes the problems of the DSRRC algorithm. My proposed work is related to this paper. They also give a comparative analysis of the DSRRC and MDSRRC algorithms, in which they show that the MDSRRC algorithm is more efficient than the DSRRC algorithm.

VI. PROPOSED WORK

1. Description Of Algorithm (Existing)

The steps of the MDSRRC algorithm are as follows:

Step: 1 - Count the occurrences of each item in the R.H.S. of the sensitive rules.

Step: 2 - The algorithm finds IS = {is0, is1, ... isn} k<=n, by arranging those items in decreasing order of their counts.

Step: 3 - Calculate the sensitivity of each transaction.

Step: 4 - The transactions which support is0 are sorted in decreasing order of their sensitivity.

Step: 5 - From the sorted transaction with the highest sensitivity, delete item is0.

Step: 6 - Update the support and confidence after deleting the item from the transaction. Also update the database.

The following figure shows the proposed flow of the work.

Figure 2 - Proposed Flow (flowchart; recoverable box labels: Database (D); divide database into a distributed database, i.e. (D1, D2, D3); apply Apriori algorithm and generate association rules of all divided DBs; select any one DB and combine with the remaining; share the result with all DBs using synchronization (message passing); combine the result; apply association rule; sort items according to sensitivity; update sensitivity; rule hiding; update transaction DB)

2. Description Of Algorithm (Proposed)

The steps of the MDSRRC algorithm in a distributed database are as follows:

Step: 1 - Take the large database, because a distributed database is used.

Step: 2 - Following the concept of a distributed database, divide it (i.e. D1, D2, D3).

Step: 3 - Apply association rule mining and generate the association rules of all the divided databases.

Step: 4 - Suppose database D1 is selected; then combine it with the remaining two databases D2 and D3.

Step: 5 - The result of database D1 is then shared with databases D2 and D3 using synchronization with a message passing technique.

Step: 6 - Combine all the results, i.e. the generated association rules.

Step: 7 - Count the occurrences of each item in the R.H.S. of the sensitive rules.

Step: 8 - The algorithm finds IS = {is0, is1, ... isn} k<=n, by arranging those items in decreasing order of their counts.

Step: 9 - Calculate the sensitivity of each transaction.

Step: 10 - The transactions which support is0 are sorted in decreasing order of their sensitivity.

Step: 11 - From the sorted transaction with the highest sensitivity, delete item is0.

Step: 12 - Update the support and confidence after deleting the item from the transaction. Also update the database.

VII. CONCLUSION

The MDSRRC algorithm hides sensitive association rules with few modifications to the database, which maintains the data quality and reduces the side effects on the database. I will try to extend the MDSRRC algorithm to a distributed database and to make association rules more secure. I will also try to increase the efficiency of the MDSRRC algorithm and to reduce the side effects by minimizing the modifications to the database, and will use a distributed database to hide the association rules. A comparison of DSRRC, MDSRRC on a single database, and my proposed MDSRRC on a distributed database will also be included.

VIII. FUTURE WORK

Implementation of the proposed work is still in progress and is left as future work.

REFERENCES

[1] Komal Shah, Amit Thakkar, Amit Ganatra, "Association Rule Hiding by Heuristic Approach to Reduce Side Effects & Hide Multiple R.H.S. Items", International Journal of Computer Applications (0975 - 8887), Volume 45, No. 1, May 2012.

[2] Chirag N. Modi, Udai Pratap Rao, Dhiren R. Patel, "Maintaining Privacy and Data Quality in Privacy Preserving Association Rule Mining", Second International Conference on Computing, Communication and Networking Technologies, 2010.

[3] Tapan Sirole and Jaytrilok Choudhary, "A Survey of Various Methodologies for Hiding Sensitive Association Rules", International Journal of Computer Applications (0975 - 8887), Volume 96, No. 18, June 2014.

[4] A. Gkoulalas-Divanis and V. S. Verykios, "Exact knowledge hiding through database extension", IEEE Trans. Knowledge Data Eng., 2009, pp. 699-713.

[5] Y. Guo, "Reconstruction-Based Association Rule Hiding", In Proc. of SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007 (IDAR2007), June 2007.

[6] Tamir Tassa, "Secure Mining of Association Rules in Horizontally Distributed Databases", IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 4, April 2014.

[7] N V Muthu Lakshmi and Dr. K Sandhya Rani, "An improved algorithm for hiding sensitive association rules using exact approach", IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555, Vol. 2, No. 1, 2012.

[8] Duc H. Tran, Wee Keong Ng and Wei Zha, "CRYPPAR: An Efficient Framework for Privacy Preserving Association Rule Mining over Vertically Partitioned Data", 978-1-4244-4547-9/09/$26.00 © 2009 IEEE.

[9] Dharmendra Thakur, Prof. Hitesh Gupta, "An Exemplary Study of Privacy Preserving Association Rule Mining Techniques", International Journal of Computer Science and Software Engineering, Volume 3, Issue 11, November 2013.

[10] Nikunj H. Domadiya, Udai Pratap Rao, "Hiding Sensitive Association Rules to Maintain Privacy and Data Quality in Database", 3rd IEEE International Advance Computing Conference, IEEE 2013.

[11] Pallavi Dubey, "Association Rule Mining on Distributed Data", International Journal of Scientific & Engineering Research, Volume 3, Issue 1, January-2012.

[12] Vikram Garg, Anju Singh, Divakar Singh, "A Survey of Association Rule Hiding Algorithms", Fourth International Conference on Communication Systems and Network Technologies, IEEE 2014.

[13] J. Arokia Renjit, Dr. K. L. Shunmuganathan, "Mining the data from distributed database using an improved mining algorithm", International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010.

[14] Nikunj H. Domadiya, Udai Pratap Rao, "Hiding Sensitive Association Rules to Maintain Privacy and Data Quality in Database", 3rd IEEE International Advance Computing Conference, IEEE 2013.

[15] R. Natarajan, Dr. R. Sugumar, M. Mahendran, K. Anbazhagan, "Design and Implement an Association Rule hiding Algorithm for Privacy Preserving Data Mining", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 1, Issue 7, September 2012.

[16] J. Arokia Renjit, Dr. K. L. Shunmuganathan, "Mining the data from distributed database using an improved mining algorithm", International Journal of Computer Science and Information Security, Vol. 7, No. 3, March 2010.
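As a supplementary illustration of Section VI, the existing steps (count R.H.S. items, pick is0, rank transactions by sensitivity, delete is0, update) can be sketched in Python, together with a crude stand-in for dividing the database into D1, D2, D3 and combining it again. This is a simplified sketch under stated assumptions, not the authors' implementation: the sensitivity definitions (an item's sensitivity is the number of sensitive rules containing it, a transaction's sensitivity is the sum over its items), the round-robin split, all function names, the toy data and the thresholds are hypothetical, and message passing between sites is not modelled.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (L.H.S., R.H.S.)


def count(itemset: FrozenSet[str], db: List[Set[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def is_visible(rule: Rule, db: List[Set[str]], min_sup: float, min_conf: float) -> bool:
    """True while the rule still meets both the support and confidence thresholds."""
    lhs, rhs = rule
    both, left = count(lhs | rhs, db), count(lhs, db)
    return left > 0 and both / len(db) >= min_sup and both / left >= min_conf


def hide_rules(db: List[Set[str]], sensitive: List[Rule],
               min_sup: float, min_conf: float) -> int:
    """Sanitize `db` in place and return the number of deleted items."""
    # Assumption: item sensitivity = number of sensitive rules containing the item.
    item_sens = Counter(i for lhs, rhs in sensitive for i in lhs | rhs)
    deletions = 0
    while True:
        visible = [r for r in sensitive if is_visible(r, db, min_sup, min_conf)]
        if not visible:
            return deletions
        # Steps 1-2: count items in the R.H.S. of visible rules, take the top one.
        rhs_counts = Counter(i for _, rhs in visible for i in rhs)
        is0 = max(sorted(rhs_counts), key=rhs_counts.get)
        # Steps 3-4: transactions supporting a visible rule with is0 in its R.H.S.,
        # ranked by transaction sensitivity.
        candidates = [t for t in db if is0 in t
                      and any((l | r) <= t for l, r in visible if is0 in r)]
        if not candidates:
            return deletions
        victim = max(candidates, key=lambda t: sum(item_sens[i] for i in t))
        # Steps 5-6: delete is0; support and confidence are recomputed next round.
        victim.discard(is0)
        deletions += 1


def split_horizontally(db: List[Set[str]], n_sites: int = 3) -> List[List[Set[str]]]:
    """Round-robin row split standing in for the division into D1, D2, D3."""
    return [db[i::n_sites] for i in range(n_sites)]


# Hypothetical toy data and thresholds.
database = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"},
            {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
sensitive_rules = [(frozenset({"a"}), frozenset({"b", "c"})),
                   (frozenset({"c"}), frozenset({"b"}))]

sites = split_horizontally(database)              # D1, D2, D3
combined = [t for site in sites for t in site]    # combine the divided databases
deleted = hide_rules(combined, sensitive_rules, min_sup=0.5, min_conf=0.6)
print(deleted, combined)
```

Because the combined list holds references to the same transaction objects as the per-site lists, each site sees the sanitized transactions after hiding; a real distributed setting would have to exchange these updates explicitly.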
