Application of Association Rule Hiding On Privacy Preserving Data Mining
73 www.erpublication.org
After the cluster analysis, end users may use their own tools, so that privacy constraints can be applied by data transformation before the mining process.
Disadvantages
The data shared after privacy preserving cluster analysis is very complex.
Clustering by the similarity between objects that are under analysis for privacy is hard to achieve.

B. Perturbation Based Privacy Preserving Data Mining For Real World Data [5]
Perturbation is a technique in which random noise from a known distribution is added to the sensitive data before the data is sent for mining. The data miner then rebuilds an approximation of the original data distribution from the perturbed data and uses the reconstructed distribution for data mining. Individuals and organizations of different types may have different attitudes towards privacy preservation. Unfortunately, recent perturbation based privacy preserving data mining techniques do not give individuals full freedom to choose their desired level of privacy.
Advantages
Efficient and simple data mining models can be built from perturbed data.
Once the distribution of the added noise is known, the data miner can rebuild the original distribution using various statistical models and mine the rebuilt data.
Disadvantages
Perturbation based data mining techniques do not let individuals choose their desired privacy levels.
Perturbation based approaches generally trade information loss against preservation of privacy once noise has been added.

III. ASSOCIATION RULE HIDING
Association rule hiding methodologies sanitize the database in such a way that at least one of the following goals is accomplished [6].
A. No rule that is considered sensitive from the owner's perspective, and that can be mined from the original database at pre-specified thresholds of support and confidence, can be revealed from the sanitized database when it is mined at the same or higher thresholds.
B. All the non-sensitive rules that appear when mining the original database at pre-specified thresholds of support and confidence can be successfully mined from the sanitized database at the same or higher thresholds.
C. No rule that was not derived from the original database when mined at pre-specified thresholds of confidence and support can be derived from the sanitized database when it is mined at the same or higher thresholds.
D. Association rule hiding depends entirely on the support or confidence of a rule. There are two ways to hide a rule: decrease its support below a certain threshold, or decrease its confidence below a certain threshold, so that a mining algorithm that works on support is no longer able to mine the sensitive rules with the same efficiency.
However, the modifications performed on the database may cause side effects that disturb association rule mining. The following side effects may occur in the rule hiding process:
Lost rules: non-sensitive association rules that are present in the original database and are found by the mining algorithm, but that can no longer be mined from the modified database after the hiding algorithm has been applied.
False rules: sensitive association rules that the hiding algorithm fails to hide, so that they can still be mined from the modified database.
Ghost rules: rules that are not present in the original database at all but are generated by applying the hiding algorithm.

IV. ASSOCIATION RULE HIDING STRATEGIES
The association rule hiding strategies are classified as follows:
A. Heuristic based approach
This approach is further divided into two techniques for hiding sensitive association rules.
i. A technique based on distortion of data
This technique was the first to be used for hiding association rules. It has been proved that optimal sanitization is an NP-hard problem [7]. In this process, rules are hidden by modifying the database matrix, that is, by replacing the values of some items in the matrix by 0 or 1. Data distortion offers two basic methods for rule hiding. The first method hides a rule by decreasing its support to an acceptable level; the second reduces its confidence down to an agreed threshold. Table 1 shows an example of the basic data distortion technique.
There are five different techniques used for association rule hiding. They are based on either reducing the support or decreasing the confidence.
Table 1: Hiding P->Q by Distortion
ii. Technique based on data blocking
Under this technique, sensitive association rules are hidden by data blocking: the values of items in the database matrix are changed from 0 or 1 to an unknown value [*]. Hence the support of some of the items in the database matrix drops below its existing level, and the rule mining algorithm may not be able to mine the sensitive rules accurately. Table 2 shows an example of the basic data blocking technique.
Table 2: Hiding P->Q by Blocking
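To make the distortion and blocking techniques of Section IV.A concrete, the following is a minimal Python sketch on a toy binary database matrix. The four transactions, the helper names (`support`, `confidence`, `distort`, `block`) and the use of `None` for the unknown value [*] are assumptions made for illustration only; they are not the contents of Table 1 or Table 2.

```python
# Toy binary transaction matrix; the values are invented for illustration.
# Each row is a transaction, each key an item. None stands for the unknown value [*].

def support(db, items):
    """Fraction of transactions in which every item in `items` is exactly 1."""
    return sum(all(row[i] == 1 for i in items) for row in db) / len(db)

def confidence(db, lhs, rhs):
    """support(lhs U rhs) / support(lhs); 0.0 when lhs never occurs."""
    s_lhs = support(db, lhs)
    return support(db, lhs + rhs) / s_lhs if s_lhs else 0.0

def distort(db, row_index, item):
    """Distortion: return a copy of db in which one value is set to 0."""
    out = [dict(r) for r in db]
    out[row_index][item] = 0
    return out

def block(db, row_index, item):
    """Blocking: return a copy of db in which one value is set to unknown."""
    out = [dict(r) for r in db]
    out[row_index][item] = None
    return out

db = [
    {"P": 1, "Q": 1},
    {"P": 1, "Q": 1},
    {"P": 1, "Q": 0},
    {"P": 0, "Q": 1},
]

# Confidence of the sensitive rule P->Q before and after sanitization.
conf_before = confidence(db, ["P"], ["Q"])
conf_distorted = confidence(distort(db, 0, "Q"), ["P"], ["Q"])
# An unknown value is counted as "not 1", so this is the lowest confidence
# a miner could infer from the blocked matrix.
conf_blocked = confidence(block(db, 0, "Q"), ["P"], ["Q"])

print(conf_before, conf_distorted, conf_blocked)
```

In this sketch, changing a single Q value in a transaction that supports P->Q lowers the confidence of the rule from 2/3 to 1/3. With blocking, the miner cannot tell whether the hidden value was 0 or 1, so the true confidence lies somewhere between the pessimistic value computed here and the original one.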
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869 (O) 2454-4698 (P), Volume-4, Issue-4, April 2016
B. An approach using border revision
In this approach the item sets in the database are classified as frequent or infrequent, and the borders between them in the item set lattice are modified to hide the sensitive rules [8]. The process finds the item sets that are non-sensitive but frequent, and applies modifications in the order in which the item sets are tracked, choosing those that have minimal impact on quality while accommodating the hiding of the sensitive rules. There are numerous algorithms based on border revision, such as the Border Based Approach, Min-Max1 and Min-Max2, for hiding sensitive association rules. These algorithms use methods such as deleting a specific sensitive item, and they reduce the loss of non-sensitive item sets while storing the result in a sanitized database that protects the sensitive rules.
C. The exact approach
The exact approach formulates the hiding process as a constraint satisfaction problem. It is a descendant of the border based methodology and is solved by integer programming.

V. PROPOSED TECHNOLOGY
PPDM is a process that safeguards sensitive information from unsolicited or unsanctioned disclosure while preserving the utility of the data. Our aim is to overcome these privacy issues in data mining using privacy preserving techniques. Association rule hiding emerges as one of the best possible solutions. We generate strong associations from the data present in the database. Apart from the strong associations, some data form weak associations, and our focus is on hiding the data that exhibit weak associations, as they can lead to privacy issues. The aim of a secure multiparty computation task is for the participating parties to securely compute some function of their distributed and private inputs. Each party learns nothing about the other parties beyond its own input and the final result of the data mining algorithm.

A. Mathematical Model
Association rules based on support and confidence can be defined as follows. Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of items. Let $DB = \{Z_1, Z_2, \ldots, Z_n\}$ be a set of transactions, where each transaction $Z$ in $DB$ is a set of items such that $Z \subseteq I$. An association rule is an implication of the form $X \rightarrow Y$, where $X \subset Z$, $Y \subset Z$ and $X \cap Y = \emptyset$.
$\mathrm{Support}(A \rightarrow B) = |A \cup B| / n \ge \mathrm{min\_support}$
$\mathrm{Confidence}(A \rightarrow B) = \mathrm{Support}(A \cup B) / \mathrm{Support}(A) \ge \mathrm{min\_confidence}$
Here $|A \cup B|$ denotes the number of transactions that contain all the items of $A \cup B$.
Let $DB'$ be the database after applying a sequence of modifications to $DB$. A strong rule $X \rightarrow Y$ in $DB$ is hidden in $DB'$ if one of the following conditions holds in $DB'$:
1. $\mathrm{Supp}(X \rightarrow Y) < MS$
2. $\mathrm{Confidence}(X \rightarrow Y) < MC$

VI. SYSTEM ARCHITECTURE
Fig. 1 shows the architecture of our overall process, which implements the association rule hiding strategy. Association rule hiding is a privacy preserving data mining technique whose aim is to prevent the unlawful disclosure of data. In this process, the administrator has full access to the database, and only the administrator can change it. The administrator can change or access the database only through conditional queries. Once the database has been created, the data goes through data pre-processing, which begins with data selection (retrieving the relevant data from the database). Data pre-processing consists of four steps.
Step 1: Data cleaning: removes noise and inconsistent data, handles missing data fields, etc.
Step 2: Data integration: combines data from multiple sources.
Step 3: Data transformation: transforms the data into forms appropriate for the data mining task and finds useful features to represent the data.
Step 4: Data reduction: removes irrelevant features from the data to reduce space and make data mining more efficient.
Fig. 1: Architecture of Association Rule Hiding
Once the data has passed through the pre-processing steps, it has been transformed into a format suitable for data mining. Association rule mining helps in finding interesting relationships within the data. Association rules using support and confidence, as defined in our mathematical model, give us only the strong associations in the data. Once the strong associations have been found, there remain certain data that are weakly associated with each other. The idea is therefore to apply association rule hiding to the sensitive rules and sensitive items, so as to hide the patterns found in the sensitive, weakly associated data. This provides a better result, since the datasets are not suppressed or falsified much in the process. The approach modifies transactions until the confidence of the rule falls below the minimum confidence, or its support below the minimum support, agreed by both parties. The changes to the datasets are made either by removing items from transactions or by inserting falsified new items into transactions. The process
is further improved by restricting patterns found in the data by generating association rules at the time of transaction.
The rule mining process first selects the transactions that contain the intersecting patterns from a group of restricted patterns. Depending on the threshold value chosen by the users, the algorithm sanitizes a percentage of the selected transactions and stores them in a sanitized database in order to hide the restricted patterns. The rule hiding approach then modifies the database so that the confidence of the association rule is altered by increasing or decreasing support values, based on the relationship between the data exchanged between the parties. If the confidence of the rule falls below a specified threshold on a particular side, the data is hidden or not disclosed. The association rule hiding discussed here is not applied to data that are strongly associated or to data that do not contain sensitive information, such as numeric or symbolic attributes.

VII. PROPOSED ALGORITHM
A. Design of association rule hiding algorithm
The main objective of the association rule hiding algorithm is to hide certain confidential data so that it cannot be discovered through data mining techniques. In this research work, it is assumed that only the sensitive items are given, and one proposed algorithm is used to modify the data in the database so that the sensitive items cannot be deduced through association rule mining algorithms. More specifically, given a transaction database DB, a minimum support Min_supp, a minimum confidence Min_confid and a set of hidden items H, the objective is to modify DB such that no association rule containing H on the right hand side or left hand side will be discovered.
Two association rule hiding algorithms are used, namely ISL (Increase Support of Left hand side) and DSR (Decrease Support of Right hand side), to hide useful association rules from transaction data with binary attributes. In the ISL method, the confidence of a rule is decreased by increasing the support of the left hand side (LHS) of the rule. In the DSR method, the confidence of a rule is decreased by decreasing the support of the right hand side (RHS) item in the transactions that support the rule.
Based on these two concepts, a new association rule hiding algorithm for hiding sensitive items in association rules is proposed. In the proposed algorithm, a rule X -> Y is hidden by decreasing the support of X U Y and increasing the support of H, which can correspondingly increase and decrease the support values of the LHS and RHS items of the rule.
This algorithm first tries to hide the rules in which the item to be hidden, H, is on the right hand side, and then tries to hide the rules in which H is on the left hand side. In this algorithm z is a transaction, Z is a set of transactions, R is a rule, RHS(R) is the right side of rule R, LHS(R) is the left side of rule R, Confid(R) is the confidence of rule R, and H is the set of items to be hidden.
INPUT: A source database DB, a minimum support min_supp (MS), a minimum confidence min_confid (MC), a set of hidden items H, the set of transactions Z, and the first transaction z from Z.
OUTPUT: The sanitized database DB', in which rules containing H on the LHS or RHS are hidden.
1. Begin
2. Generate all possible rules from the given items Z;
3. For each hidden item H, compute the confidence of each rule R;
4. For each rule R in which H is in RHS
4.1 If confidence (R) < MC, then
Go to next 2-itemset;
Else go to step 5
5. Decrease support for RHS item H.
5.1 Find Z = {z in DB | z fully supports R};
5.2 While (Z is not empty)
5.3 Choose the first transaction z from Z;
5.4 Modify z by putting 0 instead of 1 for the RHS item;
5.5 Remove and save the first transaction z from Z;
End While
6. Compute the confidence of R;

VIII. CONCLUSION
In this paper, a broad survey of different approaches to privacy preserving data mining is conducted; the major algorithms available for each method are analysed and their existing drawbacks are pointed out. All the proposed methods only approximate our goal of privacy preservation.
To address this issue, we advise that the following problems be widely studied:
(A) Privacy and accuracy pull in opposite directions; improving one usually incurs a cost in the other. How to apply various optimizations to achieve a trade-off should be researched in depth.
(B) In distributed privacy preserving data mining, more efficient algorithms should be developed that achieve a balance between disclosure cost, computation cost and communication cost.

REFERENCES
[1] Neelkamal Upadhyay, Kuldeep Tripathi and Prof. Ashish Mishra, "A Survey of Association Rule Hiding Approaches," IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555, Vol. 5, No. 1, February 2015.
[2] https://en.wikipedia.org/wiki/Data_mining
[3] Alberto Trombetta and Wei Jiang (2011), "Privacy-Preserving Updates to Anonymous and Confidential Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 22, pp. 578-568.
[4] Stanley R. M. Oliveira and Osmar R. Zaiane, "Revisiting Privacy Preserving Clustering by Data Transformation," Journal of Information and Data Management, vol. 1, no. 1, 2010.
[5] Li Liu, Murat Kantarcioglu and Bhavani Thuraisingham, "The applicability of the perturbation based privacy preserving data mining for real-world data," Journal of Data & Knowledge Engineering, vol. 65, pp. 5-21, 2008.
[6] M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim and V. S. Verykios, "Disclosure limitation of sensitive rules," in Proc. of the 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX99), pp. 45-52, 1999.
[7] V. S. Verykios, A. Elmagarmid, E. Bertino, Y. Saygin and E. Dasseni, "Association rule hiding," IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 4, pp. 434-447, 2004.
[8] A. Gkoulalas-Divanis and V. S. Verykios, "Exact knowledge hiding through database extension," IEEE Trans Knowledge Data Eng, 2009, pp. 699-713.
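As an illustration of the DSR part of the algorithm in Section VII (steps 4 to 6), the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions: each transaction is represented as a set of items (an item being present corresponds to a 1 in the binary matrix), the function name `dsr_hide` and the toy data are hypothetical, and the loop stops as soon as the rule falls below MS or MC instead of exhausting every supporting transaction.

```python
def support(db, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for z in db if items <= z) / len(db)

def confidence(db, lhs, rhs):
    """support(lhs U rhs) / support(lhs), computed from counts."""
    n_lhs = sum(1 for z in db if lhs <= z)
    n_both = sum(1 for z in db if (lhs | rhs) <= z)
    return n_both / n_lhs if n_lhs else 0.0

def dsr_hide(db, lhs, h, ms, mc):
    """Hide the rule lhs -> {h} by decreasing the support of the RHS item h.

    Returns a sanitized copy of db; the input is left unchanged.
    """
    sanitized = [set(z) for z in db]
    rule_items = lhs | {h}
    # Step 5.1: transactions that fully support the rule.
    supporting = [z for z in sanitized if rule_items <= z]
    # Step 5.2, with an early exit (an assumption) once the rule is hidden.
    while (supporting
           and support(sanitized, rule_items) >= ms
           and confidence(sanitized, lhs, {h}) >= mc):
        z = supporting.pop(0)  # steps 5.3 and 5.5: take the first transaction
        z.discard(h)           # step 5.4: put 0 instead of 1 for the RHS item
    return sanitized

# Toy data, invented for illustration: hide A -> B with MS = 0.2 and MC = 0.6.
db = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"A"}, {"B"}]
sanitized = dsr_hide(db, {"A"}, "B", ms=0.2, mc=0.6)
print(confidence(db, {"A"}, {"B"}), confidence(sanitized, {"A"}, {"B"}))
```

In this toy run, removing B from a single supporting transaction lowers the confidence of A -> B from 0.75 to 0.5, which is below MC, so the remaining transactions are left untouched. Stopping early keeps the number of modified transactions small, which limits the lost-rule side effect described in Section III.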