
Association Rule Hiding Methods

set of sensitive rules by replacing known values by unknowns, while minimizing the side effects on non-sensitive rules. Note here that the use of unknowns requires a high level of sophistication in order to perform as well as the perturbation approaches that we presented before, although the quality of the datasets after hiding is higher than in the perturbation approaches, since values are not changed behind the scenes. Although the work presented under this category is at an early stage, the authors do give arguments as to the difficulty of recovering sensitive rules, and they formulate experiments that test the side effects on non-sensitive rules. Among the new ideas proposed in this work are the modification of the basic notions of support and confidence to accommodate the use of unknowns (consider how an unknown value should count during the computation of these two metrics) and the introduction of a new parameter, the safety margin, which accounts for the distance below the support or confidence threshold that a sensitive rule needs to maintain. Further studies related to the use of unknown values for the hiding of sensitive rules are underway (Wang & Jafari, 2005).

Recent Approaches

The problem of inverse frequent itemset mining was defined by Mielikainen (2003) in order to answer the following research problem: Given a collection of frequent itemsets and their supports, find a transactional database such that the new database precisely agrees with the supports of the given frequent itemset collection, while the supports of all other itemsets remain below the predetermined threshold. A recent study (Chen, Orlowska & Li, 2004) investigates the use of inverse frequent itemset mining to solve the association rule hiding problem. In particular, the authors start from a database on which they apply association rule mining. After the association rules have been mined and organized into an itemset lattice, the lattice is revised by taking the sensitive rules into consideration. This means that the frequent itemsets that have generated the sensitive rules are forced to become infrequent in the lattice. Given the itemsets that remain frequent in the lattice after the hiding of the sensitive itemsets, the proposed algorithm tries to reconstruct a new database, the mining of which will produce the given frequent itemsets.

Another study (Menon, Sarkar & Mukherjee, 2005) was the first to formulate the association rule hiding problem as an integer programming task by taking into account the occurrences of sensitive itemsets in the transactions. The solution of the integer programming problem gives the minimum number of transactions that need to be sanitized for each sensitive itemset to become hidden. Based on the integer programming solution, two heuristic approaches are presented for actually identifying the items to be sanitized.

A border-based approach along with a hiding algorithm is presented in Sun & Yu (2005). The authors propose the use of the border of the frequent itemsets to drive the hiding algorithm. In particular, given a set of sensitive frequent itemsets, they compute the new (revised) border on which the sensitive itemsets have just turned infrequent. In this way, the hiding algorithm is forced to maintain the itemsets in the revised positive border while trying to hide those itemsets in the negative border that have moved from frequent to infrequent. A maxmin approach (Moustakides & Verykios, 2006) is proposed that relies on the border revision theory and uses the maxmin criterion, a method in decision theory for maximizing the minimum gain. The maxmin approach improves over the basic border-based approach both in attaining hiding results of better quality and in achieving much lower execution times. An exact approach (Gkoulalas-Divanis & Verykios, 2006), also based on the border revision theory, relies on an integer programming formulation of the hiding problem that is efficiently solved using a Binary Integer Programming approach. The important characteristic of the exact solutions is that they do not create any hiding side effects.

Wu, Chiang & Chen (2007) present a limited side effect approach that modifies the original database to hide sensitive rules by decreasing their support or confidence. The proposed approach first classifies all the valid modifications that can affect the sensitive rules, the non-sensitive rules, and the spurious rules. Then, it uses heuristic methods to modify the transactions in an order that increases the number of hidden sensitive rules while reducing the number of modified entries. Amiri (2007) presents three data sanitization heuristics that demonstrate high data utility at the expense of computational speed. The first heuristic reduces the support of the sensitive itemsets by deleting a set of supporting transactions. The second heuristic modifies, instead of
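The modified support computation under unknowns discussed above can be illustrated with a short sketch. This is our own illustration, not the cited authors' algorithm: item values are assumed to be 1 (present), 0 (absent), or '?' (unknown), so an itemset's support becomes an interval rather than a single number.

```python
# Sketch (illustration only): support bounds for an itemset when some
# item values in transactions are unknown ('?'). With unknowns, support
# lies in an interval [min_support, max_support], depending on how the
# '?' values could resolve.

def support_bounds(transactions, itemset):
    """transactions: list of dicts mapping item -> 1, 0, or '?'.
    Returns (min_support, max_support) as fractions of the database."""
    n = len(transactions)
    certain = 0    # itemset definitely present (all items are 1)
    possible = 0   # itemset could be present (no item is 0)
    for t in transactions:
        values = [t.get(item, 0) for item in itemset]
        if all(v == 1 for v in values):
            certain += 1
        if all(v in (1, '?') for v in values):
            possible += 1
    return certain / n, possible / n

# A sensitive rule stays safely hidden if even max_support, plus the
# safety margin, is below the support threshold.
db = [
    {'a': 1, 'b': 1},
    {'a': 1, 'b': '?'},
    {'a': 0, 'b': 1},
    {'a': '?', 'b': '?'},
]
lo, hi = support_bounds(db, ['a', 'b'])
print(lo, hi)  # 0.25 0.75
```

The interval view also explains why the safety margin is needed: the data owner must budget for the worst-case resolution of the unknowns, not just the observed support.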

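The agreement requirement in the inverse frequent itemset mining problem can be made concrete with a small checker. This is an illustration under assumed representations, not Mielikainen's formulation: it verifies that a candidate database reproduces the required supports exactly while every other itemset stays below the threshold.

```python
# Sketch (illustration only): does a candidate database "agree" with a
# given frequent itemset collection, in the inverse-mining sense?
from itertools import combinations

def support(db, itemset):
    """Count transactions (frozensets) containing the itemset."""
    return sum(1 for t in db if itemset <= t)

def agrees(db, required, threshold):
    """required: dict mapping frozenset -> exact support count.
    Every itemset not in `required` must stay below `threshold`."""
    if any(support(db, s) != c for s, c in required.items()):
        return False
    items = sorted(set().union(*db))
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if s not in required and support(db, s) >= threshold:
                return False
    return True

db = [frozenset(t) for t in [{'a', 'b'}, {'a', 'c'}, {'a', 'b'}]]
required = {frozenset({'a'}): 3, frozenset({'b'}): 2,
            frozenset({'a', 'b'}): 2}
print(agrees(db, required, threshold=2))  # True
```

The exhaustive enumeration is exponential in the number of items; it is only meant to pin down the problem statement, not to be a practical verifier.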

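The integer programming view of Menon, Sarkar & Mukherjee can be approximated by a simple greedy stand-in. This sketch is not their exact formulation and uses no solver: it repeatedly picks the transaction that supports the most still-unhidden sensitive itemsets, until every sensitive itemset's support drops below the threshold.

```python
# Greedy sketch (not the cited ILP formulation): choose a small set of
# transactions to sanitize so that every sensitive itemset's support
# falls below min_sup_count.

def greedy_sanitize(transactions, sensitive, min_sup_count):
    sensitive = [frozenset(s) for s in sensitive]
    # for each sensitive itemset, the indices of its supporting transactions
    supports = {s: {i for i, t in enumerate(transactions) if s <= t}
                for s in sensitive}
    chosen = set()

    def unhidden():
        # sensitive itemsets whose remaining support is still too high
        return [s for s in sensitive
                if len(supports[s] - chosen) >= min_sup_count]

    while unhidden():
        # pick the transaction covering the most still-unhidden itemsets
        best = max(range(len(transactions)),
                   key=lambda i: sum(1 for s in unhidden()
                                     if i in supports[s] and i not in chosen))
        chosen.add(best)
    return chosen

db = [frozenset(t) for t in
      [{'a', 'b'}, {'a', 'b', 'c'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b'}]]
to_clean = greedy_sanitize(db, [{'a', 'b'}], min_sup_count=2)
print(len(to_clean))  # 2: support of {a,b} must drop from 3 to below 2
```

Unlike the exact integer program, the greedy choice carries no optimality guarantee; it merely mirrors the covering structure of the problem.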

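Border revision can be sketched as follows. This is our own simplified illustration of the idea, not Sun & Yu's algorithm: drop each sensitive itemset together with its frequent supersets (to preserve downward closure), then take the maximal survivors as the revised positive border.

```python
# Sketch (illustration only) of computing the revised positive border
# after hiding a set of sensitive frequent itemsets.

def revise_border(frequent, sensitive):
    frequent = [frozenset(i) for i in frequent]
    sensitive = [frozenset(s) for s in sensitive]
    # remove every sensitive itemset and its supersets (downward closure)
    revised = [f for f in frequent
               if not any(s <= f for s in sensitive)]
    # positive border: itemsets with no proper frequent superset
    border = [f for f in revised
              if not any(f < g for g in revised)]
    return set(revised), set(border)

frequent = [{'a'}, {'b'}, {'c'}, {'a', 'b'}, {'a', 'c'}, {'a', 'b', 'c'}]
revised, border = revise_border(frequent, [{'a', 'b'}])
print(border)  # {frozenset({'b'}), frozenset({'a', 'c'})}
```

A hiding algorithm driven by this border then tries to keep every itemset in the revised positive border frequent while pushing the sensitive itemsets, which now sit on the negative border, below the threshold.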