P151 ReTIS2015
All content following this page was uploaded by Swarup Roy on 11 July 2015.
Abstract—Association rule mining (ARM) techniques are effective in extracting frequent patterns and hidden associations among data items in various databases. These techniques are widely used for learning behavior, predicting events and making decisions at various levels. The conventional ARM techniques are, however, limited to databases comprising categorical data only, whereas real-world databases, mostly in business and scientific domains, have attributes containing quantitative data. Therefore, an improvised methodology called Quantitative Association Rule Mining (QARM) is used that helps in discovering hidden associations from real-world quantitative databases. In this paper, we present an exhaustive discussion on the trends in QARM research and further make a systematic classification of the available techniques into different categories based on the type of computational methods they adopted. We perform a critical analysis of various methods proposed so far and present a theoretical comparative study among them. We also enumerate some of the issues that need to be addressed in future research.

Index Terms—Association Rules; Quantitative Association Rules; Clustering; Fuzzy; Evolutionary approach; Information theory.

I. INTRODUCTION

With the passage of time and advances in technology, the nature and volume of data have changed remarkably. Due to numerous information sensing and information gathering devices and techniques, modern databases no longer resemble the market-basket databases to which classical association rule mining techniques can be applied. Most modern databases (in domains like business, health-care, stock-market, bioinformatics etc.) are larger in size and dimensions and contain attributes that are quantitative in nature. Therefore, to handle quantitative data and mine meaningful association rules in multi-dimensional quantitative databases, better ARM techniques became a necessity. Hence, the concept of Quantitative Association Rule Mining was coined. Since then, a good number of QARM techniques have been proposed in recent decades. These techniques follow different trends w.r.t. time and need, and have their own advantages and limitations when applied to different quantitative databases.

In this paper, we try to articulate various trends followed in QARM research since the inception of the concept in the pioneering work by R. Srikant et al. [1]. We present a comprehensive study of various QARM approaches and analyze a few techniques from each one of them. To begin with, we discuss the background of ARM and QARM techniques in the following section.

II. BACKGROUND

An explicit definition of the ARM problem was given by Agrawal et al. [2]. From the perspective of analyzing market-basket data, for a set of items I = {I_1, I_2, I_3, ..., I_m} and a database of transactions D = {t_1, t_2, t_3, ..., t_n} where each transaction t_i ⊆ I, the ARM problem is to identify all interesting Association Rules (AR) of the form X ⇒ Y, where X ⊂ I and Y ⊂ I are non-empty frequent itemsets and X ∩ Y = ∅. All such ARs satisfying either or both of the two thresholds of rule interestingness, viz. minimum support and minimum confidence, are considered relevant and informative [3]. Support of an association rule is calculated as the percentage of records in the database that contain X ∪ Y. Confidence is the strength of the implication in a rule; it is calculated as the percentage of records containing X that also contain Y.

QARM is an improvised form of association rule mining applied to databases containing both quantitative and categorical attributes. Like an AR, a Quantitative Association Rule (QAR) is formally an implication of the form X ⇒ Y. The left side of the implication is called the antecedent and the right side is called the succedent (also, consequent). Both these cedents (also called attribute sets) may comprise a single attribute (trivial cedent) or multiple attributes (non-trivial cedent). Unlike a conventional AR, in a QAR both quantitative and categorical attributes may participate in any of the cedents, and the common measures of rule interestingness, i.e. support and confidence, can be applied to it. Other measures available for evaluating the quality of a QAR are Lift, Leverage, Conviction, Gain, Certainty-Factor etc., which are widely discussed in [4] and [5]. If both the left and right sides of a QAR have trivial cedents, it is called a single-dimensional QAR; otherwise it is multi-dimensional. An example of a multi-dimensional QAR is given below:

Salary : [40k ··· 50k] ∧ Married : [Yes] ⇒ NumLoans : [2]    (1)

In this QAR, Salary and NumLoans are Quantitative attributes whereas Married is a nominal Categorical attribute (with categories Yes & No). It is a positive QAR as the consequent is positively correlated with the antecedent. Similarly, negative rules represented as X ⇒ ¬Y may also be
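The support and confidence definitions from Section II can be checked on rule (1) with a minimal sketch. The five records below are hypothetical toy data, and the helper names `support` and `confidence` are illustrative choices rather than anything defined in the paper; each condition is simply a predicate over one record.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def support(records: List[Record], conditions: List[Condition]) -> float:
    """Fraction of records that satisfy every condition in the list."""
    if not records:
        return 0.0
    hits = sum(all(cond(r) for cond in conditions) for r in records)
    return hits / len(records)


def confidence(records: List[Record],
               antecedent: List[Condition],
               consequent: List[Condition]) -> float:
    """Support of (X and Y) divided by support of X."""
    sup_x = support(records, antecedent)
    if sup_x == 0:
        return 0.0
    return support(records, antecedent + consequent) / sup_x


# Hypothetical toy database (not taken from the paper).
records = [
    {"Salary": 42_000, "Married": "Yes", "NumLoans": 2},
    {"Salary": 48_000, "Married": "Yes", "NumLoans": 2},
    {"Salary": 45_000, "Married": "Yes", "NumLoans": 1},
    {"Salary": 45_000, "Married": "No", "NumLoans": 2},
    {"Salary": 60_000, "Married": "Yes", "NumLoans": 2},
]

# Rule (1): Salary in [40k, 50k] and Married = Yes  =>  NumLoans = 2
antecedent = [
    lambda r: 40_000 <= r["Salary"] <= 50_000,
    lambda r: r["Married"] == "Yes",
]
consequent = [lambda r: r["NumLoans"] == 2]

rule_support = support(records, antecedent + consequent)
rule_confidence = confidence(records, antecedent, consequent)
```

On this toy data, three records match the antecedent and two of them also match the consequent, so the rule has support 0.4 and confidence of about 0.67. A rule would be reported only if these values meet the user-given minimum support and minimum confidence.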
rules problem because reasonable range or interval detection is a challenge in partitioning. The majority of the research works are inclined towards solving this issue. In addition, the use of user-given thresholds as rule interestingness measures is common throughout most of these techniques, which ultimately drives the overall quantity and quality of mined rules. QAR mining using the partitioning approach also takes higher execution time and mostly yields single-dimensional positive QARs.

straightforward while the complex ones, viz. [12], [13] and [14], use concepts like hyperplanes, dense grids etc. Not all clustering techniques are fairly scalable for high-dimensional cases, but most of them yield single as well as multi-dimensional QARs. Use of min-sup and min-conf thresholds is common, but some techniques require the users to specify other thresholds. The rules generated by the techniques from this approach are, however, only positive.
Apriori algorithm. Gyenesei [21] assigns to each quantitative attribute several fuzzy sets (instead of sharp intervals) that characterize the attribute. Fuzzy support, fuzzy confidence and fuzzy correlation are used as interestingness measures. Importantly, this method may suffer from anomalies if the fuzzy sets are not well chosen. In another attempt [22], numerical attributes are converted to fuzzy binary attributes and an efficient thread-based mechanism is employed for mining the quantitative rules quickly. Recently, Zheng et al. [23] proposed a generic Optimized Fuzzy Association Rule Mining (OFARM) method which is easy to extend for continuous data. It optimizes the partition points for fuzzy sets, where a multiple objective function is used. A two-level iterative method is used to generate association rules. In addition to MinSup and MinConf, this method uses one of the newer effective measures for evaluation of association rules, called the Certainty Factor.

1) Discussion: In classical fuzzy techniques, the idea was to use linguistic terms for defining partitions, whereas the modern ones convert the quantitative data into fuzzy sets and use partition points to divide neighbouring fuzzy sets. One problem with classical fuzzy QARM techniques is that they do not optimize the selection of partition points and use Extended Apriori, which takes higher execution time as the number of dimensions increases. The modern fuzzy techniques, however, take care of this. The fuzzy QARs mined using newer techniques are strong, positive and comprise at most two dimensions in the antecedent and one dimension in the consequent. However, these techniques may still suffer from the Support-Confidence conflict if the thresholds are not wisely chosen.

E. Evolutionary Approach

The QAR mining problem has also been treated with the evolutionary algorithmic (EA) approach by different researchers in due course. The most popular type of EA is the genetic algorithm (GA), which has found utilization in QAR mining. Recent research on QAR mining witnesses the use of multi-objective evolutionary algorithms too. Alataş et al. [6] propose a genetic algorithmic strategy for both positive and negative QAR mining. With the help of 1) adaptive mutation probability, 2) uniform operator and 3) an efficient fitness function, their method is capable of mining QARs without taking any thresholds and without data preparation. QuantMiner [24] is a GA-based approach that follows a set of rule templates while mining rules. A template is a preset format of a QAR. It may be selected by the user or may be system computed. QuantMiner finds reasonable intervals in ARs dynamically, optimizing the support count and rule confidence.

QARM research also encompasses multi-objective evolutionary algorithms. The ARM process can be considered as a multi-objective problem instead of a single-objective one, in which the rule evaluation measures may have different objectives [4]. Kaya et al. [25] propose two novel methods to optimize QARs. They use three important criteria, namely support, confidence and amplitude, as thresholds. Recently, Martin et al. [26] proposed a new multi-objective evolutionary approach for mining a smaller set of positive and negative QARs with better comprehensibility and lower cost of computation.

1) Discussion: Evolutionary algorithms often perform well in finding approximate solutions to all types of problems using mechanisms inspired by biological evolution. The GA approach has the specialty of mining both positive and negative QARs without discretizing attributes a priori. Hence, it takes fewer database scans and also yields rules with optimized support and confidence. Multi-objective evolutionary techniques are the result of recent research that tries to reduce the cost of mining and optimize the number of positive and negative QARs generated without compromising rule interestingness.

F. Other Approaches

There are several approaches (other than those classified above) that try to solve quantitative association rule mining in their own ways. Ke et al. [27] highlight that another combinatorial explosion problem may get triggered by the task of combining the adjacent intervals of a quantitative attribute to increase support and detect meaningful intervals. To address this combinatorial explosion problem using an information-theoretic approach, they introduce the MIC (Mutual Information and Clique) framework to inspect the mutual information (MI) between every pair of attributes and establish an MI Graph representing attributes having a strong informative relationship w.r.t. a pre-defined threshold. Thus, each frequent itemset is represented by a clique present in the MI Graph. In this method, the attribute sets and their corresponding intervals to be combined (or joined) can be reduced effectively. Li et al. [28] extend a grid-based QAR mining method using meta-rules, storing data tuples in linked lists on which QAR mining is executed. In meta-rule guided association rule mining, some syntactic or semantic constraints are specified in the form of rules.

Nemmiche and Guillaume [29] introduce another method for mining optimal positive and negative QARs. They state that, irrespective of whether an AR is positive or negative, the use of support and confidence as rule interestingness measures is not sufficient for optimal QAR generation. Therefore, in their method they use tables to summarize the trend of variable interactions, thereby highlighting the zones that are interesting. Moreover, the method also introduces a new rule semantic of the form influential(s) → Influenceable(s) and an impact measure, Influence, that analyses variable behavior and guarantees that rules of higher potential will be discovered by discarding the irrelevant ones.

1) Discussion: The above techniques are applied to QAR mining without much influence from the general approaches. They have their own unique ways of dealing with the QAR mining problem. Most of these techniques strive to minimize the many-rules generation and irrelevant-rules generation problems. The metarule-guided technique tries to find multi-dimensional rules and is linearly scalable to database size and dimensions. The next technique, deploying an impact measure, has the capability to mine both positive and negative QAR
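The MI Graph idea described for the MIC framework [27] can be illustrated with a small sketch. This is not the framework's actual implementation: it assumes the attributes are already discretized, uses plain (unnormalized) mutual information in bits, and the column names, the toy values and the 0.5 threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """Estimate I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def mi_graph(columns: Dict[str, Sequence],
             threshold: float) -> List[Tuple[str, str, float]]:
    """Keep an edge between two attributes when their MI meets the threshold."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        mi = mutual_information(columns[a], columns[b])
        if mi >= threshold:
            edges.append((a, b, mi))
    return edges


# Hypothetical, already-discretized attributes (not taken from the paper).
columns = {
    "SalaryBand": ["low", "low", "high", "high"],
    "LoanBand": ["few", "few", "many", "many"],
    "Married": ["Yes", "No", "Yes", "No"],
}

edges = mi_graph(columns, threshold=0.5)
```

Here only SalaryBand and LoanBand are informative about each other (1 bit of MI), so they form the single edge; Married is independent of both and is pruned. Candidate itemsets would then be drawn only from cliques of such a graph, which is how the number of attribute sets and intervals to combine is reduced.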
TABLE I
SUMMARY OF DIFFERENT QAR MINING APPROACHES
Information-Theoretic [27] | Reduces combinatorial explosion of intervals that appear in most partitioning techniques | Not much effective in smaller databases | Multiple | ✓ | ✗ | ✓ | Multiple | ✗
IV. SUMMARY

The different QAR mining approaches discussed above are summarized in Table I, and the proportion of contributions w.r.t. the approaches is shown in Fig. 2. We can observe from Table I that every QAR mining technique has its pros and cons. Despite several differences in the context of dealing with quantitative data, the use of support and confidence as rule-interestingness measures is common throughout most of the approaches. The use of newer rule evaluation measures (as stated in [4] and [5]) is also found in several modern QARM techniques.

Table I also highlights that there are a few approaches dealing with negative association rules and that no technique is capable of mining QARs in a single database scan. However, techniques from all approaches other than partitioning are capable of mining rules of more than one dimension. In addition, discretization or data preparation is another common step for all techniques except the few using EA. Information loss due to discretization is prominent using partitioning.

[Figure: Partitioning 26%, Clustering 21%, Others 18%, Evolutionary 15%, Fuzzy 11%, Statistical 9%]
Fig. 2. Proportion of contributions in various QARM approaches.

V. CONCLUSION

QARM techniques are essential for discovering knowledge from real-world databases that contain high volumes of quantitative and categorical data. In recent decades, different
QARM techniques have found applications in various domains [30]. Therefore, a large number of research works have been contributed towards efficiently solving the problem of QAR mining. In this work, we highlight various trends in QARM research and present a comprehensive study on various approaches with their relative merits and limitations. To conclude, it can be stated that, from the broad variety of techniques in existence, no particular technique seems to be suitable for application in all domains because the appearance, size and nature of data belonging to different domains vary. Moreover, current solutions may be considered inadequate because there are several issues pertaining to the available QARM techniques, which we identify and list below:

1) Inability to mine both Positive and Negative QARs.
2) Failure to generate efficient multi-dimensional QARs.
3) Dependence on proper selection of thresholds for generating rules.
4) Redundant, uninteresting and misleading rules generation.
5) Poor scalability of the mining technique w.r.t. database dimensions and volume.
6) Large database scans to generate frequent itemsets and mine rules.
7) High computational cost or execution time.

Further work is required to address these issues and thus bring in novel scope for future research in the QARM scenario.

REFERENCES

[1] R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” in ACM SIGMOD Record, vol. 25, no. 2. ACM, 1996, pp. 1–12.
[2] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in ACM SIGMOD Record, vol. 22, no. 2. ACM, 1993, pp. 207–216.
[3] J. Han, M. Kamber, and J. Pei, Data mining, southeast asia edition: Concepts and techniques. Morgan Kaufmann, 2006.
[4] M. Martínez-Ballesteros and J. Riquelme, “Analysis of measures of quantitative association rules,” in Hybrid Artificial Intelligent Systems. Springer, 2011, pp. 319–326.
[5] M. Martínez-Ballesteros, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, “Selecting the best measures to discover quantitative association rules,” Neurocomputing, vol. 126, pp. 3–14, 2014.
[6] B. Alataş and E. Akin, “An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules,” Soft Computing, vol. 10, no. 3, pp. 230–237, 2006.
[7] K. C. Chan and W.-H. Au, “An effective algorithm for mining interesting quantitative association rules,” in Proceedings of the 1997 ACM symposium on Applied computing. ACM, 1997, pp. 88–90.
[8] T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, “Mining optimized association rules for numeric attributes,” in Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. ACM, 1996, pp. 182–191.
[9] J. Li, H. Shen, and R. Topor, “An adaptive method of numerical attribute merging for quantitative association rule mining,” in Internet Applications. Springer, 1999, pp. 41–50.
[10] L. Dancheng, Z. Ming, Z. Shuangshuang, and Z. Chen, “A new approach of self-adaptive discretization to enhance the apriori quantitative association rule mining,” in Intelligent System Design and Engineering Application (ISDEA), 2012 Second International Conference on. IEEE, 2012, pp. 44–47.
[11] B.-C. Chien, Z.-L. Lin, and T.-P. Hong, “An efficient clustering algorithm for mining fuzzy quantitative association rules,” in IFSA World Congress and 20th NAFIPS International Conference, 2001. Joint 9th, vol. 3. IEEE, 2001, pp. 1306–1311.
[12] W. Lian, D. W. Cheung, and S. Yiu, “An efficient algorithm for finding dense regions for mining quantitative association rules,” Computers & Mathematics with Applications, vol. 50, no. 3, pp. 471–490, 2005.
[13] Y. Guo, J. Yang, and Y. Huang, “An effective algorithm for mining quantitative association rules based on high dimension cluster,” in Wireless Communications, Networking and Mobile Computing, 2008. WiCOM’08. 4th International Conference on. IEEE, 2008, pp. 1–4.
[14] Y. Junrui and Z. Feng, “An effective algorithm for mining quantitative associations based on subspace clustering,” in Networking and Digital Society (ICNDS), 2010 2nd International Conference on, vol. 1. IEEE, 2010, pp. 175–178.
[15] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in ACM SIGMOD Record, vol. 29, no. 2. ACM, 2000, pp. 1–12.
[16] R. J. Miller and Y. Yang, “Association rules over interval data,” in ACM SIGMOD Record, vol. 26, no. 2. ACM, 1997, pp. 452–461.
[17] Q. Tong, B. Yan, and Y. Zhou, “Mining quantitative association rules on overlapped intervals,” in Advanced Data Mining and Applications. Springer, 2005, pp. 43–50.
[18] Y. Aumann and Y. Lindell, “A statistical theory for quantitative association rules,” Journal of Intelligent Information Systems, vol. 20, no. 3, pp. 255–283, 2003.
[19] G.-M. Kang, Y.-S. Moon, H.-Y. Choi, and J. Kim, “Bipartition techniques for quantitative attributes in association rule mining,” in TENCON 2009-2009 IEEE Region 10 Conference. IEEE, 2009, pp. 1–6.
[20] W. Zhang, “Mining fuzzy quantitative association rules,” in 2012 IEEE 24th International Conference on Tools with Artificial Intelligence. IEEE Computer Society, 1999, pp. 99–99.
[21] A. Gyenesei, “A fuzzy approach for mining quantitative association rules.” Acta Cybern., vol. 15, no. 2, pp. 305–320, 2001.
[22] S. Prakash and R. Parvathi, “Qualitative approach for quantitative association rule mining using fuzzy rule set,” Journal of Computational Information Systems, vol. 7, no. 6, pp. 1879–1885, 2011.
[23] H. Zheng, J. He, G. Huang, and Y. Zhang, “Optimized fuzzy association rule mining for quantitative data,” in Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on. IEEE, 2014, pp. 396–403.
[24] A. Salleb-Aouissi, C. Vrain, and C. Nortet, “Quantminer: A genetic algorithm for mining quantitative association rules.” in IJCAI, vol. 7, 2007.
[25] M. Kaya and R. Alhajj, “Novel approach to optimize quantitative association rules by employing multi-objective genetic algorithm,” in Innovations in Applied Artificial Intelligence. Springer, 2005, pp. 560–562.
[26] D. Martin, A. Rosete, J. Alcalá-Fdez, and F. Herrera, “A new multiobjective evolutionary algorithm for mining a reduced set of interesting positive and negative quantitative association rules,” Evolutionary Computation, IEEE Transactions on, vol. 18, no. 1, pp. 54–69, 2014.
[27] Y. Ke, J. Cheng, and W. Ng, “Mic framework: an information-theoretic approach to quantitative association rule mining,” in Data Engineering, 2006. ICDE’06. Proceedings of the 22nd International Conference on. IEEE, 2006, pp. 112–112.
[28] J. Li and X. Ye, “Study on linked list-based algorithm for metarule-guided mining of multidimensional quantitative association rules,” in Natural Computation, 2007. ICNC 2007. Third International Conference on, vol. 1. IEEE, 2007, pp. 300–304.
[29] L. N. Alachaher and S. Guillaume, “Variables interaction for mining negative and positive quantitative association rules.” in ICTAI, 2006, pp. 82–85.
[30] D. Adhikary and S. Roy, “Mining quantitative association rules in real-world databases: A review,” in Computing and Communication Systems (I3CS), 2015 1st International Conference on, vol. 1. IGI Global, 2015, pp. 87–92.