High Utility Mining

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/45602076

Overview of Itemset Utility Mining and its Applications

Article  in  International Journal of Computer Applications · August 2010


DOI: 10.5120/956-1333 · Source: DOAJ

CITATIONS READS

25 1,594

2 authors:

Jyothi Pillai O. P. Vyas


Bhilai Institute of Technology Indian Institute of Information Technology Allahabad
12 PUBLICATIONS   105 CITATIONS    114 PUBLICATIONS   698 CITATIONS   

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Authentication, Authorisation, PKI Management, Access Controls View project

#SmartME View project

All content following this page was uploaded by O. P. Vyas on 11 January 2014.

The user has requested enhancement of the downloaded file.


International Journal of Computer Applications (0975 – 8887)
Volume 5– No.11, August 2010

Overview of Itemset Utility Mining and its


Applications
Jyothi Pillai O.P.Vyas
Reader Professor
Bhilai institute of Technology, Indian Institute of Information Technology
Durg-491001, Chhattisgarh, India Allahabad, U.P., India

ABSTRACT The goal of data mining is to extract higher-level


An emerging topic in the field of data mining is Utility hidden information from an abundance of raw data[3].
Mining. The main objective of Utility Mining is to Data mining has been used in various data domains.
identify the itemsets with highest utilities, by Data mining can be regarded as an algorithmic process
considering profit, quantity, cost or other user that takes data as input and yields patterns, such as
preferences. Mining High Utility itemsets from a classification rules, itemsets, association rules, or
transaction database is to find itemsets that have utility summaries, as output [11].
above a user-specified threshold. Itemset Utility Data Mining tasks can be classified into two
Mining is an extension of Frequent Itemset mining, categories, Descriptive Mining and Predictive Mining.
which discovers itemsets that occur frequently. In The Descriptive Mining techniques such as Clustering,
many real-life applications, high-utility itemsets Association Rule Discovery, Sequential Pattern
consist of rare items. Rare itemsets provide useful Discovery, is used to find human-interpretable
information in different decision-making domains patterns that describe the data. The Predictive Mining
such as business transactions, medical, security, techniques like Classification, Regression, Deviation
fraudulent transactions, retail communities. For Detection, use some variables to predict unknown or
example, in a supermarket, customers purchase future values of other variables
microwave ovens or frying pans rarely as compared to 1.2 Association rule Mining
bread, washing powder, soap. But the former Mining Association rules is one of the research
transactions yield more profit for the supermarket. problems in data mining [1]. Given a set of
Similarly, the high-profit rare itemsets are found to be transactions where each transaction is a set of items,
very useful in many application areas. For example, in an association rule is an expression of the form X
medical application, the rare combination of Y, where X and Y are sets of items.
symptoms can provide useful insights for doctors [21]. The problem of mining association rules was first
A retail business may be interested in identifying its introduced in [1] and later broadened in [2], for the
most valuable customers i.e. who contribute a major case of databases consisting of categorical attributes
fraction of overall company profit[10]. Several alone.
researches about itemset utility mining were proposed. Association rule mining (ARM) is a popular
In this paper, a literature survey of various algorithms technique for finding co-occurrences, correlations,
for high utility rare itemset mining has been presented. frequent patterns, associations among items in a set of
transactions or a database. Rules with confidence and
Keywords: Utility Mining, High-utility itemsets, Rare support above user-defined thresholds (minconf and
itemsets, Frequent Itemset mining minsup) were found. As data continues to grow and its
complexity increases, newer data structures and
1. Introduction algorithms are being developed to match this
1.1 Data Mining development. Association Rule Mining process can be
During the last ten years, Data mining, also known as divided into two steps. The first step involves finding
knowledge discovery in databases has established its all frequent itemsets (or say large itemsets) in
position as a prominent and important research area. databases. Once the frequent itemsets are found,

9
International Journal of Computer Applications (0975 – 8887)
Volume 5– No.11, August 2010

association rules are generated [6]. ARM is widely 1.5 Rare Itemset Mining
used in market-basket analysis. For example, frequent The basic Bottleneck of association rule mining is
itemsets can be found out by analyzing market basket Rare Item Problem. Most approaches to mining
data and then association rules can be generated by association rules implicitly consider the utilities of the
predicting the purchase of other items by conditional itemsets to be equal [6]. The utilities of itemsets may
probability [1], [2]. differ. In many applications, some items appear very
frequently in the data, while others rarely appear. If
1.3 Utility Mining frequencies of items vary, two problems encountered –
The traditional ARM approaches consider the utility (1) If minsup is set too high, then rules of rare items
of the items by its presence in the transaction set. The will not be found (2) To find rules that involve both
frequency of itemset is not sufficient to reflect the frequent and rare items, minsup has to be set very low.
actual utility of an itemset. For example, the sales This may cause combinatorial explosion.
manager may not be interested in frequent itemsets In many practical situations, the rare combinations of
that do not generate significant profit. items in the itemset with high utilities provide very
Recently, one of the most challenging data mining useful insights to the user. Rare itemsets are the
tasks is the mining of high utility itemsets efficiently. itemsets that occur infrequently in the transaction data
Identification of the itemsets with high utilities is set. In most business applications, frequent itemsets
called as Utility Mining. The utility can be measured may not generate much profit while rare itemsets may
in terms of cost, profit or other expressions of user generate a very high profit. Rare itemsets are very
preferences. For example, a computer system may be important and can be further promoted together
more profitable than a telephone in terms of profit. because they possess high associations and can bring
Utility mining model was proposed in [19] to define some acceptable profits [21].
the utility of itemset. The utility is a measure of how Rare itemsets provide very useful information in the
useful or profitable an itemset X is. The utility of an real-life applications such as security, business
itemset X, i.e., u(X), is the sum of the utilities of strategies, biology, medicine and super market shelf-
itemset X in all the transactions containing X. An management. For example [16], in the security field,
itemset X is called a high utility itemset if and only if normal behavior is very frequent, whereas abnormal or
u(X) >= min_utility, where min_utility is a user- suspicious behavior is less frequent. Considering a
defined minimum utility threshold [11]. database where the behavior of people in sensitive
The main objective of high-utility itemset mining is to places such as airports are recorded, if those behaviors
find all those itemsets having utility greater or equal to are modeled, it is likely to find that normal behaviors
user-defined minimum utility threshold. can be represented by frequent patterns and suspicious
1.4 Frequent Itemset Mining behaviors by rare patterns.
R. Agrawal et al in [1] introduced the concept of In this paper we have presented a literature survey of
frequent itemset mining. Frequent itemsets are the the various approaches and algorithms for high-utility
itemsets that occur frequently in the transaction data mining and rare itemset mining.
set. The goal of Frequent Itemset Mining is to
identify all the frequent itemsets in a transaction 2. Literature survey
dataset. In the previous section we have introduced the basic
Let I = {i1, i2, …, in} be a set of n distinct literals concept of Data Mining, Association Rule mining,
called items. An itemset is a non-empty set of items. Utility Mining and Rare Itemset Mining. A brief
An itemset X = (i1, i2, …, ik) with k items is referred overview of various algorithms, concepts and
to as k-itemset, A transaction T = <TID, (i1, i2, …, techniques defined in different research papers have
ik)> consists of a transaction identifier (TID) and a set been given in this section.
of items (i1, i2, …, ik), where ij Î I, j = 1, 2, …, k. The mining of association rules for finding the
The frequency of an itemset X is the probability of X relationship between data items in large databases is a
occurring in a transaction T. A frequent itemset is the well studied technique in data mining field with
itemset having frequency support greater a minimum representative methods like Apriori [1], [2]. ARM
user specified threshold. process can be decomposed into two steps. The first
step involves finding all frequent itemsets in

10
International Journal of Computer Applications (0975 – 8887)
Volume 5– No.11, August 2010

databases. The second step involves generating were developed to improve the efficiency of mining
association rules from frequent itemsets. high utility itemsets [11].
In [6], Yao et al defined the problem of utility mining, Liu et al proposed Two-Phase algorithm [8] for
a theoretical model called MEU, which finds all finding high utility itemsets. In the first phase, a model
itemsets in a transaction database with utility values that applies the “transaction-weighted downward
higher than the minimum utility threshold. The closure property” on the search space to expedite the
mathematical model of utility mining was defined identification of candidates. In the second phase, one
based on utility bound property and the support bound extra database scan is performed to identify the high
property. This laid the foundation for future utility utility itemsets.
mining algorithms. A novel method, namely THUI (Temporal High
H. Yao et al formalized the semantic significance of Utility Itemsets) –Mine was proposed by V.S. Tseng
utility measures in [11]. Based on the semantics of et al in [10], for mining temporal high utility itemsets
applications, the utility-based measures were classified from data streams efficiently and effectively. The
into three categories, namely, item level, transaction novel contribution of THUI-Mine is that it can
level, and cell level. The unified utility function was effectively identify the temporal high utility itemsets
defined to represent all existing utility-based by generating fewer temporal high transaction-
measures. The transaction utility and the external weighted utilization 2-itemsets such that the execution
utility of an itemset was defined and general unified time can be reduced substantially in mining all high
framework was developed to define a unifying view of utility itemsets in data streams. In this way, the
the utility based measures for itemset mining. The process of discovering all temporal high utility
mathematical properties of the utility based measures itemsets under all time windows of data streams can
were identified and analyzed. be achieved effectively with limited memory space,
High utility frequent itemsets contribute the most to a less candidate itemsets and CPU I/O time. This meets
predefined utility, objective function or performance the critical requirements on time and space efficiency
metric[13]. For example, From marketing strategy for mining data streams. The experimental results
perspective, it is important to identify product show that THUI-Mine can discover the temporal high
combinations that have a significant impact on utility itemsets with higher performance and less
company’s bottom line i.e. having highest revenue candidate itemsets compared to other algorithms under
generating power[13]. various experimental conditions. Moreover, it
An algorithm for frequent item set mining was performs scalable in terms of execution time under
presented by J. Hu et al in [13] that identify high- large databases. Hence, THUI-Mine is promising for
utility item combinations. In contrast to the traditional mining temporal high utility itemsets in data streams.
association rule and frequent item mining techniques, G.C.Lan et al proposed a new kind of patterns, named
the goal of the algorithm is to find segments of data, Rare Utility Itemsets in [21], which consider not only
defined through combinations of few items (rules), individual profits and quantities but also common
which satisfy certain conditions as a group and existing periods and branches of items in a multi-
maximize a predefined objective function. The database environment. A new mining approach called
highutility pattern mining problem considered is TP-RUI-MD (Two-Phase Algorithm for Mining
different from former approaches, as it conducts “rule- Rare Utility Itemsets in Multiple Databases) was
discovery” with respect to individual attributes as well proposed to efficiently discover rare utility itemsets.
as with respect to the overall criterion for the mined The TP-RUI-MD algorithm is designed to find the
set, attempting to find groups of such patterns that rare-utility itemsets in a multi-database environment.
combined contribute the most to a predefined The TP-RUI-MD algorithm uses the level-wise
objective function [13]. technique for discovering the rare-utility itemsets in a
In the paper [17], H.F. Li proposed two efficient one- multi-database environment. The TP-RUI-MD
pass algorithms, MHUI-BIT and MHUI-TID, for algorithm cannot discover all rare-utility itemsets in a
mining high utility itemsets from data streams within a multi-database environment, but the TP-RUI-MD
transaction-sensitive sliding window. Two effective algorithm can still discover these rare-utility
representations of item information and an extended undiscovered by existing algorithms in a single
lexicographical tree-based summary data structure database environment.

11
International Journal of Computer Applications (0975 – 8887)
Volume 5– No.11, August 2010

S. Shankar et al, presents a novel algorithm Fast Apriori Rare itemset to mine rare itemsets was
Utility Mining (FUM) in [19], which finds all high proposed which performs a level-wise descent. The
utility itemsets within the given utility constraint backward traversal method is endowed with a property
threshold. The authors also suggest a novel method of that leads to prune out potentially non-rare itemsets in
generating different types of itemsets such as High the mining process. This includes an anti-monotone
Utility and High Frequency itemsets (HUHF), High property and a level wise exploration of the itemset
Utility and Low Frequency itemsets (HULF), Low space.
Utility and High Frequency itemsets (LUHF) and Low The authors in [22] presents a new foundational
Utility and Low Frequency itemsets (LULF) using a approach to temporal weighted itemset utility mining
combination of FUM and Fast Utility Frequent mining where item utility values are allowed to be dynamic
(FUFM) algorithms. within a specified period of time, unlike traditional
In paper [14], L. Szathmary et al presented a novel approaches where these values are static within those
method for computing all rare itemsets by splitting the times. Moreover, the approach incorporates a fuzzy
rare itemset mining task into two steps. The first step model where utilities can assume fuzzy values on the
is the identification of the minimal rare itemsets. other hand [22]. A Conceptual model has been
These itemsets jointly act as a minimal generation presented that allows development of an efficient and
seed for the entire rare itemset family. In the second applicable algorithm to real world data and captures
step, the minimal rare itemsets are processed in order real-life situations in fuzzy temporal weighted utility
to restore all rare itemsets. Two algorithms were association rule mining[22].
proposed for the first step: (i) a naive one that relies on
an Apriori-style enumeration, Apriori-Rare and (ii) an 4. Conclusion
optimized method that limits the exploration to In Data Mining, Association Rule Mining is one of the
frequent generators only. The second task is solved by most important tasks. A large number of efficient
a straightforward procedure. Apriori-rare is a algorithms are available for association rule mining,
modification of the Apriori algorithm used to mine which considers mining of frequent itemsets. But an
frequent itemsets. Apriori-rare generates a set of all emerging topic in Data Mining is Utility Mining,
minimal rare generators, also called MRM, that which incorporates utility considerations during
correspond to the itemsets usually pruned by the itemset mining. Utility Mining covers all aspects of
Apriori algorithm when seeking for frequent itemsets. economic utility in data mining and helps in detection
To retrieve all rare itemsets from minimal rare itemset of rare itemset having high utility. Rare High Utility
(mRIs), a prototype algorithm called “A Rare Itemset itemset mining is very beneficial in several real-life
Miner Algorithm(Arima)” was proposed. Arima applications. In this paper, we have presented a brief
generates the set of all rare itemsets, splits into two overview of various algorithms for high utility rare
sets: the set of rare itemsets having a zero support and itemset mining. In the future scope, we will be
the set of rare itemsets with non-zero support. If an presenting a comparative study of various algorithms
itemset is rare then any extension of that itemset will for mining rare high utility itemset.
result a rare itemset.
M. Adda et al proposed a framework in [16], to
represent different categories of interesting patterns 5. References
and then instantiate it to the specific case of rare [1] R. Agrawal, T. Imielinski and A. Swami, 1993,
patterns. A generic framework was presented to mine “Mining association rules between sets of items
patterns based on the Apriori approach. The in large databases”, in Proceedings of the ACM
generalized Apriori framework was instantiated to SIGMOD International Conference on
mine rare itemsets. The resulting approach is Apriori- Management of data, pp 207-216.
like and the mine idea behind it is that if the itemset [2] R. Agrawal and R. Srikant, 1994, “Fast
lattice representing the itemset space in classical Algorithms for Mining Association Rules”, in
Apriori approaches is traversed on a bottom-up Proceedings of the 20th International Conference
manner, equivalent properties to the Apriori Very Large Databases, pp. 487-499.
exploration of frequent itemsets are provided to mine [3]Attila Gyenesei, “Mining Weighted Association
rare itemsets. The Apriori algorithm, called AfRIM for Rules for Fuzzy Quantitative Items”, Lecture

12
International Journal of Computer Applications (0975 – 8887)
Volume 5– No.11, August 2010

notes in Computer Science, Springer, Vol. [13] J. Hu, A. Mojsilovic, “High-utility pattern
1910/2000, pages 187-219, TUCS Technical mining: A method for discovery of high-utility
Report No.346, ISBN 952-12-659-4,ISSN 1239- item sets”, Pattern Recognition 40 (2007) 3317 –
1891, May 2000. 3324.
[4] R. Chan, Q. Yang, Y. D. Shen, “Mining High [14] L. Szathmary, A. Napoli, P. Valtchev, “Towards
utility Itemsets”, In Proc. of the 3rd IEEE Intel. Rare Itemset Mining” Proceedings of the 19th
Conf. on Data Mining (ICDM), 2003. IEEE International Conference on Tools with
[5] H. Yun, D. Ha, B. Hwang, and K. Ryu. “Mining Artificial Intelligence, 2007, Volume 1, Pages:
association rules on significant rare data using 305-312, ISBN ~ ISSN:1082-3409 , 0-7695-
relative support”. Journal of Systems and 3015-X
Software, 67(3):181–191, 2003. [15] Kriegel, H-P et al. 2007. “Future Trends in Data
[6] H.Yao, H. J. Hamilton, and C. J. Butz, “A Mining, Data Mining and Knowledge
Foundational Approach to Mining Itemset Discovery”, 15:87–97.
Utilities from Databases”, Proceedings of the [16] M. Adda, L. Wu, Y. Feng, “Rare Itemset
Third SIAM International Conference on Data Mining”, Sixth International conference on
Mining, Orlando, Florida, pp. 482-486, 2004. Machine Learning and Applications, 2007, pp 73-
[7] G. Weiss. “Mining with rarity: a unifying 80.
framework”,.SIGKDD Explor. Newsl., 6(1):7– [17] H.F. Li, H.Y. Huang, Y.Cheng Chen, and Y. Liu
19, 2004. and S. Lee, “Fast and Memory Efficient
[8] Liu, Y., Liao, W., and A. Choudhary, A., “A Fast Mining of High Utility Itemsets in Data
High Utility Itemsets Mining Algorithm”, In Streams”, 2008 Eighth IEEE International
Proceedings of the Utility- Based Data Mining Conference on Data Mining.
Workshop, August 2005. [18] M. Sulaiman Khan, M. Muyeba, Frans Coenen,
[9] Lu, S., Hu, H. and Li, F. 2005. “Mining weighted 2008. “Fuzzy Weighted Association Rule Mining
association rules. Intelligent Data Analysis”, with Weighted Support and Confidence
5(3):211–225. Framework”, to appear in ALSIP (PAKDD),pp.
[10]V. S. Tseng, C.J. Chu, T. Liang, “Efficient 52-64.
Mining of Temporal High Utility Itemsets from [19]S. Shankar, T.P.Purusothoman, S.Jayanthi and
Data streams”, Proceedings of Second N.Babu, “A Fast Algorithm for Mining High
International Workshop on Utility-Based Data Utility Itemsets”, Proceedings of IEEE
Mining, August 20, 2006 International Advance Computing Conference
[11]H. Yao, H. Hamilton and L. Geng, “A Unified (IACC 2009), Patiala, India, pages : 1459 - 1464
Framework for Utilty-Based Measures for [20] Hu, J., Mojsilovic, A. “High-utility Pattern
Mining Itemsets”, In Proc. of the ACM Intel. Mining: A Method for Discovery of High-
Conf. on Utility-Based Data Mining Workshop utility Item” Sets, Pattern Recognition, Vol. 40,
(UBDM), pp. 28-37, 2006. 3317-3324.
[12]A. Erwin, R.P.Gopalan and N. R. Achuthan, 2007, [21] G.C.Lan, T.P.Hong and V.S. Tseng, “A Novel
“A Bottom-up Projection based Algorithm for Algorithm for Mining Rare-Utility Itemsets in
mining high utility itemsets”, in Proceedings of a Multi-Database Environment”
2nd Workshop on integrating AI and Data [22] J. Pillai, O.P. Vyas, S. SoniM. Muyeba “A
Mining(AIDM 2007)”, Australia, Conferences in Conceptual Approach to Temporal Weighted
Research and Practice in Information Itemset Utility Mining”, 2010 International
Technolofy(CRPIT),Vol. 84. Journal of Computer Applications (0975 - 8887)
Volume 1 – No. 28

13

View publication stats

You might also like