
Efficient Apriori Algorithm using Enhanced Transaction Reduction Approach

Jeanie R. Delos Arcos
Graduate Programs
Technological Institute of the Philippines
Quezon City, Philippines
jeanie.delosarcos@gmail.com

Alexander A. Hernandez
College of Information Technology Education
Technological Institute of the Philippines
Manila, Philippines
alexander.hernandez@tip.edu.ph

Abstract—The Apriori algorithm is a fundamental algorithm in association rule mining. Its key concept is to discover interesting recurrent patterns between different groups of data. It is a straightforward and original algorithm that employs a highly iterative approach viewed as a level-wise search. Nevertheless, this kind of algorithm has several drawbacks. Building on this algorithm, this research identifies the constraint of the classical Apriori algorithm in terms of the computational cost of scanning the entire database to discover frequent itemsets, and presents an enhancement of Apriori that reduces the candidate generation cost through an enhanced transaction reduction approach. When the enhanced Apriori is compared with the Apriori algorithm, the database scanning time is reduced by 58 percent, and the generation time is reduced by approximately 89 percent of the original Apriori.

Keywords—Association rule mining, apriori algorithm, frequent item set mining, transaction reduction, hashing.

I. INTRODUCTION

Nearly all well-established businesses have aggregated decades of data collated from customers' information [1]. With the proliferation of e-commerce applications, companies now accumulate significant data in months, not in years [2]. In the last ten years, the data mining domain, otherwise known as Knowledge Discovery in Databases (KDD), which focuses on finding correlations, trends, patterns, and anomalies, has gained promising popularity [3]. Mining these large, multi-source databases helps companies make accurate future decisions. Among the most widely used data mining techniques for understanding itemset associativity is association rule mining [4].

Association rule mining (ARM) has become a remarkably extensive and well-researched technique in data mining and was fundamentally popularized by Agrawal et al. in 1993 [5]. The purpose of the technique is to extract frequent itemsets, compelling rules, associations, or rare data organization among sets of items in data repositories or other transaction databases. Market basket analysis serves as an example of ARM [6]. ARM processes determine association rules that satisfy the minimum support and confidence defined over the transaction data [7]. Aside from market basket analysis, ARM is immensely noticeable in diverse fields such as web search, process mining, medical applications, marketing advancement, and market distribution [8]. Over time, several algorithms for generating association rules have been introduced [9]. Popularly known algorithms under ARM are Apriori, Eclat, and FP-Growth, applied to extract frequent itemsets [10].

The Apriori Algorithm (AA) is recognized as a level-wise algorithm designed to search for frequent itemsets [11]. The algorithm has earned popularity and is a commonly used algorithm for ARM [12]. However, in current trends, the datasets collated from transactions have become gigantic compared with the datasets accumulated ten years ago [13]. This immense data creates a dilemma for the Apriori algorithm when large datasets are involved. First, it requires successive scans of the transactional database, incurring a high computational generation cost [14]. Second, it propagates innumerable candidate sets that consume a considerable amount of memory resources [15].

Thus, the need to enhance the Apriori algorithm has kept attracting researchers for years using different techniques. Numerous studies have been conducted and modern approaches have been recommended to cope with this dilemma, each with its advantages and drawbacks, but no ultimate strategy has been attained [6]. For more significant applications in this area, new algorithms that can further enhance scalability and interpretability are nonetheless in high demand [16]. This leads to the main contribution of this study, which gears toward improving the efficiency of the Apriori algorithm by minimizing the number of database scans and limiting the input and output cost.

The structure of this paper is arranged as follows. Section II discusses association rule mining and related works. Section III gives an overview of the design and process of the enhanced Apriori algorithm. Section IV presents the experimental results of the comparative analysis of the enhanced Apriori and the classic Apriori. Section V concludes this paper. Finally, the acknowledgment expresses gratitude to all who contributed to the success of this work.

II. RELATED WORKS

A. Association Rule Mining

In the knowledge discovery domain, association rule mining determines the relationship or association rule between the data. An association rule is expressed in the form M ⇒ N, where M is the antecedent and N is the consequent. The expression shows the number of times N occurred when M transpired, based on the support and confidence set in every process. Countless algorithms for producing association rules have been designed over time [17]. A few leading algorithms are Apriori, known as a level-wise search, and FP-Growth. The issue of ARM is expressed as follows: given a transaction dataset, a support threshold, and a confidence threshold, find every relationship whose support is higher than or equal to the support threshold and whose confidence is higher than or equal to the confidence threshold from the dataset.



Association rules look for rules where M and N can be particular items, composed of two item sets:

1. The left-hand side, also identified as the antecedent
2. The right-hand side, also identified as the consequent

The two primary interestingness measures for an association rule, or the relationship of items, are support (S) and confidence (C). The relationship M ⇒ N is characterized in the record set Z with support (S), defined as the percentage of records in Z containing M ∪ N. This leads to the definition of the likelihood P(M ∪ N). Meanwhile, the confidence (C) in the record set Z is the percentage of records in Z comprising M that also comprise N, and thus defines the conditional likelihood P(N|M). In summary, Support(M ⇒ N) = P(M ∪ N) and Confidence(M ⇒ N) = P(N|M).
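To make the two measures concrete, the following short Java fragment is an illustrative sketch only (the class name RuleMeasures and the toy transactions are assumptions, not code from this study); it counts the records containing M and M ∪ N over an in-memory transaction list and derives support and confidence exactly as defined above.

import java.util.List;
import java.util.Set;

// Illustrative sketch (not from the paper): computes support and confidence
// of the rule M => N over a transaction database held in memory.
public class RuleMeasures {

    // Support(M => N) = fraction of transactions containing every item of M and N.
    public static double support(List<Set<String>> transactions, Set<String> m, Set<String> n) {
        long hits = transactions.stream()
                .filter(t -> t.containsAll(m) && t.containsAll(n))
                .count();
        return (double) hits / transactions.size();
    }

    // Confidence(M => N) = Support(M ∪ N) / Support(M), i.e., P(N | M).
    public static double confidence(List<Set<String>> transactions, Set<String> m, Set<String> n) {
        long mCount = transactions.stream().filter(t -> t.containsAll(m)).count();
        long bothCount = transactions.stream()
                .filter(t -> t.containsAll(m) && t.containsAll(n))
                .count();
        return mCount == 0 ? 0.0 : (double) bothCount / mCount;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"));
        Set<String> m = Set.of("bread");
        Set<String> n = Set.of("milk");
        // support = 2/4 = 0.50, confidence = 2/3 ≈ 0.67 for this toy database
        System.out.printf("support=%.2f confidence=%.2f%n",
                support(db, m, n), confidence(db, m, n));
    }
}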
Common terms used in defining association rule mining are listed as follows [18]:

1. Itemset, which describes a group of items. A k-itemset is an itemset which includes k items.
2. Frequent itemset, which refers to an itemset that has at least the nominal support.
3. Candidate set, which describes a label directed at a group of itemsets that need testing to determine if the sets match a particular requirement [19].
4. Strong rule and weak rule: if the support of a rule is greater than or equal to the nominal support and its confidence is higher than or equal to the nominal confidence, the relationship is marked as a strong rule; otherwise it is identified as a weak rule.

In general, an association rule mining algorithm entails the enumerated steps:

1. Candidate f-itemsets are generated as 1-extensions of the large (f-1)-itemsets produced in the earlier iteration.
2. Support S for the candidate f-itemsets is obtained by a thorough scan of the database.
3. Itemsets with support less than the nominal support are pruned, and the remaining itemsets form the large f-itemsets.

All of the recognized association rule mining approaches can be broken down into two relative factors [5]:

1. Find the frequent itemsets.
2. The frequent itemsets are used to generate the desired rules [20].

B. Apriori Algorithm

Exploring frequent itemsets is among the most investigated domains of data mining. Association rule and frequent itemset mining have developed into an enormously researched area, and therefore more efficient, modified, or enhanced algorithms have been presented. Distinct among them are Apriori-based algorithms or Apriori improvements. Algorithms based on Apriori as a basic search strategy helped to modify the whole series of procedures and data structures. The Apriori algorithm is user-friendly to implement and easy to understand, making it popular for mining all frequent itemsets in a transaction database.

The algorithm performs multiple scans of the database to discover frequently occurring itemsets, where the frequent f-itemsets are used to generate the (f+1)-itemsets. The occurrence count of an f-itemset needs to be higher than or equal to a nominal support threshold for it to become frequent; otherwise, the itemset remains only a candidate itemset. Initially, the algorithm scans the database to learn the frequency of the 1-itemsets, consisting of only one item, by enumerating each item in the database. The occurrences of the 1-itemsets are used to determine the 2-itemsets, which in turn are used to find the 3-itemsets, and so forth, until there are no additional frequent itemsets. If an itemset is not frequent, any superset of it is also non-frequent; this property prunes the search space in the database [21]. The algorithm follows the steps below:

ALGORITHM 1: APRIORI ALGORITHM
F1 = {frequent 1-itemsets};
for (f = 2; Ff-1 ≠ ∅; f++) {
    Cf = apriori_gen(Ff-1);
    for all transactions t ∈ T {
        subset(Cf, t);   // increment the count of every candidate in Cf contained in t
    }
    Ff = {c ∈ Cf | c.count ≥ minsup};
}
return ∪f Ff;
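The level-wise structure of Algorithm 1 can be illustrated with the following self-contained Java sketch. It is a simplified implementation assumed for exposition only (the class name SimpleApriori, the in-memory transaction list, and the absolute minimum-support count are choices made here, not artifacts of this study): it generates candidates from the previous frequent level, counts their support in one pass over the transactions, and prunes those below the minimum support.

import java.util.*;

// Illustrative level-wise Apriori, following the structure of Algorithm 1.
// Assumes the whole transaction database fits in memory.
public class SimpleApriori {

    // Returns all frequent itemsets together with their absolute support counts.
    public static Map<Set<String>, Integer> run(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> allFrequent = new LinkedHashMap<>();

        // F1: frequent 1-itemsets.
        Map<Set<String>, Integer> current =
                countCandidates(transactions, singleItems(transactions), minSupport);
        while (!current.isEmpty()) {
            allFrequent.putAll(current);
            // Cf+1 = apriori_gen(Ff): join frequent itemsets that differ by one item.
            Set<Set<String>> candidates = aprioriGen(current.keySet());
            current = countCandidates(transactions, candidates, minSupport);
        }
        return allFrequent;
    }

    private static Set<Set<String>> singleItems(List<Set<String>> transactions) {
        Set<Set<String>> items = new HashSet<>();
        for (Set<String> t : transactions)
            for (String item : t) items.add(Set.of(item));
        return items;
    }

    private static Set<Set<String>> aprioriGen(Set<Set<String>> previousLevel) {
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> a : previousLevel)
            for (Set<String> b : previousLevel) {
                Set<String> joined = new TreeSet<>(a);
                joined.addAll(b);
                if (joined.size() == a.size() + 1) candidates.add(joined);
            }
        return candidates;
    }

    // Scan the database once per level and keep candidates meeting minSupport.
    private static Map<Set<String>, Integer> countCandidates(
            List<Set<String>> transactions, Set<Set<String>> candidates, int minSupport) {
        Map<Set<String>, Integer> counts = new LinkedHashMap<>();
        for (Set<String> t : transactions)
            for (Set<String> c : candidates)
                if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }
}

Each call to countCandidates corresponds to one full database scan, which is exactly the cost that the enhancement in Section III seeks to reduce.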
In this algorithm, most of the time is spent in scanning the database until the final frequent itemsets are output. In this work, the model is designed based on probability, with its efficiency measured with two quantities:

1. The probability that the set is a success, referred to as the success rate
2. The probability that the set is a failure, referred to as the failure rate

C. Enhancements of Apriori Algorithm

In 2017, an implementation of a tree-based algorithmic approach successfully found frequent itemsets while generating smaller tables in order to minimize the amount of space consumed by table sizes. The research reduces the time complexity of accessing the database through the utilization of a tree-based approach [22].

Furthermore, in 2018, researchers worked on Apriori enhancement: a study of an improved Apriori algorithm using a bi-phase algorithm utilizes an ensemble algorithm that can find itemsets with high utility to the user, while also implicitly making use of the metrics of minimum support and minimum confidence recommended by the traditional association rule mining algorithm [24]. The study suggests a bi-phase algorithm that recognizes high-utility itemsets more effectively. However, it leaves out the generation of rules with high utility, which commonly integrates the traditional concepts of minimum support and confidence.
D. Hash Based Apriori Algorithm

In 2016, a study proposed a hash-based technique with a table of size eight that highlights 2-itemset mining, wherein the evaluation of the modified algorithm geared toward the memory consumption of 2-itemset mining [25].

In 2018, a study proposed an array hash table based Apriori algorithm that maps an array table through a one-dimensional array structure [26], comparing the time taken over several client requests. Also, a dynamic hash algorithm [27] based on data attributes improved association rule mining.

E. Transaction Reduction Apriori Algorithm

An improved Apriori algorithm using a transaction reduction approach successfully reduces the searching time by scaling down the unnecessary transaction items, which at the same time lessens the duplicate items during candidate itemset generation; this can create the recurrent itemsets directly and discard candidate itemsets with a subset categorized as not frequent [28]. A similar concept was also applied in the study of cache database transaction reduction [29], yet the study did not show a performance evaluation of the algorithm.

III. PROPOSED ARCHITECTURE AND METHODS

A. Proposed Enhanced Apriori Architecture

In the proposed enhanced Apriori algorithm, four modules are added to the original Apriori algorithm. These modules are Parse Token, Map Token, Hash Frequent Item Set, and Non-Frequent Item Set Reduction. The transaction reduction approach was enhanced through the integration of a memoization approach [30] implemented through the use of hashing. Fig. 1 shows the integration of these modules in the original Apriori as an enhancement of the algorithm.

Fig. 1. Enhanced Apriori Algorithm Architecture
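The paper describes these modules at the architectural level only, so the following Java sketch is merely one plausible illustration of the underlying idea (the class and method names TransactionReductionStep and reduceTransactions, as well as the specific pruning rule, are assumptions of this illustration, not the authors' code): frequent itemsets discovered at each level are memoized in a hash set, and transactions that can no longer contribute a larger frequent itemset are removed before the next scan.

import java.util.*;

// Illustrative sketch (assumed, not the authors' implementation): one level of a
// transaction-reduction pass combined with hash-based memoization of frequent itemsets.
public class TransactionReductionStep {

    // Memoized frequent itemsets from all previous levels, backed by hashing.
    private final Set<Set<String>> frequentCache = new HashSet<>();

    // Count candidates of size k, memoize the frequent ones, and return a
    // reduced transaction list for the next scan.
    public List<Set<String>> reduceTransactions(List<Set<String>> transactions,
                                                Set<Set<String>> candidates,
                                                int minSupport, int k) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions)
            for (Set<String> c : candidates)
                if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);

        counts.forEach((itemset, count) -> {
            if (count >= minSupport) frequentCache.add(itemset); // hash-based memoization
        });

        // Transaction reduction: a transaction containing fewer than k + 1 frequent
        // k-itemsets cannot contain any frequent (k + 1)-itemset, because every
        // (k + 1)-itemset has k + 1 frequent subsets of size k, so it is dropped
        // from the next database scan.
        List<Set<String>> reduced = new ArrayList<>();
        for (Set<String> t : transactions) {
            long contained = counts.entrySet().stream()
                    .filter(e -> e.getValue() >= minSupport && t.containsAll(e.getKey()))
                    .count();
            if (contained >= k + 1) reduced.add(t);
        }
        return reduced;
    }

    public Set<Set<String>> frequentItemsets() {
        return Collections.unmodifiableSet(frequentCache);
    }
}

In the full algorithm, a step of this kind would be invoked once per level in place of the plain per-level scan of Algorithm 1, so that each subsequent scan touches fewer transactions.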
B. Performance Measures

The theoretical evaluation of the overall performance of the algorithm is shown in this section. The total running time of the classic Apriori algorithm [31] is defined in equation (1), which is expressed in terms of the cost of time incurred in a single scan of the transaction database, the cost of time of producing Ck+1 from Lk, mk (the number of itemsets in Ck excluding nk), lk+1 (the itemsets in Ck+1), nk (the number of itemsets in Lk), A (the number of records in the transactional database), and n (the length of the data).

The total running time of the proposed improved version of the Apriori algorithm consists of:

• Time needed to search the transaction database on the initial run: ts * mk, where ts defines the time needed for searching the transaction database and mk defines the number of item instances in the transaction database
• Time needed to search for items with examination of the support count: tk * nk, where nk describes the number of items and tk is the time consumed in searching for items compared with the nominal support
• Time appropriated in scanning for items with maximum length: tl * nl, where tl describes the reduced transaction database search time
• Time appropriated to produce the association rules: the sum of ti * ni over i = 1 to p, where p is the length of the maximum pattern, ti is the time consumed in searching a relationship of length i, and ni denotes the number of relationships of length i

The total time appropriated for the prospective method is defined in equation (2).
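Assuming the four components enumerated above simply add together, the total running time referred to in equation (2) can be written as follows; this rendering is an interpretation of the enumeration above rather than a reproduction of the paper's typeset formula.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Hedged reconstruction of equation (2): the total running time of the enhanced
% Apriori, read as the sum of the four time components enumerated above.
\begin{equation}
T_{\text{enhanced}} = t_s m_k + t_k n_k + t_l n_l + \sum_{i=1}^{p} t_i n_i
\tag{2}
\end{equation}
\end{document}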
IV. EXPERIMENTAL RESULTS

The enhanced algorithm is implemented using the Java programming language. The experiments are all performed on an Intel(R) Core i5-8250U CPU @ 1.60 GHz with 8 GB of RAM, running the Microsoft Windows 10 operating system.

For the efficiency comparison, this analysis makes certain that the system state is comparable in all evaluation runs and offers related outcomes when repeated. Several datasets are utilized in the efficiency analysis of the proposed approach.

A. Properties of Datasets

The proposed approach is analyzed on four datasets to be able to gauge the levels of performance on datasets having distinct attributes. Four original datasets (D1 to D4) are used in the experimentations. Table I features the details defining the properties of the datasets, the number of transactions, and how large each dataset is. The four genuine and distinct datasets (D1 to D4) are extracted from the Kaggle repository [32]. Table I also displays the various properties of the datasets utilized and their origin.
TABLE I. DATASET PROPERTIES

Dataset              Type    Number of Transactions    Size
D1 (e-commerce)      Real    541910                    44949 KB
D2 (Mall Customer)   Real    210                       4 KB
D3 (Convenience)     Real    787                       48 KB
D4 (Suicide)         Real    27821                     2662 KB

B. Performance Analysis

The scanned time of the four datasets is presented in Table II. An average scanning time reduction of 58% is derived for the four datasets, where the D4 dataset has the highest scanning time difference and D3 the lowest, compared with the original Apriori algorithm.

TABLE II. DATASET SCANNING TIME

Dataset              Apriori Scanned Time (s)    Enhanced Apriori Scanned Time (s)    Time Reducing Rate
D1 (e-commerce)      79                          1.20                                 98%
D2 (Mall Customer)   0.06                        0.04                                 28%
D3 (Convenience)     0.18                        0.17                                 8%
D4 (Suicide)         509                         1                                    100%

Fig. 2 to Fig. 5 show the comparison of the original Apriori and the enhanced Apriori algorithm, obtained by running them on the different datasets of Table I over different minimum support rates.

Fig. 2. Comparison of Apriori and Enhanced Apriori on e-commerce data

Fig. 2 shows the comparison of the enhanced Apriori and the original Apriori with different minimum supports. The average reduction time rate of the enhanced Apriori on the e-commerce data is 89%, with the highest time reducing rate of 99% at a minimum support of 0.35 compared with the original Apriori.

Fig. 3. Comparison of Apriori and Enhanced Apriori on mall customer data

Fig. 3 shows that the enhanced Apriori, at a minimum support of 0.20, marks the highest time reduction rate of 62%. The average reduction rate on the mall customer data is 29%.

Fig. 4. Comparison of Apriori and Enhanced Apriori on convenience data

With Fig. 4, at a minimum support of 0.02, the reducing rate of the enhanced Apriori is 40% compared with the original Apriori. The average reduction time of the enhanced Apriori is 14%.

Fig. 5. Comparison of Apriori and Enhanced Apriori on suicide data

In Fig. 5, the average reduction time of the enhanced Apriori is 99% compared with the original Apriori. The time reduction rate at a minimum support of 0.20 is 100%.

Using equations (1) and (2) of Section III, the difference in runtime between the two algorithms depends on the number of items and the dimension of the data. It is observed that the enhanced algorithm efficiently works on large datasets, as shown in Fig. 2 to Fig. 5.

V. CONCLUSION

In this paper, trimming the transaction data size serves as the essential point for further improving the efficiency of the Apriori algorithm. The applied method not only optimizes the algorithm by minimizing candidate set generation, but also reduces the input-output cost by minimizing the database transactions, known as the transaction reduction scheme. The efficiency of the algorithm improves, and the common problem of Apriori regarding overhead maintenance was managed through the integration of a hashing approach.


Future work could be geared towards overhead database maintenance, which is commonly handled using multiple-processor integration.

REFERENCES

[1] X. Wang, H. Yan, and J. Li, "An improved supervised learning defect prediction model based on cat swarm algorithm," in Journal of Physics: Conference Series, 2018, vol. 1087, no. 2.
[2] M. Kaur and S. Kang, "Market Basket Analysis: Identify the Changing Trends of Market Data Using Association Rule Mining," in Procedia Computer Science, 2016, vol. 85, pp. 78–85.
[3] S. Aggarwala and B. Rani, "Optimization of Association Rule Mining Process using Apriori and Ant Colony Optimization Algorithm," Eur. J. Mark., vol. 11, no. 1, pp. 1–32, 2016.
[4] S. Berretti, S. M. Thampi, and P. R. Srivastava, "Intelligent systems technologies and applications: Volume 1," Adv. Intell. Syst. Comput., vol. 384, pp. 225–226, 2016.
[5] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in Proc. 1993 ACM SIGMOD Int. Conf. Manag. Data, ACM Press, 1993, pp. 207–216.
[6] D. Ai, H. Pan, X. Li, Y. Gao, and D. He, "Association rule mining algorithms on high-dimensional datasets," Artif. Life Robot., vol. 23, no. 3, pp. 420–427, 2018.
[7] S. Rathee, M. Kaul, and A. Kashyap, "R-Apriori: An Efficient Apriori based Algorithm on Spark," in Proc. 8th Ph.D. Workshop Inf. Knowl. Manag., ACM, 2015, pp. 27–34.
[8] R. Moodley, F. Chiclana, F. Caraffini, and J. Carter, "Application of uninorms to market basket analysis," Int. J. Intell. Syst., vol. 34, no. 1, pp. 39–49, 2019.
[9] Y. Huang, Q. Lin, and Y. Li, "Apriori-BM Algorithm for Mining Association Rules Based on Bit Set Matrix," in Proc. 2018 2nd IEEE Adv. Inf. Manag. Commun. Electron. Autom. Control Conf. (IMCEC 2018), 2018, pp. 2580–2584.
[10] Z. Chen, A. Choudhary, G. Trajcevski, Y. Xie, D. Palsetia, and A. Agrawal, "SILVERBACK+: scalable association mining via fast list intersection for columnar social data," Knowl. Inf. Syst., vol. 50, no. 3, pp. 969–997, 2016.
[11] K. Zhang, J. Liu, Y. Chai, J. Zhou, and Y. Li, "A method to optimize apriori algorithm for frequent items mining," in Proc. 2014 7th Int. Symp. Comput. Intell. Des. (ISCID 2014), vol. 1, 2015, pp. 71–75.
[12] K. Wang, Y. Qi, J. J. Fox, M. R. Stan, and K. Skadron, "Association Rule Mining with the Micron Automata Processor," in Proc. 2015 IEEE 29th Int. Parallel Distrib. Process. Symp. (IPDPS 2015), 2015, pp. 689–699.
[13] M. Mlambo, N. Gasela, M. Esiefarienrhe, and B. Isong, "On the Optimization of Improved Apriori Algorithm via Linked-list Trie," in Proc. 2017 Int. Conf. Big Data Res. (ICBDR 2017), 2017, pp. 62–66.
[14] Y. Djenouri, D. Djenouri, A. Belhadi, P. Fournier-Viger, J. Chun-Wei Lin, and A. Bendjoudi, "Exploiting GPU parallelism in improving bees swarm optimization for mining big transactional databases," Inf. Sci., pp. 1–17, 2018.
[15] M. Munge and H. Shubhangi, "Network Optimization using Ant Colony Algorithm," in Int. Conf. Autom. Control Dyn. Optim. Tech. (ICACDOT 2016), 2017, pp. 952–954.
[16] S. Ammar and F. Ba-Alwi, "Improved FTWeightedHashT Apriori Algorithm for Big Data using Hadoop-MapReduce Model," J. Adv. Math. Comput. Sci., vol. 27, no. 1, pp. 1–11, 2018.
[17] V. U. Parikh and P. Shah, "E-commerce Recommendation System using Association Rule Mining and Clustering," Int. J. Innov. Adv. Comput. Sci., vol. 4, Jun. 2015.
[18] B. S. Neysiani, N. Soltani, R. Mofidi, and M. N. Shahraki, "Improve Performance of Association Rule-Based Collaborative Filtering Recommendation Systems using Genetic Algorithm," I.J. Inf. Technol. Comput. Sci., vol. 2, pp. 48–55, Feb. 2019.
[19] T. A. Kumbhare and S. V. Chobe, "An Overview of Association Rule Mining Algorithms," Int. J. Comput. Sci. Inf. Technol., vol. 5, no. 1, pp. 927–930, 2014.
[20] D. K. Hanirex and K. P. Kaliyamurthie, "Mining Frequent Item Sets for Association Rule Mining in Relational Databases: An Implementation of SETM Algorithm Using Super Market Dataset," Indian J. Sci. Technol., vol. 8, pp. 6–10, Nov. 2015.
[21] S. Sihag and G. Tanwar, "A Survey on Optimization of APRIORI Algorithm for high Performance," IJITKM, vol. 7, no. 2, pp. 185–187, 2014.
[22] D. Sarkar, A. Paul, A. Mahata, and S. K. Kumar, "Modified Apriori Algorithm to find out Association Rules using Tree based Approach," Int. J. Comput. Appl., Oct. 2017.
[23] A. J. Doshi and B. Joshi, "Comparative analysis of Apriori and Apriori with hashing algorithm," Int. Res. J. Eng. Technol., pp. 976–979, 2018.
[24] N. Srivastava, K. Gupta, and N. Baliyan, "Improved Market Basket Analysis with Utility Mining," in Proc. 3rd Int. Conf. Internet of Things and Connected Technologies (ICIoTCT 2018), 2018, pp. 716–720.
[25] K. Vyas and S. Sherasiya, "Modified Apriori Algorithm Using Hash Based Technique," IJARIIE, vol. 2, no. 3, pp. 1229–1234, 2016.
[26] K. Bhuvaneswari, "Secure Association Rule Mining Using Array Mapping Table," Int. J. Pure Appl. Math., vol. 118, no. 8, pp. 141–147, 2018.
[27] H. Hu and Y. Chen, "Research on the Factors of Frequent Itemset Mining Based on Dynamic Hashing Number of Transaction Items," in Int. Conf. Network, Commun. Comput. Eng. (NCCE 2018), vol. 147, 2018, pp. 215–220.
[28] S. Aggarwal, "Transaction Reduction Approach to Improve Efficiency of Apriori Algorithm," Int. J. Comput. Commun. Syst. Eng., vol. 2, no. 3, pp. 461–463, 2015.
[29] C. Bhavani and P. Madhavi, "Improving Efficiency of Apriori Algorithm," Int. J. Comput. Trends Technol., vol. 27, no. 2, pp. 93–99, 2015.
[30] I. E. Fellows, UCLA Electronic Theses and Dissertations, University of California, Los Angeles, 2015.
[31] S. K. R. and K. R., "Web Log Mining using Improved Version of Apriori Algorithm," Int. J. Comput. Appl., vol. 29, no. 6, pp. 23–27, 2011.
[32] "Kaggle." [Online]. Available: https://www.kaggle.com/datasets.
