Efficient Apriori Algorithm Using Enhanced Transaction Reduction Approach
Abstract—The Apriori algorithm is a fundamental algorithm in association rule mining. Its key concept is to discover interesting recurrent patterns between different groups of data. It is a straightforward, original algorithm that employs a highly iterative approach known as level-wise search. Nevertheless, this kind of algorithm has many drawbacks. Building on this algorithm, this research identifies the limitation of the classical Apriori algorithm in terms of the computational cost of scanning the entire database to discover frequent itemsets, and presents an enhancement of Apriori that minimizes the generation cost through an enhanced transaction reduction approach. When the enhanced Apriori is compared with the original Apriori algorithm, the database scanning time is reduced to 58 percent and the generation time is reduced by approximately 89 percent.

Keywords—Association rule mining, Apriori algorithm, frequent itemset mining, transaction reduction, hashing.

I. INTRODUCTION

Nearly all well-established businesses have accumulated decades of data collected from customers' information [1]. With the proliferation of e-commerce applications, companies now accumulate significant amounts of data in months, not years [2]. In the last ten years, the data mining domain, otherwise known as knowledge discovery, has gained promising popularity; designs and frameworks crafted from the performance information of social-insect datasets, such as ants, form part of Knowledge Discovery in Databases (KDD), which focuses on finding correlations, trends, patterns, and anomalies [3]. Comparison across robust, essential multi-databases helps companies make accurate future decisions. The most widely used data mining application for understanding itemset associativity is association rule mining [4].

Association rule mining (ARM) is an extensively and thoroughly researched technique in data mining, popularized by Agrawal et al. in 1993 [5]. The technique aims to extract frequent itemsets, compelling rules, associations, or rare data organization among sets of items in data repositories or other transaction databases. Market basket analysis serves as an example of ARM [6]. ARM processes determine association rules that meet the weighted minimum support and confidence for a defined set of transaction data [7]. Aside from market basket analysis, ARM is prominent in diverse fields such as web search, process mining, medical assimilation, marketing advancement, and market dispensing [8]. Over time, several algorithms for generating association rules have been introduced [9]. Popularly known algorithms under ARM are Apriori, Eclat, and FP-Growth, which are applied to extract frequent itemsets [10].

The Apriori Algorithm (AA) is recognized as a level-wise search algorithm designed to find frequent itemsets [11]. It has also gained popularity and is a commonly used algorithm for ARM [12]. However, the datasets collected from transactions have now become gigantic compared with the datasets accumulated 10 years ago [13]. This immense volume of data poses a dilemma for the Apriori Algorithm when large datasets are involved. First, it requires successive scans of the transactional database, which incurs a computational generation cost [14]. Second, it generates innumerable candidate sets, which consume a considerable amount of memory [15].

Thus, the need to enhance the Apriori algorithm has attracted researchers for years, using different techniques. Numerous studies have been conducted and many modern approaches have been recommended to cope with this dilemma, each with its advantages and drawbacks, but no ultimate strategy has been attained [6]. For more significant applications in this area, new algorithms that can further enhance scalability and interpretability are still in high demand [16]. This leads to the main contribution of this study, which aims to improve the efficiency of the Apriori Algorithm by minimizing the number of database scans and limiting the input and output cost.

The remainder of this paper is organized as follows. Section II discusses Association Rule Mining and related works. Section III gives an overview of the design and process of the Enhanced Apriori Algorithm. Section IV presents the experimental results of the comparative analysis of the enhanced Apriori and the classic Apriori. Section V concludes the paper. Finally, the acknowledgment expresses gratitude to all who contributed to the success of this work.

II. RELATED WORKS

A. Association Rule Mining

In the knowledge discovery domain, Association Rule Mining determines the relationships, or association rules, between data items. An association rule is expressed in the form M → N, where M is the antecedent and N is the consequent. The expression indicates how often N occurs when M occurs, based on the support and confidence set for each process. Countless algorithms for producing association rules have been designed over time [17]. A few leading algorithms are Apriori, known as level-wise search, and FP-Growth. The issue of ARM
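To make the support and confidence of a rule M → N concrete, the following is a minimal Python sketch and not the paper's implementation. The function names `support` and `confidence`, and the four-transaction toy dataset, are assumptions introduced purely for illustration. Support is taken here as the fraction of transactions containing an itemset, and confidence as support(M ∪ N) divided by support(M), which are the standard textbook definitions.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent (M -> N)."""
    m = frozenset(antecedent)
    n = frozenset(consequent)
    support_m = support(m, transactions)
    if support_m == 0:
        return 0.0  # the rule never fires, so confidence is reported as zero
    return support(m | n, transactions) / support_m


# Hypothetical toy dataset (not one of the paper's datasets).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

In this toy data, bread and milk appear together in two of four transactions and bread appears in three, so the rule bread → milk has support 0.5 and confidence of about 0.67. A rule is typically retained only if both values meet the user-defined minimum thresholds.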
Fig. 1. Enhanced Apriori Algorithm Architecture

B. Performance Measures
The theoretical evaluation of the overall performance of the algorithm is presented in this section. The total running time of the classic Apriori algorithm [31] can be defined as in equation (1).

A. Properties of Datasets
The proposed approach is evaluated on four datasets in order to gauge its performance on datasets with distinct attributes. Four original datasets (D1 to D4) are used in the experiments. Table I lists the properties of the datasets, including the number of transactions and the size of each dataset. The four real and distinct datasets (D1 to D4) were extracted from the Kaggle repository [32]. Table I also shows the origin of each dataset.
TABLE I. DATASET PROPERTIES

Dataset               Type   Number of Transactions   Size
D1 (e-commerce)       Real   541910                   44949 KB
D2 (Mall Customer)    Real   210                      4 KB
D3 (Convenience)      Real   787                      48 KB
D4 (Suicide)          Real   27821                    2662 KB

B. Performance Analysis
The scan times for the four datasets are presented in Table II. An average scan time of 58% was derived for the four datasets, where the D4 dataset has the largest scan-time difference and D3 required the least time to scan the database compared with the original Apriori Algorithm.

Fig. 3 shows that the enhanced Apriori, at a minimum support of 0.20, achieves the highest time reduction rate of 62%. The average reduction rate for the mall customer data is 29%.
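The percentages reported in this section can be related to raw timings with a short calculation. The sketch below assumes that the time reduction rate is the share of the classic Apriori running time saved by the enhanced variant, i.e. (classic - enhanced) / classic × 100; this definition, the function name `reduction_rate`, and the timing values are illustrative assumptions and are not taken from the paper's tables.

```python
def reduction_rate(classic_seconds: float, enhanced_seconds: float) -> float:
    """Percentage of the classic running time saved by the enhanced variant.

    Assumed definition: (classic - enhanced) / classic * 100.
    """
    if classic_seconds <= 0:
        raise ValueError("classic_seconds must be positive")
    return (classic_seconds - enhanced_seconds) / classic_seconds * 100.0


# Hypothetical timings in seconds, keyed by minimum support:
# (classic Apriori, enhanced Apriori). These are NOT the paper's measurements.
timings = {
    0.20: (10.0, 4.0),
    0.30: (8.0, 6.0),
}

# Reduction rate at each minimum support, then the average over all settings.
rates = {min_sup: reduction_rate(classic, enhanced)
         for min_sup, (classic, enhanced) in timings.items()}
average_rate = sum(rates.values()) / len(rates)
```

With these made-up timings, the rates are about 60% at a minimum support of 0.20 and 25% at 0.30, for an average of about 42.5%. Averaging per-setting rates, as done here, weights each minimum support equally regardless of its absolute running time.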