
Apriori Algorithm in Machine Learning

The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work
on databases that contain transactions. With the help of these association rules, it determines
how strongly or how weakly two objects are connected. The algorithm uses a breadth-first search and
a Hash Tree to calculate the itemset associations efficiently. It is an iterative process for finding
the frequent itemsets in a large dataset.
This algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is mainly used for
market basket analysis and helps to find products that are frequently bought together.
It can also be used in the healthcare field to find drug reactions for patients.

What is a Frequent Itemset?

Frequent itemsets are those itemsets whose support is greater than the threshold value or user-specified
minimum support. It also means that if {A, B} is a frequent itemset, then A and B individually must also be
frequent itemsets. Suppose there are two transactions: A = {1,2,3,4,5} and B = {2,3,7}; in these two
transactions, 2 and 3 are the frequent items.

Note: To better understand the apriori algorithm and related terms such as support and confidence,
it is recommended to first understand association rule learning.

Steps for Apriori Algorithm


Below are the steps for the apriori algorithm:

Step-1: Determine the support of the itemsets in the transactional database, and select the minimum
support and confidence.

Step-2: Take all the itemsets in the transactional database that have a support value higher than the
minimum (selected) support value.

Step-3: Find all the rules of these subsets that have a confidence value higher than the threshold
or minimum confidence.

Step-4: Sort the rules in decreasing order of lift. (A small worked sketch of these calculations is given below.)
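
To make these steps concrete, here is a minimal, self-contained sketch of how support, confidence, and lift are
computed. The transaction list and item names below are made up purely for illustration and are not the example
used in the next section:

# A minimal sketch of the support / confidence / lift calculations (toy data)
transactions = [
    {'milk', 'bread', 'butter'},
    {'milk', 'bread'},
    {'bread', 'butter'},
    {'milk', 'butter'},
    {'milk', 'bread', 'butter'},
]

def support(itemset):
    # fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: {milk, bread} -> {butter}
antecedent = {'milk', 'bread'}
consequent = {'butter'}
confidence = support(antecedent | consequent) / support(antecedent)
lift = confidence / support(consequent)
print(support(antecedent | consequent), confidence, lift)   # about 0.4, 0.67, 0.83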


Apriori Algorithm Working
We will understand the apriori algorithm using an example and mathematical calculation:
Example: Suppose we have the following dataset that has various transactions, and from this dataset,
we need to find the frequent itemsets and generate the association rules using the Apriori algorithm:

Solution:

Step-1: Calculating C1 and L1:

o In the first step, we will create a table that contains the support count (the frequency of each
itemset in the dataset) of every itemset in the given dataset.
This table is called the candidate set, or C1.
o Now, we will take out all the itemsets that have a support count greater than the
Minimum Support (2). This gives us the table for the frequent itemset L1.

Since all the itemsets except {E} have a support count greater than or equal to the minimum support,
the E itemset will be removed.

Step-2: Candidate Generation C2, and L2:

o In this step, we will generate C2 with the help of L1. In C2, we will create pairs of the itemsets
of L1 in the form of subsets.

o After creating the pairs, we will again find the support count from
the main transaction table of the dataset, i.e., how many times these pairs have occurred
together in the given dataset. This gives us the below table for C2:
o Again, we need to compare the C2 support count with the minimum support count,
and after comparing, the itemsets with a lower support count will be eliminated from table C2.
This gives us the below table for L2.

Step-3: Candidate generation C3, and L3:

o For C3, we will repeat the same two steps, but now we will form the C3 table with subsets of three
items together, and we will calculate their support count from the dataset.

It will give the below table:

o Now we will create the L3 table. As we can see from the above C3 table, there is only one
combination of itemsets that has a support count equal to the minimum support count.
So, L3 will have only one combination, i.e., {A, B, C}.
Step-4: Finding the association rules for the subsets:
To generate the association rules, first, we will create a new table with the possible rules from
the discovered combination {A, B, C}. For each rule X → Y, we will calculate the Confidence using the
formula sup(X ^ Y)/sup(X). After calculating the confidence value for all rules, we will exclude the
rules that have a confidence lower than the minimum threshold (50%).

Consider the below table:

Rules        Support    Confidence

A^B → C      2          sup{(A^B)^C}/sup(A^B) = 2/4 = 0.50 = 50%

B^C → A      2          sup{(B^C)^A}/sup(B^C) = 2/4 = 0.50 = 50%

A^C → B      2          sup{(A^C)^B}/sup(A^C) = 2/4 = 0.50 = 50%

C → A^B      2          sup{C^(A^B)}/sup(C) = 2/5 = 0.40 = 40%

A → B^C      2          sup{A^(B^C)}/sup(A) = 2/6 = 0.33 = 33.33%

B → A^C      2          sup{B^(A^C)}/sup(B) = 2/7 = 0.29 = 28.57%

As the given threshold or minimum confidence is 50%, the first three rules,

A^B → C, B^C → A, and A^C → B, can be considered strong association rules for the given problem.
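
The confidence values in the table can be verified with a few lines of Python. The sketch below simply plugs the
support counts listed above into the confidence formula; it does not re-derive them from the raw transactions:

# Checking the rule confidences from the support counts used in the table above
sup = {'A': 6, 'B': 7, 'C': 5, 'AB': 4, 'AC': 4, 'BC': 4, 'ABC': 2}

rules = {
    'A^B -> C': sup['ABC'] / sup['AB'],   # 2/4 = 0.50
    'B^C -> A': sup['ABC'] / sup['BC'],   # 2/4 = 0.50
    'A^C -> B': sup['ABC'] / sup['AC'],   # 2/4 = 0.50
    'C -> A^B': sup['ABC'] / sup['C'],    # 2/5 = 0.40
    'A -> B^C': sup['ABC'] / sup['A'],    # 2/6 = 0.33
    'B -> A^C': sup['ABC'] / sup['B'],    # 2/7 = 0.29
}

min_confidence = 0.5
strong_rules = {rule: conf for rule, conf in rules.items() if conf >= min_confidence}
print(strong_rules)   # keeps only A^B -> C, B^C -> A, and A^C -> B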

Advantages of Apriori Algorithm

o This is an easy-to-understand algorithm.


o The join and prune steps of the algorithm can be easily implemented on large datasets.

Disadvantages of Apriori Algorithm

o The apriori algorithm works slowly compared to other algorithms.


o The overall performance can be reduced as it scans the database multiple times.
o The time and space complexity of the apriori algorithm is O(2^D), which is very high.
Here D represents the number of distinct items (the horizontal width) present in the database.
Python Implementation of Apriori Algorithm
Now we will see the practical implementation of the Apriori Algorithm.
To implement this, we consider the problem of a retailer who wants to find associations between
his shop's products so that he can offer "Buy this and Get that" deals to his customers.
The retailer has a dataset that contains a list of transactions made by his customers.
In the dataset, each row shows the products purchased in one transaction made by a
customer. To solve this problem, we will perform the below steps:
o Data Pre-processing
o Training the Apriori model on the dataset
o Visualizing the results

1. Data Pre-processing Step:

The first step is the data pre-processing step. Under this, first, we will import the libraries.

The code for this is given below:

o Importing the libraries:

Before importing the libraries, we will use the below line of code to install the apyori package,

as the Spyder IDE does not contain it:

pip install apyori

Below is the code to implement the libraries that will be used for different tasks of the model:

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

o Importing the dataset:


Now, we will import the dataset for our apriori model. To import the dataset, there will be some
changes here. All the rows of the dataset show different transactions made by the
customers. The first row is the transaction made by the first customer, which means there is no
particular name for each column; each cell simply holds an individual product value
(see the dataset given below after the code). So, we need to mention in our code that
no header is specified. The code is given below:
#Importing the dataset (the CSV filename is truncated in the source; adjust it to your market-basket file)
dataset = pd.read_csv('Market_Basket_data.csv', header = None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])   # 20 product columns assumed

In the above code, the first line imports the dataset in pandas format.
The remaining lines are used because the apriori() function that we will use for training our model takes
the dataset as a list of transactions. So, we have created an empty list and filled it with the
transactions at indices 0 to 7500. Here we have taken 7501 because, in Python, the upper bound of
range() is not included.

The dataset looks like the below image:

2. Training the Apriori Model on the dataset

To train the model, we will use the apriori function that will be imported from the apyori package.

This function will return the rules to train the model on the dataset. Consider the below code:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)   # the call is truncated in the source; min_lift and min_length values are assumed

In the above code, the first line is to import the apriori function.

In the second line, the apriori function returns the output as the rules. It takes the following parameters:

o transactions: A list of transactions.


o min_support: To set the minimum support as a float value.
Here we have used 0.003, which relates the number of times an item should appear per week
to the total number of transactions (see the quick check after this list).
o min_confidence: To set the minimum confidence value.
Here we have taken 0.2. It can be changed as per the business problem.
o min_lift: To set the minimum lift value.
o min_length: The minimum number of products in the association.
o max_length: The maximum number of products in the association.
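
As a quick check of the min_support value, here is the arithmetic behind 0.003. The exact reasoning (an item
bought about three times a day over the seven days covered by the 7501 transactions) is an assumption of this
sketch rather than something stated precisely above:

# Assumed reasoning behind min_support = 0.003:
# ~3 purchases of an item per day * 7 days, relative to 7501 recorded transactions
min_support = 3 * 7 / 7501
print(round(min_support, 4))   # 0.0028, rounded up to 0.003 in the apriori() call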

3. Visualizing the result

Now we will visualize the output for our apriori model.

Here we will follow some more steps, which are given below:

o Displaying the rules obtained from the apriori function

results= list(rules)
results

By executing the above lines of code, we will get 9 rules. Consider the below output:

Output:

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004533333333333334,


ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}),
confidence=0.2905982905982906, lift=4.843304843304844)]),
RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005733333333333333,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}),
confidence=0.30069930069930073, lift=3.7903273197390845)]),
RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005866666666666667,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}),
confidence=0.37288135593220345, lift=4.700185158809287)]),
RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.0033333333333333335,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}),
confidence=0.2450980392156863, lift=5.178127589063795)]),
RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.016,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}),
confidence=0.3234501347708895, lift=3.2915549671393096)]),
RelationRecord(items=frozenset({'tomato sauce', 'ground beef'}), support=0.005333333333333333,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sauce'}), items_add=frozenset({'ground beef'}),
confidence=0.37735849056603776, lift=3.840147461662528)]),
RelationRecord(items=frozenset({'olive oil', 'light cream'}), support=0.0032,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'olive oil'}),
confidence=0.20512820512820515, lift=3.120611639881417)]),
RelationRecord(items=frozenset({'olive oil', 'whole wheat pasta'}), support=0.008,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'whole wheat pasta'}), items_add=frozenset({'olive oil'}),
confidence=0.2714932126696833, lift=4.130221288078346)]),
RelationRecord(items=frozenset({'pasta', 'shrimp'}), support=0.005066666666666666,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'shrimp'}),
confidence=0.3220338983050848, lift=4.514493901473151)])]

As we can see, the above output is in a form that is not easily understandable. So, we will print all the

rules in a more readable format.

o Visualizing the rule, support, confidence, and lift in a clearer way:

for item in results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Output:

By executing the above lines of code, we will get the below output:

Rule: chicken -> light cream


Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
=====================================
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
=====================================
Rule: escalope -> pasta
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
=====================================
Rule: fromage blanc -> honey
Support: 0.0033333333333333335
Confidence: 0.2450980392156863
Lift: 5.178127589063795
=====================================
Rule: ground beef -> herb & pepper
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
=====================================
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
=====================================
Rule: olive oil -> light cream
Support: 0.0032
Confidence: 0.20512820512820515
Lift: 3.120611639881417
=====================================
Rule: olive oil -> whole wheat pasta
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
=====================================
Rule: pasta -> shrimp
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
=====================================
From the above output, we can analyze each rule. The first rule, light cream → chicken,

states that light cream and chicken are frequently bought together by customers.

The support for this rule is 0.0045, and the confidence is 29%. Hence, if a customer buys light cream,

there is a 29% chance that they also buy chicken, and this pair appears in about 0.45% of the transactions.

We can check all these things for the other rules as well.


Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled
datasets into clusters; it is also known as hierarchical cluster analysis or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is
known as the dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they differ in how
they work: in hierarchical clustering, there is no requirement to predetermine the number of clusters as we did
in the K-Means algorithm.

The hierarchical clustering technique has two approaches:

1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with taking all data
points as single clusters and merging them until one cluster is left.
2. Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it is a top-down approach.

Why hierarchical clustering?


As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical
clustering? As we have seen, K-means clustering has some challenges: it needs a predetermined number of clusters,
and it always tries to create clusters of the same size. To solve these two challenges, we can opt for the
hierarchical clustering algorithm because, in this algorithm, we don't need to know the number of clusters
in advance.

In this topic, we will discuss the Agglomerative Hierarchical clustering algorithm.

Agglomerative Hierarchical clustering


The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the data points into
clusters, it follows the bottom-up approach. It means this algorithm considers each data point as a single cluster
at the beginning and then starts combining the closest pairs of clusters. It does this until all the clusters are
merged into a single cluster that contains all the data points.

This hierarchy of clusters is represented in the form of the dendrogram.

How does Agglomerative Hierarchical Clustering work?


The working of the AHC algorithm can be explained using the below steps (a small SciPy sketch of these merges
follows the steps):
o Step-1: Create each data point as a single cluster. Let's say there are N data points, so the number of
clusters will also be N.

o Step-2: Take the two closest data points or clusters and merge them to form one cluster. So, there will now be
N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one cluster. There will be N-2
clusters.

o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below
images:
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the
clusters as per the problem.
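
To see these merge steps in action, the short sketch below runs SciPy's agglomerative linkage on a handful of
made-up 2D points and prints the merge order; each row of the linkage matrix corresponds to one execution of
Steps 2-3 (the points and the "ward" method here are only illustrative):

# Illustrative sketch: watching the agglomerative merges on a few made-up points
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 1]])
Z = linkage(points, method='ward')   # each row: [cluster_i, cluster_j, distance, new_cluster_size]
for i, (a, b, dist, size) in enumerate(Z):
    print(f"merge {i + 1}: clusters {int(a)} and {int(b)} at distance {dist:.2f} (new size {int(size)})")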

Note: To better understand hierarchical clustering, it is advised to have a look at k-means clustering.

Measure for the distance between two clusters


As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are
various ways to calculate the distance between two clusters, and these ways decide the rule for clustering. These
measures are called linkage methods. Some of the popular linkage methods are given below:
1. Single Linkage: It is the shortest distance between the closest points of the two clusters. Consider the below
image:

2. Complete Linkage: It is the farthest distance between two points of two different clusters. It is one of
the popular linkage methods as it forms tighter clusters than single linkage.

3. Average Linkage: It is the linkage method in which the distances between all pairs of points from the two
clusters are added up and then divided by the total number of pairs to calculate the average distance between
the two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of the clusters is
calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type of problem or business
requirement.
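
The difference between these linkage methods is easy to see numerically. The sketch below compares single,
complete, and average linkage on two tiny made-up clusters (centroid linkage would instead compare the means of
the two point sets):

# Comparing single, complete, and average linkage on two toy clusters
import numpy as np

cluster_a = np.array([[1.0, 1.0], [2.0, 1.0]])
cluster_b = np.array([[5.0, 1.0], [7.0, 1.0]])

# all pairwise distances between points of the two clusters
d = np.linalg.norm(cluster_a[:, None, :] - cluster_b[None, :, :], axis=2)

print("single linkage  :", d.min())    # shortest pairwise distance   -> 3.0
print("complete linkage:", d.max())    # farthest pairwise distance   -> 6.0
print("average linkage :", d.mean())   # mean of all pairwise distances -> 4.5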

Working of the Dendrogram in Hierarchical Clustering


The dendrogram is a tree-like structure that is mainly used to record each step that the HC algorithm
performs. In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points, and the
X-axis shows all the data points of the given dataset.

The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part shows how clusters are created in agglomerative clustering, and the right
part shows the corresponding dendrogram.

o As we have discussed above, firstly, the data points P2 and P3 combine together and form a cluster, and
correspondingly a dendrogram link is created, which connects P2 and P3 with a rectangular shape. Its height is
decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram link is created. It is higher than
the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrogram links are created that combine P1, P2, and P3 in one branch, and P4, P5, and
P6 in another branch.
o At last, the final dendrogram is created that combines all the data points together.

We can cut the dendrogram tree structure at any level as per our requirement.
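
This "cut" can also be made programmatically. A minimal sketch using SciPy is given below; it assumes x is the
matrix of features built later in this topic, and the cut height of 200 is an arbitrary value chosen by eye from
a dendrogram, not a prescribed setting:

# Sketch: cutting the dendrogram at a chosen height to obtain flat cluster labels
from scipy.cluster.hierarchy import linkage, fcluster

Z = linkage(x, method='ward')                        # same linkage used to draw the dendrogram
labels = fcluster(Z, t=200, criterion='distance')    # t is the assumed cut height
print(labels[:10])                                   # cluster labels of the first 10 customers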

Python Implementation of Agglomerative Hierarchical Clustering


Now we will see the practical implementation of the agglomerative hierarchical clustering algorithm using Python.
To implement this, we will use the same dataset problem that we used in the previous topic of K-means
clustering so that we can compare the two concepts easily.

The dataset contains information about customers who have visited a mall for shopping. So, the mall owner
wants to find some patterns or some particular behavior of his customers using this dataset.

Steps for implementation of AHC using Python:


The steps for implementation will be the same as for k-means clustering, except for some changes, such as the
method used to find the number of clusters. Below are the steps:

1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters

Data Pre-processing Steps:


In this step, we will import the libraries and datasets for our model.

o Importing the libraries


# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

The above lines of code are used to import the libraries that perform specific tasks: numpy for mathematical
operations, matplotlib for drawing graphs or scatter plots, and pandas for importing the dataset.

o Importing the dataset

# Importing the dataset


dataset = pd.read_csv('Mall_Customers_data.csv')

As discussed above, we have imported the same Mall_Customers_data.csv dataset that we used in k-means
clustering. Consider the below output:

o Extracting the matrix of features


Here we will extract only the matrix of features, as we don't have any further information about the dependent
variable. The code is given below:

x = dataset.iloc[:, [3, 4]].values

Here we have extracted only columns 3 and 4, as we will use a 2D plot to see the clusters. So, we are considering
the Annual Income and Spending Score as the matrix of features.

Step-2: Finding the optimal number of clusters using the Dendrogram


Now we will find the optimal number of clusters using the Dendrogram for our model. For this, we are going to use
scipy library as it provides a function that will directly return the dendrogram for our code. Consider the below
lines of code:

#Finding the optimal number of clusters using the dendrogram


import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method = "ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()

In the above lines of code, we have imported the hierarchy module of the scipy library. This module provides us a
method, shc.dendrogram(), which takes linkage() as a parameter. The linkage function is used to define the
distance between two clusters, so here we have passed x (the matrix of features) and the method "ward," a popular
linkage method in hierarchical clustering.

The remaining lines of code are to describe the labels for the dendrogram plot.

Output:

By executing the above lines of code, we will get the below output:
Using this dendrogram, we will now determine the optimal number of clusters for our model. For this, we will find
the largest vertical distance that is not cut by any horizontal bar. Consider the below diagram:

In the above diagram, we have shown the vertical distances that are not cut by horizontal bars. As we can
see, the 4th distance looks the largest, so according to this, the number of clusters will be 5 (the vertical
lines in this range). We could also take the 2nd largest distance, as it is approximately equal to the 4th, but we
will consider 5 clusters because that is the same number we calculated in the K-means algorithm.
So, the optimal number of clusters will be 5, and we will train the model in the next step using this number.

Step-3: Training the hierarchical clustering model


As we know the required optimal number of clusters, we can now train our model. The code is given below:

#training the hierarchical clustering model on the dataset


from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')   # note: newer scikit-learn versions name the 'affinity' parameter 'metric'
y_pred = hc.fit_predict(x)

In the above code, we have imported the AgglomerativeClustering class from the cluster module of the scikit-learn
library. Then we have created an object of this class named hc. The AgglomerativeClustering class takes the
following parameters:

o n_clusters=5: It defines the number of clusters, and we have taken here 5 because it is the optimal number
of clusters.
o affinity='euclidean': It is a metric used to compute the linkage.
o linkage='ward': It defines the linkage criteria, here we have used the "ward" linkage. This method is the
popular linkage method that we have already used for creating the Dendrogram. It reduces the variance in
each cluster.

In the last line, we have created the variable y_pred while fitting the model. The call not only trains the
model but also returns the cluster to which each data point belongs.

After executing the above lines of code, if we go through the variable explorer option in our Spyder IDE, we can
check the y_pred variable. We can compare the original dataset with the y_pred variable. Consider the below
image:
As we can see in the above image, y_pred shows the cluster values, which means that customer id 1 belongs to
the 5th cluster (as indexing starts from 0, 4 means the 5th cluster), customer id 2 belongs to the 4th cluster, and so on.
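
Besides inspecting y_pred in the variable explorer, a quick NumPy check (a small optional addition, not part of
the original code; nm is the numpy alias imported earlier) shows how many customers ended up in each of the 5
clusters:

# Optional check: number of customers assigned to each cluster
clusters, counts = nm.unique(y_pred, return_counts=True)
print(dict(zip(clusters, counts)))   # e.g. {0: ..., 1: ..., 2: ..., 3: ..., 4: ...}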

Step-4: Visualizing the clusters


As we have trained our model successfully, we can now visualize the clusters corresponding to the dataset.

Here we will use the same lines of code as we did in k-means clustering, except for one change: we will not plot
the centroids as we did in k-means, because here we have used the dendrogram to determine the optimal number of
clusters. The code is given below:

#visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of Customers')
mtp.xlabel('Annual Income')
mtp.ylabel('Spending Score')
mtp.legend()
mtp.show()

Output: By executing the above lines of code, we will get the below output:
