Association Rule Mining
2) Medical Diagnosis
Association rules in medical diagnosis can help physicians diagnose and treat patients.
Diagnosis is a difficult process with many potential errors that can lead to unreliable results.
Relational association rule mining can be used to estimate the likelihood of illness based on
various factors and symptoms. This application can be further extended with learning
techniques that model the relationships between symptoms and diseases.
3) Census Data
The concept of Association Rule Mining is also used in dealing with the massive amount of
census data. If properly analyzed, this information can be used in planning efficient public
services and businesses.
Advantages and Disadvantages of Association Rules

Association Rule Mining Algorithms
Step 1: Find all frequent itemsets
An itemset is frequent if its support is at least a user-defined minimum support threshold.
Step 2: Create strong association rules from the frequent itemsets
Association rules are built from the frequent itemsets found in step 1. To identify strong
rules, a metric known as confidence is used.
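The rule-generation step can be sketched in Python. The support values below are hypothetical and serve only to illustrate the confidence check; the helper name `generate_rules` is an assumption, not a standard API:

```python
from itertools import combinations

def generate_rules(frequent_itemsets, support, min_confidence):
    """Generate strong rules A -> B from frequent itemsets.

    frequent_itemsets: iterable of frozensets
    support: dict mapping frozenset -> support value
    """
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue  # rules need a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                confidence = support[itemset] / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules

# Hypothetical support values, for illustration only
support = {
    frozenset({"Bread"}): 0.6,
    frozenset({"Butter"}): 0.5,
    frozenset({"Bread", "Butter"}): 0.4,
}
rules = generate_rules(support.keys(), support, min_confidence=0.6)
```

Here both Bread -> Butter (confidence 0.4/0.6) and Butter -> Bread (0.4/0.5) pass the 0.6 threshold, so two rules are produced.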
The Apriori algorithm is one of the most fundamental Association Rule Mining
algorithms. It is based on the idea that "having prior knowledge of frequent itemsets
can generate strong association rules." The term Apriori refers to prior knowledge.
Apriori discovers frequent itemsets through a process known as candidate itemset
generation. This is an iterative approach that uses k-itemsets to explore (k+1)-
itemsets. The set of frequent 1-itemsets is found first, followed by the set of frequent
2-itemsets, and so on until no more frequent k-itemsets can be found.
An important property known as the Apriori property is used to reduce the search
space to improve the efficiency of the level-wise generation of frequent itemsets.
According to the Apriori Property, "all non-empty subsets of a frequent itemset must
also be frequent."
This means that if an itemset is frequent, all of its subsets are also frequent. For example, if
[Bread, Butter] is a frequent itemset, [Bread] and [Butter] must each be frequent
individually as well.
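The level-wise generation with the Apriori-property prune can be sketched as follows; this is a minimal illustration, not an optimized implementation, and the transactions used with it are assumed toy data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining (a minimal Apriori sketch)."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    level = {frozenset({i}) for i in items
             if support_count(frozenset({i})) / n >= min_support}
    frequent = set(level)

    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support_count(c) / n >= min_support}
        frequent |= level
        k += 1
    return frequent
```

Each pass uses only the previous level's survivors, so any candidate with an infrequent subset is discarded before its support is ever counted.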
Define: Support, Confidence, Lift
Support
Support refers to an item's default popularity and is calculated by dividing the number of
transactions containing the item by the total number of transactions. Suppose we want to find
the support for item B. It is calculated as:

Support(B) = (Transactions containing B) / (Total transactions)
Confidence
If item A is purchased, confidence refers to the likelihood that item B is also purchased. It is
calculated as:

Confidence(A -> B) = (Transactions containing both A and B) / (Transactions containing A)
Lift
Lift(A -> B) denotes the increase in the sale ratio of B when A is sold. Lift(A -> B) is
expressed mathematically as:

Lift(A -> B) = Confidence(A -> B) / Support(B)
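The three metrics can be computed directly on a small transaction list; the items and transactions below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
# Toy transaction list (hypothetical) to illustrate the three metrics
transactions = [
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Milk"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item of `itemset`
    return sum(1 for t in transactions if itemset <= t) / n

A, B = {"Bread"}, {"Butter"}
supp_AB = support(A | B)     # transactions with both A and B: 2/4 = 0.5
conf = supp_AB / support(A)  # 0.5 / 0.75, about 0.667
lift = conf / support(B)     # 0.667 / 0.5, about 1.333
```

A lift above 1 indicates that buying A raises the chance of buying B relative to B's baseline popularity.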
Consider a Big Bazar scenario where the product set is P = {Rice, Pulse, Oil, Milk, Apple}.
The database comprises six transactions where 1 represents the presence of the product and 0
represents the absence of the product.
Step 1
Make a frequency table of all the products that appear in the transactions. Then shortlist
only those products whose support is over the 50 percent threshold. This gives a frequency
table of the products frequently bought by the customers.
Step 2
Create pairs of products such as RP, RO, RM, PO, PM, OM. You will get the given
frequency table.
Step 3
Apply the same 50 percent support threshold and keep only the pairs bought in more than 50
percent of the transactions; in our case, that means appearing in more than 3 of the 6
transactions.
Step 4
Now, look for a set of three products that the customers buy together. We get the given
combination.
Step 5
Calculate the frequency of these three-product itemsets, and you will get the given frequency table.
If you implement the threshold assumption, you can figure out that the customers' set of three
products is RPO.
We have considered a simple example to illustrate the Apriori algorithm in data mining. In
reality, there are thousands of such combinations.
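Since the original transaction table is not reproduced above, the 0/1 matrix below is a hypothetical six-transaction example constructed so that Rice, Pulse, and Oil (RPO) come out as the frequent triple; it sketches steps 1 through 5 in Python:

```python
from itertools import combinations

# Hypothetical 0/1 transaction matrix (the original table is not shown);
# columns: Rice, Pulse, Oil, Milk, Apple
products = ["Rice", "Pulse", "Oil", "Milk", "Apple"]
matrix = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
]
transactions = [{p for p, flag in zip(products, row) if flag} for row in matrix]
threshold = len(transactions) // 2  # keep itemsets in more than 3 of 6 baskets

def count(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

# Step 1: products over the 50 percent threshold
frequent_items = [p for p in products if count({p}) > threshold]
# Steps 2-5: pairs and triples over the same threshold
pairs = [c for c in combinations(frequent_items, 2) if count(c) > threshold]
triples = [c for c in combinations(frequent_items, 3) if count(c) > threshold]
```

With this data, `frequent_items` is Rice, Pulse, Oil and the only surviving triple is (Rice, Pulse, Oil), matching the RPO conclusion in the text.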
2) Eclat Algorithm
Eclat denotes equivalence class transformation. The set intersection was supported by its
depth-first search formula. It’s applicable for each successive and parallel execution with
spot-magnifying properties. This can be the associate formula for frequent pattern mining
supported by the item set lattice’s depth-first search cross.
It is a DFS cross of the prefix tree rather than a lattice.
For stopping, the branch and a specific technique are used.
Let us now understand the above working with an example. Consider the following transaction
record:
The above-given data is a boolean matrix where for each cell (i, j), the value denotes whether
the j’th item is included in the i’th transaction or not. 1 means true while 0 means false.
We now call the function for the first time and arrange each item with its tidset in a tabular
fashion:
k = 1, minimum support = 2
We now recursively call the function until no more item-tidset pairs can be combined:
k=2
k=3
k=4
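The tidset-intersection idea behind these recursive calls can be sketched as follows; the boolean matrix and items A, B, C are hypothetical stand-ins, not the table referenced above:

```python
# Minimal Eclat sketch: items are stored with their tidsets (sets of
# transaction ids); a combined itemset's tidset is the intersection
# of its parts, so support is just the tidset's size.
def eclat(prefix, items, min_support, out):
    """items: list of (itemset, tidset) pairs sharing `prefix`."""
    while items:
        itemset, tidset = items.pop(0)
        if len(tidset) >= min_support:
            out[frozenset(prefix | itemset)] = len(tidset)
            # Extend the current itemset with every remaining item
            suffix = [(other | itemset, tidset & other_tids)
                      for other, other_tids in items]
            eclat(prefix | itemset, suffix, min_support, out)

# Hypothetical boolean matrix: rows = transactions, cols = items A, B, C
matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
]
names = ["A", "B", "C"]
tidsets = [(frozenset({n}), {i for i, row in enumerate(matrix) if row[j]})
           for j, n in enumerate(names)]
result = {}
eclat(frozenset(), tidsets, min_support=2, out=result)
```

Each recursive call intersects tidsets instead of rescanning the database, which is the key difference from Apriori's level-wise counting.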
The diagram given below depicts the conditional FP tree associated with the conditional node
I3.