Recommendations Using Association Rules


Web Analytics

Websites & Apps Clicks

• A click can mean a tap, swipe, expand, key press, blink, or any other gesture supported by modern devices and performed on apps, websites, or mobile sites.
• Devices – computing devices, laptops, tablets, smartphones, wearables, gaming devices, AR/VR devices.
• The Web comprises Web Properties.
• Web Properties are managed with web management software and tools, which can be open-source or proprietary. One popular platform is Google Analytics.
• Types of web properties: apps, websites, mobile sites, wearable apps.
Web log data

• "Web log data" is commonly known as "web server log data" or "server logs".
• It is a record of the activities and events generated by web servers when users interact with a website or web application.
• These logs are automatically created and stored by web servers and contain valuable information about the interactions between users and the website.
Clickstream Data

• Clickstream data – the sequence of clicks or interactions made by a user while navigating a website or an application (app).
• It provides a record of the pages visited, the order in which they were accessed, and the actions taken on each page (e.g., clicks, taps, zoom, form submissions, downloads).
• Clickstream data is typically collected on the user side.
• This data is crucial for understanding user behavior, identifying patterns, and optimizing user
experiences on the website. Clickstream data is commonly used in web analytics, conversion rate
optimization, and user journey analysis.
• Clickstream data can be obtained from Google Analytics platform or similar proprietary tools.
Transactional Data in Social Media & Web
• Transactional data refers to information about specific user actions or interactions that result in a measurable outcome or conversion on a website.
• These transactions can include a wide range of activities, such as purchases, sign-ups, downloads, form submissions, or any other action that is considered valuable to the website owner.
• Some example transactions for a business:
  • E-commerce transactions – purchases/views made by users
  • Lead generation – form submissions, sign-ups for newsletters, requests for quotes, or contact form inquiries
  • Clicks on specific elements – clicks for movies, houses, food, grocery, hotels, books, questions, etc.
  • Event registrations, subscription sign-ups, download events
• Google Analytics and similar platforms offer features to track and analyze transactional data.
Web Analytics
• Web Content Analytics – the text analytics process can be applied to topics and sentiments, and to building models for classification.
• Web Usage Analytics – applications in recommender systems and page/account analytics.
• Web Structure Analytics – one of the methods that can be applied is network analytics.

Types of web analytics and their data sources:
Web Content Analytics – data source: text, reviews, images, multimedia files
Web Usage Analytics – data source: weblogs, transactions, clicks, metrics, ratings
Web Structure Analytics – data source: hyperlink structure, relations between webpages
Recommendations using Association Rules
Weblog Data / Transaction Data / Clickstream Data

• Web log data, clickstream data, or transaction data of clicks recorded over a period can provide insights in the form of discovered patterns.

Transaction dataset:

Session ID    Item
1             Item 1
1             Item 3
1             Item 4
1             Item 5
2             Item 9
2             Item 4
2             Item 5

Association rule discovered from the dataset:

Whenever Item 4 is clicked, Item 5 is also clicked
OR
Whenever Item 4 is bought, Item 5 is also bought
OR
Whenever Item 4 is viewed, Item 5 is also viewed


Discovering Patterns using Association Rule Mining

[Figure: knowledge discovery stages. Stage labels: Database of web logs and clickstream; Transactional Data; Pre-processed Data; Transformed Data; Patterns; Knowledge. Knowledge labels: frequent itemsets; useful, rare, important association rules.]
What Is ASSOCIATION RULE MINING?
• Association rule mining is well known for Market Basket Analysis in retail. It can also be applied to clickstream analysis and transactional data analysis.
• Association rule mining means finding frequent patterns, associations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Put simply: given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.

Transaction Dataset → (Discovery) → Association Rule
Discovering Patterns
Pattern
• 12121?
• The pattern '12' is found often enough, so with some support we can say '?' is 2.
• "If '1', then '2' follows"
• Pattern ➔ Model

For a pattern in historical data, we may need more support. More support can lead to confidence.
Confidence
• 121212?
• 12121231212123121212?
• 121212 ➔ 3

• Models are created from historical data by detecting patterns. A model is a calculated guess about the likelihood that a pattern repeats.
Assumption – past behaviour of users is the best predictor of future behaviour
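
The pattern idea above can be sketched in a few lines: count which symbol follows a given context in the history, and treat the counts as the support behind the guess. The function name `next_symbol_counts` is an illustrative choice, not something from the slides.

```python
from collections import Counter

def next_symbol_counts(history, context):
    """Count the symbols that immediately follow each occurrence of `context`."""
    counts = Counter()
    n = len(context)
    # Stop early enough that a following symbol always exists.
    for i in range(len(history) - n):
        if history[i:i + n] == context:
            counts[history[i + n]] += 1
    return counts

short_history = "12121"
long_history = "12121231212123121212"

# After '1' in the short history, only '2' has been seen.
print(next_symbol_counts(short_history, "1"))
# After '121212' in the long history, only '3' has been seen.
print(next_symbol_counts(long_history, "121212"))
```

A longer history gives more occurrences of the context, which is the sense in which more support leads to more confidence in the guess.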
Transactional Data
• Web log data can be used as transactional data.
• Similarly, clickstream data and order/view/purchase/rating data.

Session ID    Item
1             Item 1
1             Item 3
1             Item 4
1             Item 5
2             Item 9
2             Item 4
2             Item 5

• Let us use representative transactional data for items sold by platforms like Big Basket or Grofers.

Transactional Database (one row per item):

Txn_ID    Item
5         Diaper
3         Beer
3         Coke
1         Milk
4         Bread
2         Beer
3         Diaper
4         Diaper
1         Coke
5         Milk
1         Bread
4         Milk
2         Bread
2         Milk
5         Coke
4         Beer
3         Milk

Data selection groups the rows by Txn_ID to give the Transaction ID Dataset:

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk
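
The data selection step can be sketched as a simple group-by on Txn_ID. The rows below use the five-transaction grocery dataset that the later slides analyse (transaction 2 = Beer, Bread, Milk); the helper name `to_itemsets` is an assumption made for illustration.

```python
from collections import defaultdict

# Long-format rows (Txn_ID, Item), in arbitrary logged order.
rows = [
    (5, "Diaper"), (3, "Beer"), (3, "Coke"), (1, "Milk"), (4, "Bread"),
    (2, "Beer"), (3, "Diaper"), (4, "Diaper"), (1, "Coke"), (5, "Milk"),
    (1, "Bread"), (4, "Milk"), (2, "Bread"), (2, "Milk"), (5, "Coke"),
    (4, "Beer"), (3, "Milk"),
]

def to_itemsets(rows):
    """Group (txn_id, item) rows into one sorted item list per transaction."""
    grouped = defaultdict(set)
    for txn_id, item in rows:
        grouped[txn_id].add(item)
    return {txn_id: sorted(items) for txn_id, items in sorted(grouped.items())}

itemsets = to_itemsets(rows)
for txn_id, items in itemsets.items():
    print(txn_id, ", ".join(items))
```

Using a set per transaction drops duplicate clicks on the same item, which matches the itemset view used by association rule mining.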
What Is ASSOCIATION RULE MINING?
Given a set of records, say for a fresh-farm e-commerce app, each of which contains some number of items from a given collection:
• produce dependency rules that identify the occurrence of an item based on the occurrences of other items

For example, the set of records is a transaction data table from which rules are discovered.

Transaction ID Dataset:

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk

Rules found (association rules):
{Milk} => {Coke}
{Milk, Beer} => {Diaper}


Measures & Concepts of Association Rule Mining
• Itemset
  • A collection of one or more items
  • Example: {Milk, Bread, Diaper}
• k-itemset
  • An itemset that contains k items
  • Examples:
    1-itemset: {Bread}, {Milk}, … (length of itemset is 1)
    2-itemset: {Milk, Diaper}, {Bread, Milk}, … (length of itemset is 2)
    3-itemset: {Milk, Bread, Diaper}, … (length of itemset is 3)

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk

• Rule
  • E.g. X => Y, or LHS itemset => RHS itemset, or antecedent itemset => consequent itemset
  • where
    • X is an itemset on the Left Hand Side (LHS) of the rule, also called the antecedent of the rule. Let X = {Milk, Beer}
    • Y is an itemset on the Right Hand Side (RHS) of the rule, also called the consequent of the rule. Let Y = {Diaper}
  • The rule sign => denotes implication in the sense of co-occurrence: it expresses association, NOT causality.
Support
• Support count
  • Frequency of occurrence of an itemset in the Transaction ID dataset
  • Example: Support Count({Milk, Beer, Diaper}) = 2
  • Example: Support Count({Milk, Bread}) = 3
• Support
  • Fraction of transactions that contain an itemset
  • For a rule X => Y: the probability that a transaction contains X ∪ Y, i.e. both X and Y
  • Example: Support({Milk, Beer, Diaper}) = 2/5 = 0.4
  • Example: Support({Milk, Bread}) = 3/5 = 0.6

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk

• Alternatively, in probability notation,
  • Support = P(X ∪ Y) = Support Count(X ∪ Y) / Total no. of transactions
• Support is an indication of how frequently the itemset appears in the dataset.
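
As a minimal sketch, support count and support can be computed directly from the five-transaction table with a subset test. The function names are illustrative assumptions.

```python
# Transaction ID dataset from the slide, one set of items per transaction.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

for itemset in ({"Milk", "Beer", "Diaper"}, {"Milk", "Bread"}):
    print(sorted(itemset), support_count(itemset, transactions),
          support(itemset, transactions))
```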
Confidence
• Confidence
  • For a rule X => Y: the conditional probability that a transaction containing X also contains Y
  • Measures how often itemset Y appears in transactions that contain itemset X
  • E.g. for the rule {Milk, Beer} => {Diaper}:
    Confidence({Milk, Beer} => {Diaper}) = 2/3 = 0.67

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk

• Alternatively,
  • Confidence(X => Y) = Support(X ∪ Y) / Support(X)
  • If Ex and Ey are the events that a transaction contains itemset X and itemset Y respectively, then
    • Support(X ∪ Y) = P(Ex ∩ Ey)
    • Confidence(X => Y) = P(Ey | Ex) = P(Ex ∩ Ey) / P(Ex) = Support(X ∪ Y) / Support(X)
• Confidence is an indication of how often the rule has been found to be true.
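
The confidence formula can be checked on the same table with a short sketch. The function names are illustrative assumptions; the formula is the one on the slide, Support(X ∪ Y) / Support(X).

```python
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence(X => Y) = Support(X u Y) / Support(X)."""
    return support(antecedent | consequent) / support(antecedent)

# Rule {Milk, Beer} => {Diaper}: 2 of the 3 transactions with Milk and Beer
# also contain Diaper.
print(round(confidence({"Milk", "Beer"}, {"Diaper"}), 2))
```

Note that confidence is not symmetric: swapping antecedent and consequent changes the denominator and therefore usually the value.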
Association Rule Mining Task
• The association rule mining task can now be broken down as follows.
• Given a set of transactions T, the goal of association rule mining is to find all rules having
  • support ≥ minsup threshold (user-provided parameter) – generate frequent itemsets
  • confidence ≥ minconf threshold (user-provided parameter) – generate association rules
• Additionally, we can consider the lift measure and evaluate rules with lift ≥ minlift threshold.

Stages: Transactional Database (one Txn_ID, Item row per item) → Transaction ID Dataset → Processed Data → Frequent Itemsets → Association Rules

Transaction ID Dataset:

Txn_ID    Itemset
1         Bread, Coke, Milk
2         Beer, Bread, Milk
3         Beer, Coke, Diaper, Milk
4         Beer, Bread, Diaper, Milk
5         Coke, Diaper, Milk

Processed Data:

Txn_ID    Beer    Bread    Coke    Diaper    Milk
1         0       1        1       0         1
2         1       1        0       0         1
3         1       0        1       1         1
4         1       1        0       1         1
5         0       0        1       1         1

Frequent Itemsets (partial listing):

Itemset                  Support
(Beer)                   0.6
(Bread)                  0.6
(Coke, Milk)             0.6
(Milk, Diaper)           0.6
(Diaper, Bread, Beer)    0.2
(Milk, Bread, Beer)      0.4
(Diaper, Coke, Beer)     0.2
(Coke, Milk, Beer)       0.2
(Diaper, Milk, Beer)     0.4
(Milk, Coke, Bread)      0.2

Association Rules:

antecedents     consequents    support    confidence    lift
(Beer)          (Diaper)       0.4        0.666667      1.111111
(Coke)          (Diaper)       0.4        0.666667      1.111111
(Milk)          (Diaper)       0.6        0.6           1
(Milk, Beer)    (Diaper)       0.4        0.666667      1.111111
(Coke, Milk)    (Diaper)       0.4        0.666667      1.111111
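
The stages above can be sketched end to end in plain Python. This is a brute-force illustration on five items, not how a library such as mlxtend does it internally, and the thresholds `MIN_SUPPORT = 0.2` and `MIN_CONFIDENCE = 0.6` are assumptions chosen so that the itemsets and rules shown on the slide appear in the output.

```python
from itertools import combinations

# Transaction ID dataset from the slides.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread", "Milk"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}
MIN_SUPPORT = 0.2     # assumed threshold, not given in the slides
MIN_CONFIDENCE = 0.6  # assumed threshold, not given in the slides

items = sorted(set().union(*transactions.values()))

# Processed data: one 0/1 column per item, in the order of `items`.
one_hot = {tid: [int(item in basket) for item in items]
           for tid, basket in transactions.items()}

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for basket in transactions.values() if itemset <= basket)
    return hits / len(transactions)

# Frequent itemsets: enumerate every combination and keep those meeting minsup.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Rules: split each frequent itemset into antecedent => consequent.
rules = {}
for itemset, s in frequent.items():
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            antecedent = frozenset(lhs)
            consequent = itemset - antecedent
            confidence = s / frequent[antecedent]
            if confidence >= MIN_CONFIDENCE:
                lift = confidence / frequent[consequent]
                rules[(antecedent, consequent)] = (s, confidence, lift)

# Print only the rules whose consequent is {Diaper}, as on the slide.
for (antecedent, consequent), (s, confidence, lift) in rules.items():
    if consequent == frozenset({"Diaper"}):
        print(sorted(antecedent), "=>", sorted(consequent),
              f"{s:.1f} {confidence:.6f} {lift:.6f}")
```

Looking up `frequent[antecedent]` is safe because every subset of a frequent itemset is itself frequent, so it was stored in the first loop.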


Rule Evaluation using Lift
• Lift Measure
  • For a rule X => Y:
    Lift(X => Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
  • Lift and other measures can be used to prune or rank the derived patterns.

• Consider the transactions of an online grocery retailer with the items Tea and Coffee. Consider 100 transactions in total, summarized in the table:

             Coffee    Not Coffee    Total
Tea          15        5             20
Not Tea      75        5             80
Total        90        10            100

• Let us test the association rule {Tea} => {Coffee}.

• Confidence = P(Coffee | Tea) = 15/20 = 0.75, so it seems a good rule.

• But note that Support(Coffee) = 90/100 = 0.90.

• Lift = P(Coffee | Tea) / Support(Coffee) = 0.75 / 0.9 = 0.8333

• Lift < 1 denotes a negative association (the items may be substitutes). Lift > 1 denotes a positive association. Lift = 1 means X and Y occur independently of each other.

• In the example lift < 1, so Tea and Coffee are negatively associated (i.e. substitute items). Lift is therefore useful for judging positive as well as negative association.

• Other rule interest measures include leverage, conviction, rule power factor, chi-square, cosine, and coverage; see
http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
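
The Tea and Coffee numbers can be reproduced with a few lines of arithmetic. The variable names are illustrative; the counts come from the contingency table on the slide.

```python
# Counts from the Tea/Coffee contingency table (100 transactions in total).
n_total = 100
n_tea = 20
n_coffee = 90
n_tea_and_coffee = 15

support_tea = n_tea / n_total
support_coffee = n_coffee / n_total
support_both = n_tea_and_coffee / n_total

# Confidence({Tea} => {Coffee}) = Support(Tea u Coffee) / Support(Tea)
confidence = support_both / support_tea

# Lift({Tea} => {Coffee}) = Support(Tea u Coffee) / (Support(Tea) * Support(Coffee))
lift = support_both / (support_tea * support_coffee)

print(f"confidence = {confidence:.2f}, lift = {lift:.4f}")
```

The confidence looks high only because Coffee is in 90% of all transactions anyway; dividing by Support(Coffee) removes that baseline.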
Applications of Association Rule Mining
• Market Basket Analysis or association rule mining is applied in areas such as:

• Clickstream analytics – consider a rule for Goodreads: if you viewed {Biography, … } --> {Memoir}
• Marketing and sales promotion – consider the discovered rule {Laptop, … } --> {Mousepad}
• Sequential pattern discovery – given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events, of the form (A B) (C) (D E) --> (F).
For example, (Shoes) (Racket, Racketball) --> (Sports Jacket)
• Catalogue design for business, product clustering, credit/debit card analysis, web usage mining, banking and insurance product profiles
• Bundling of frequent items together for cross-sell and up-sell
• Supermarket shelf management – consider, for simplicity, the discovered rule {bread} => {butter}
• E-commerce site and app management – arrange the placement of items on apps according to a strategy
Translate Rules to Strategy

Strategy 1: Placing milk and bread together as "frequently bought together" on a fresh-food website may further encourage the sale of these items.

Strategy 2: Place milk and bread at opposite ends. When people have added milk to the cart, remind them at the end of the checkout process to add bread, along with other associated items.

Strategy 3: Put these two items into a package at a reduced price (e.g. 10% off).
Apriori Algorithm in Action – Numerical Example

• Let the items be coded as Bread – 1, Butter – 2, Milk – 3, Sugar – 4.
• Then the Transaction ID dataset would be:

Txn_ID    Itemset                         Coded itemset
A         {Bread, Butter, Milk, Sugar}    {1,2,3,4}
B         {Bread, Butter, Sugar}          {1,2,4}
C         {Bread, Butter}                 {1,2}
D         {Butter, Milk, Sugar}           {2,3,4}
E         {Butter, Milk}                  {2,3}
F         {Milk, Sugar}                   {3,4}
G         {Butter, Sugar}                 {2,4}
Apriori Algorithm in Action
Determine Frequent 1-Itemsets
Let the minimum support count = 3, so any itemset that appears in at least 3 transactions is a frequent itemset.

Itemset    Support count
{1}        3
{2}        6
{3}        4
{4}        5

Determine Frequent 2-Itemsets

Itemset    Support count
{1,2}      3
{1,3}      1
{1,4}      2
{2,3}      3
{2,4}      4
{3,4}      3

Only {1,3} and {1,4} are not frequent. The Apriori algorithm makes use of the result that no superset of these can be frequent.

Determine Frequent 3-Itemsets
Only {2,3,4} needs to have its support computed.

Itemset    Support count
{2,3,4}    2

Its support count of 2 is below the minimum of 3, so there is no frequent 3-itemset.
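
The level-by-level procedure of the numerical example can be sketched as follows. This is a simplified illustration of the Apriori idea (join, prune, count) on the coded dataset, with function and variable names chosen for this sketch; real implementations use more efficient candidate generation.

```python
from itertools import combinations

# Coded transactions: Bread=1, Butter=2, Milk=3, Sugar=4.
transactions = {
    "A": {1, 2, 3, 4}, "B": {1, 2, 4}, "C": {1, 2}, "D": {2, 3, 4},
    "E": {2, 3}, "F": {3, 4}, "G": {2, 4},
}
MIN_COUNT = 3

def count(itemset):
    """Support count: number of transactions containing `itemset`."""
    return sum(1 for basket in transactions.values() if itemset <= basket)

def apriori(min_count):
    items = sorted(set().union(*transactions.values()))
    level = {frozenset([item]) for item in items}  # candidate 1-itemsets
    counted_by_size, frequent_by_size = {}, {}
    k = 1
    while level:
        counts = {candidate: count(candidate) for candidate in level}
        counted_by_size[k] = counts
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        frequent_by_size[k] = frequent
        # Join: unions of two frequent k-itemsets that have exactly k+1 items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Prune: keep a candidate only if all of its k-subsets are frequent.
        level = {c for c in joined
                 if all(frozenset(sub) in frequent for sub in combinations(c, k))}
        k += 1
    return counted_by_size, frequent_by_size

counted, frequent = apriori(MIN_COUNT)
for size, itemsets in counted.items():
    for itemset, n in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
        status = "frequent" if n >= MIN_COUNT else "not frequent"
        print(size, sorted(itemset), n, status)
```

The prune step is what removes {1,2,3} and {1,2,4} before any counting happens, since each contains the infrequent {1,3} or {1,4}.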
Additional Reference Details – for those who want to explore further (not mandatory)
Approaches for association rule generation
• One approach is brute force:
  • List all possible association rules
  • Compute the support and confidence for each rule
  • Prune rules that fail the minsup and minconf thresholds
  • But this is computationally prohibitive

• The other is to use algorithms such as Apriori, FP-Growth, and ECLAT. Implementations are available in the mlxtend package in Python and the arules package in R.
Algorithms

• Apriori algorithm – uses a breadth-first search strategy to count the support of itemsets, and a candidate generation function that exploits the downward closure property of support.

• ECLAT algorithm – stands for Equivalence Class Transformation; it is a depth-first search algorithm based on set intersection.

• FP-growth algorithm – FP stands for frequent pattern.
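
To make the "depth-first search based on set intersection" description of ECLAT concrete, here is a minimal sketch on the coded Bread/Butter/Milk/Sugar dataset from the numerical example, with a minimum support count of 3. The function name `eclat` and the data layout are assumptions for illustration, not a reference implementation.

```python
# Coded transactions: Bread=1, Butter=2, Milk=3, Sugar=4.
transactions = {
    "A": {1, 2, 3, 4}, "B": {1, 2, 4}, "C": {1, 2}, "D": {2, 3, 4},
    "E": {2, 3}, "F": {3, 4}, "G": {2, 4},
}
MIN_COUNT = 3

# Vertical layout: item -> set of transaction IDs that contain it.
tidsets = {}
for tid, itemset in transactions.items():
    for item in itemset:
        tidsets.setdefault(item, set()).add(tid)

def eclat(prefix, candidates, found):
    """Depth-first search; support comes from intersecting transaction-ID sets."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        found[itemset] = len(tids)
        # Extend the current itemset with each later item that stays frequent.
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= MIN_COUNT:
                extensions.append((other, common))
        eclat(itemset, extensions, found)
    return found

start = [(item, tids) for item, tids in sorted(tidsets.items())
         if len(tids) >= MIN_COUNT]
frequent = eclat((), start, {})
for itemset, n in frequent.items():
    print(itemset, n)
```

Unlike the breadth-first approach, no pass over the transactions is needed after the vertical layout is built; every support count is the size of an intersection.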


Frequent Itemset Generation from Lattice

null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE
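
The lattice above, and the reason brute force is computationally prohibitive, can be illustrated by enumerating the itemsets for the five items A to E and counting the rules they could produce. This is a small sketch with illustrative variable names.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Level k of the lattice holds all k-itemsets; level 0 is the empty ("null") set.
lattice = {k: ["".join(combo) for combo in combinations(items, k)]
           for k in range(len(items) + 1)}

n_itemsets = sum(len(level) for level in lattice.values())

# An itemset with k >= 2 items can be split into antecedent => consequent
# in 2**k - 2 ways (any non-empty proper subset can be the antecedent).
n_rules = sum(len(lattice[k]) * (2 ** k - 2) for k in range(2, len(items) + 1))

for k, level in lattice.items():
    print(k, len(level), " ".join(level) if k else "null")
print("itemsets:", n_itemsets, "candidate rules:", n_rules)
```

Both counts grow exponentially with the number of items, which is why algorithms that prune the lattice are preferred over listing every rule.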
