
Market Basket Analysis and Association Rules

What can be inferred?


- I purchase diapers
- I purchase a new car
- I purchase OTC cough medicine
- I purchase a prescription medication
- I don't show up for class

Market Basket Analysis


- MBA is a set of techniques, Association Rules being the most common, that focus on point-of-sale (p-o-s) transaction data
- 3 types of market basket data (p-o-s data):
  - Customers
  - Orders (basic purchase data)
  - Items (merchandise/services purchased)

Market Basket Analysis


- Retail: each customer purchases a different set of products, in different quantities, at different times
- MBA uses this information to:
  - Identify who customers are (not by name)
  - Understand why they make certain purchases
  - Gain insight about its merchandise (products):
    - Fast and slow movers
    - Products which are purchased together
    - Products which might benefit from promotion
  - Take action:
    - Store layouts
    - Which products to put on special, promote, or offer coupons for
- Combining all of this with a customer loyalty card makes it even more valuable

Association Rules
- Data mining (DM) technique most closely allied with Market Basket Analysis
- AR can be automatically generated
  - AR represent patterns in the data without a specified target variable
  - Good example of undirected data mining

Market Basket Analysis : Measures


Consider the association rule Y ⇒ Z, where Y and Z are two products. Y is called the antecedent and Z the consequent.

Support of the rule: the percentage of all baskets that contain both product Y and product Z.
  support = P(Y ∧ Z)

Confidence of the rule: the percentage of all baskets containing Y that also contain Z. Hence, confidence is a conditional probability, P(Z|Y).
  confidence = P(Y ∧ Z) / P(Y)

Interest of the rule: measures the statistical dependence of the rule by relating the observed frequency of co-occurrence, P(Y ∧ Z), to the frequency of co-occurrence expected if Y and Z were independent, P(Y) * P(Z).
  interest = P(Y ∧ Z) / (P(Y) * P(Z))

Association-rule discovery is the process of finding strong product associations with a minimum support and/or confidence and an interest of at least one.
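
The three measures can be computed directly from basket counts. The following is a minimal sketch, not a reference implementation: the function name `rule_measures` is an illustrative choice, the baskets are the five example transactions used elsewhere in these slides, and the code assumes the antecedent and consequent each appear in at least one basket.

```python
def rule_measures(baskets, y, z):
    """Return (support, confidence, interest) for the rule y => z."""
    n = len(baskets)
    p_y = sum(1 for b in baskets if y in b) / n               # P(Y)
    p_z = sum(1 for b in baskets if z in b) / n               # P(Z)
    p_yz = sum(1 for b in baskets if y in b and z in b) / n   # P(Y and Z)
    support = p_yz
    confidence = p_yz / p_y          # P(Z | Y)
    interest = p_yz / (p_y * p_z)    # observed vs. expected under independence
    return support, confidence, interest


# The five example baskets from the slides.
baskets = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]

support, confidence, interest = rule_measures(baskets, "Coke", "detergent")
print(f"support={support:.2f} confidence={confidence:.2f} interest={interest:.2f}")
```

For the rule Coke ⇒ detergent, 2 of the 5 baskets contain both items, 4 contain Coke and 2 contain detergent, so the interest is above one.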

Association Rules Apply Elsewhere


Besides retail (supermarkets, etc.):
- Purchases made using credit/debit cards
- Optional telco service purchases
- Banking services
- Unusual combinations of insurance claims can be a warning of fraud
- Medical patient histories

A certainty measure for association rules of the form A => B, where A and B are sets of items, is confidence. Given a set of task

Typical Data Structure (Relational Database)

Lots of questions can be answered


- Avg # of orders/customer
- Avg # of unique items/order
- Avg # of items/order
- For a product:
  - What % of customers have purchased it
  - Avg # of orders/customer that include it
  - Avg quantity of it purchased/order

Transaction Data

- Etc.

Visualization is extremely helpful (next slide)
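
As a rough illustration of how such averages fall out of order-line data, here is a small sketch. The order lines and the `(customer_id, order_id, item, quantity)` layout are hypothetical, invented only for this example; they are not taken from the slides' database.

```python
from collections import defaultdict

# Hypothetical order lines: (customer_id, order_id, item, quantity).
order_lines = [
    ("c1", "o1", "Coke", 2),
    ("c1", "o1", "soda", 1),
    ("c1", "o2", "milk", 1),
    ("c2", "o3", "Coke", 1),
]

orders_by_customer = defaultdict(set)   # customer -> distinct order ids
items_by_order = defaultdict(set)       # order -> distinct items
quantity_by_order = defaultdict(int)    # order -> total units purchased
for customer_id, order_id, item, quantity in order_lines:
    orders_by_customer[customer_id].add(order_id)
    items_by_order[order_id].add(item)
    quantity_by_order[order_id] += quantity


def mean(values):
    values = list(values)
    return sum(values) / len(values)


avg_orders_per_customer = mean(len(o) for o in orders_by_customer.values())
avg_unique_items_per_order = mean(len(i) for i in items_by_order.values())
avg_items_per_order = mean(quantity_by_order.values())

print(avg_orders_per_customer, avg_unique_items_per_order, avg_items_per_order)
```

In a relational database the same figures would typically come from grouped queries; the dictionaries here only stand in for those groupings.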



Sales Order Characteristics


- Did the order use gift wrap?
- Billing address same as shipping address?
- Did purchaser accept/decline a cross-sell?
- What is the most common item found on a one-item order?
- What is the most common item found on a multi-item order?
- What is the most common item for repeat customer purchases?
- How has ordering of an item changed over time?
- How does the ordering of an item vary geographically?

Pivoting for Cluster Algorithms


Association Rules
- Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997]
- Customers who purchase maintenance agreements are very likely to purchase large appliances
- When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaner

Association Rules

Association rule types:

- Actionable Rules: contain high-quality, actionable information
- Trivial Rules: information already well known by those familiar with the business
- Inexplicable Rules: no explanation and do not suggest action

Trivial and Inexplicable Rules occur most often



How Good is an Association Rule?


POS Transactions

Customer  Items Purchased
1         Coke, soda
2         Milk, Coke, window cleaner
3         Coke, detergent
4         Coke, detergent, soda
5         Window cleaner, soda

Co-occurrence of Products

                Coke  Window cleaner  Milk  Soda  Detergent
Coke            4     1               1     2     2
Window cleaner  1     2               1     1     0
Milk            1     1               1     0     0
Soda            2     1               0     3     1
Detergent       2     0               0     1     2

How Good is an Association Rule?


                Coke  Window cleaner  Milk  Soda  Detergent
Coke            4     1               1     2     2
Window cleaner  1     2               1     1     0
Milk            1     1               1     0     0
Soda            2     1               0     3     1
Detergent       2     0               0     1     2

Simple patterns:
1. Coke and soda are purchased together more often than most other pairs (2 of 5 baskets, tied only with Coke and detergent)
2. Detergent is never purchased with milk or window cleaner
3. Milk is never purchased with soda or detergent
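
The co-occurrence matrix can be rebuilt from the five POS transactions with a short counting loop. This is a minimal sketch; the function name `cooccurrence` and the dictionary-of-pairs layout are illustrative choices, not something the slides prescribe.

```python
baskets = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]
items = ["Coke", "window cleaner", "milk", "soda", "detergent"]


def cooccurrence(baskets, items):
    """Count, for every ordered pair (a, b), the baskets containing both."""
    counts = {(a, b): 0 for a in items for b in items}
    for basket in baskets:
        for a in basket:
            for b in basket:
                # When a == b this counts baskets containing the item itself,
                # which fills the diagonal of the matrix.
                counts[(a, b)] += 1
    return counts


counts = cooccurrence(baskets, items)
for a in items:
    print(f"{a:15}", *(counts[(a, b)] for b in items))
```

Zeros in the output correspond to pairs that never appear together, such as detergent with milk.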

How Good is an Association Rule?


POS Transactions

Customer  Items Purchased
1         Coke, soda
2         Milk, Coke, window cleaner
3         Coke, detergent
4         Coke, detergent, soda
5         Window cleaner, soda

What is the confidence for this rule:
- If a customer purchases soda, then the customer also purchases Coke
- 2 out of 3 soda purchases also include Coke, so 67%

What about the confidence of this rule reversed?
- 2 out of 4 Coke purchases also include soda, so 50%

Confidence = ratio of the number of transactions with all the items to the number of transactions with just the "if" items
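
The ratio definition translates almost word for word into code. The sketch below is one possible reading of it; the function name `confidence` and its set-valued arguments are illustrative, and it assumes at least one basket contains the "if" items.

```python
baskets = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def confidence(baskets, if_items, then_items):
    """Transactions with all the items / transactions with the 'if' items."""
    if_items, then_items = set(if_items), set(then_items)
    with_if = sum(1 for b in baskets if if_items <= b)
    with_all = sum(1 for b in baskets if (if_items | then_items) <= b)
    return with_all / with_if


soda_then_coke = confidence(baskets, {"soda"}, {"Coke"})   # 2 of 3 soda baskets
coke_then_soda = confidence(baskets, {"Coke"}, {"soda"})   # 2 of 4 Coke baskets
print(f"{soda_then_coke:.0%} {coke_then_soda:.0%}")
```

Because the arguments are sets, the same function also covers rules with several "if" items.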

How Good is an Association Rule?

How much better than chance is a rule?

- Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place
- Lift is the ratio of the number of records that support the entire rule to the number that would be expected, assuming there was no relationship between the products
- Calculating lift: p. 310
- When lift > 1, the rule is better at predicting the result than guessing
- When lift < 1, the rule is doing worse than informed guessing, and using the Negative Rule produces a better rule than guessing
- Co-occurrence can occur in 3, 4, or more dimensions
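
A small worked example may help with the negative rule. The sketch below computes lift as the rule's confidence divided by the overall frequency of the result, which is the same quantity the earlier measures slide calls interest. The function name `lift` and the `negate` flag are illustrative assumptions, and the baskets are the five example transactions from the slides.

```python
baskets = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def lift(baskets, if_item, then_item, negate=False):
    """Lift of 'if if_item then then_item', or of its negative rule."""

    def has_result(basket):
        # With negate=True the result is "then_item is NOT in the basket".
        return (then_item in basket) != negate

    with_if = [b for b in baskets if if_item in b]
    rule_confidence = sum(1 for b in with_if if has_result(b)) / len(with_if)
    p_result = sum(1 for b in baskets if has_result(b)) / len(baskets)
    return rule_confidence / p_result


rule_lift = lift(baskets, "soda", "Coke")                    # below 1
negative_lift = lift(baskets, "soda", "Coke", negate=True)   # above 1
print(f"{rule_lift:.2f} {negative_lift:.2f}")
```

Here "if soda then Coke" has confidence 2/3 while Coke appears in 4/5 of all baskets, so the lift is below 1. The negative rule "if soda then not Coke" has confidence 1/3 against a base rate of 1/5, so its lift is above 1.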

Creating Association Rules


1. Choosing the right set of items
2. Generating rules by deciphering the counts in the co-occurrence matrix
3. Overcoming the practical limits imposed by thousands or tens of thousands of unique items

Overcoming Practical Limits for Association Rules


1. Generate co-occurrence matrix for single items: "if Coke then soda"
2. Generate co-occurrence matrix for two items: "if Coke and Milk then soda"
3. Generate co-occurrence matrix for three items: "if Coke and Milk and Window Cleaner then soda"
4. Etc.
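
One way to picture this level-by-level procedure is to count item combinations of growing size and drop the rare ones at each level. The sketch below is an assumption-laden illustration rather than the slides' own algorithm: the function name `frequent_itemsets` and the `min_count` threshold are introduced here, and the thresholding idea is borrowed from Apriori-style methods.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def frequent_itemsets(baskets, size, min_count=1):
    """Count every combination of `size` items; keep those seen often enough."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each combination one canonical tuple form.
        for combo in combinations(sorted(basket), size):
            counts[combo] += 1
    return {combo: c for combo, c in counts.items() if c >= min_count}


pairs = frequent_itemsets(baskets, 2, min_count=2)
triples = frequent_itemsets(baskets, 3, min_count=2)
for size in (1, 2, 3):
    print(size, frequent_itemsets(baskets, size, min_count=2))
```

With a threshold of two baskets, only the pairs Coke with detergent and Coke with soda survive, and no three-item combination does, which shows how a minimum count keeps the higher levels small.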

Final Thought on Association Rules: The Problem of Lots of Data

- Fast food restaurant: could have 100 items on its menu
  - How many combinations are there with 3 different menu items? 161,700!
- Supermarket: 10,000 or more unique items
  - About 50 million 2-item combinations
  - Over 160 billion 3-item combinations
- Use of product hierarchies (groupings) helps address this common issue
- Finally, know that the number of transactions in a given time period could also be huge (hence expensive to analyze)
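
These counts are binomial coefficients ("n choose k"), so they can be checked with the standard library. A quick sketch, using the round figures of 100 menu items and exactly 10,000 supermarket items:

```python
from math import comb

menu_triples = comb(100, 3)            # 3-item combinations from 100 menu items
supermarket_pairs = comb(10_000, 2)    # 2-item combinations from 10,000 items
supermarket_triples = comb(10_000, 3)  # 3-item combinations from 10,000 items

print(f"{menu_triples:,}")         # 161,700
print(f"{supermarket_pairs:,}")    # 49,995,000
print(f"{supermarket_triples:,}")  # 166,616,670,000
```

Going from pairs to triples multiplies the count by more than three thousand, which is why grouping items into product hierarchies matters in practice.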

Business and other cases


General Observations

- The banking case seems to provide well-defined and intelligible information of the form: account_1 and account_2, etc., or activity_1 and activity_2, etc., possibly indexed by time. As such, the rules found provide a guide to action: "offer" a product or service (cross-sell).

- In the retailing case of items purchased together, "guidance" is not so clear-cut, owing to the extensive number of rules.
- The soccer event exemplifies the sequencing of events towards reaching a goal.
- Software applied to basketball was developed years ago.
- Web mining shares the same principles, without the passion usually associated with sports.

Challenges

A major difficulty is that a large number of the rules found may be trivial for anyone familiar with the business

The computational complexity involved in calculating the results of market basket analysis is at least the square of the number of transaction item-lines (records of every item purchased). With data warehouses storing billions of transaction lines, this yields extremely high computational requirements.

Solutions

- Differential market basket analysis can find interesting results and can also eliminate the problem of a potentially high volume of trivial results.
- Special techniques involving filtering or aggregation of the transaction database are commonly used in analysis algorithms to increase performance and allow some level of interactivity, such as in business intelligence applications.

Thank You!
