Market Basket Analysis
I purchase diapers
I purchase a new car
I purchase OTC cough medicine
I purchase a prescription medication
I don't show up for class
In retail, each customer purchases a different set of products, in different quantities, at different times. MBA uses this information to:
- Identify who customers are (not by name)
- Understand why they make certain purchases
- Gain insight about its merchandise (products):
  - Fast and slow movers
  - Products which are purchased together
  - Products which might benefit from promotion
- Take action
Combining all of this with customer loyalty card data makes it even more valuable.
Association Rules
Association rules (AR) are the data mining (DM) technique most closely allied with market basket analysis. AR can be generated automatically.
AR represent patterns in the data without a specified target variable, which makes them a good example of undirected data mining.
Support of the rule: the percentage of all baskets that contain both products Y and Z:
$\text{support} = P(Y \cap Z)$
Confidence of the rule: the percentage of all the baskets containing Y that also contain Z. Hence, confidence is a conditional probability, $P(Z \mid Y)$:
$\text{confidence} = P(Y \cap Z) / P(Y)$
Interest of the rule: measures the statistical dependence of the rule by relating the observed frequency of co-occurrence, $P(Y \cap Z)$, to the frequency expected if Y and Z were independent, $P(Y)\,P(Z)$:
$\text{interest} = P(Y \cap Z) / (P(Y)\,P(Z))$
Association-rule discovery is the process of finding strong product associations with a minimum support and/or confidence and an interest greater than one.
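As a worked example (not part of the original slide), take the rule "if soda then Coke" (Y = soda, Z = Coke) over the five POS baskets listed in the POS Transactions slides. Soda appears in 3 of the 5 baskets, Coke in 4, and both together in 2:

```latex
\begin{aligned}
\text{support}    &= P(Y \cap Z) = \tfrac{2}{5} = 0.4 \\
\text{confidence} &= \frac{P(Y \cap Z)}{P(Y)} = \frac{2/5}{3/5} = \tfrac{2}{3} \approx 0.67 \\
\text{interest}   &= \frac{P(Y \cap Z)}{P(Y)\,P(Z)} = \frac{2/5}{(3/5)(4/5)} = \tfrac{5}{6} \approx 0.83
\end{aligned}
```

Even with 67% confidence, the interest is below one: Coke is already in 80% of all baskets, so knowing that a basket contains soda makes Coke slightly less likely than average.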
Confidence is a certainty measure for association rules of the form A => B, where A and B are sets of items. Given a set of task
What % of customers have purchased it?
Avg # of orders/customer that include it
Avg quantity of it purchased/order
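A minimal sketch of the three per-item questions above, assuming order lines of the form (customer_id, order_id, item, quantity). The data, the function name `item_profile`, and the choice to average "orders per customer" over the customers who bought the item are all illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical order lines for illustration: (customer_id, order_id, item, quantity)
ORDER_LINES = [
    ("c1", "o1", "soda", 2),
    ("c1", "o2", "soda", 1),
    ("c1", "o2", "milk", 1),
    ("c2", "o3", "milk", 3),
    ("c3", "o4", "soda", 6),
]


def item_profile(lines, item):
    """Answer the three per-item questions for `item` (hypothetical helper).

    Returns (share of customers who purchased it,
             avg number of orders containing it per purchasing customer,
             avg quantity of it per order that contains it).
    """
    all_customers = {cust for cust, _, _, _ in lines}
    orders_with_item = defaultdict(set)  # customer -> orders containing the item
    qty_per_order = defaultdict(int)     # order -> total quantity of the item
    for cust, order, name, qty in lines:
        if name == item:
            orders_with_item[cust].add(order)
            qty_per_order[order] += qty
    if not qty_per_order:
        return 0.0, 0.0, 0.0
    share = len(orders_with_item) / len(all_customers)
    avg_orders = sum(len(o) for o in orders_with_item.values()) / len(orders_with_item)
    avg_qty = sum(qty_per_order.values()) / len(qty_per_order)
    return share, avg_orders, avg_qty


share, avg_orders, avg_qty = item_profile(ORDER_LINES, "soda")
print(f"{share:.0%} of customers, {avg_orders} orders/customer, {avg_qty} per order")
```

On this toy data, soda is bought by 2 of 3 customers, in 1.5 orders per purchasing customer, at 3 units per order.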
Transaction Data
Etc
Did the order use gift wrap?
Is the billing address the same as the shipping address?
Did the purchaser accept or decline a cross-sell?
What is the most common item found on a one-item order?
What is the most common item found on a multi-item order?
What is the most common item for repeat customer purchases?
How has ordering of an item changed over time?
How does the ordering of an item vary geographically?
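Two of the questions above (most common item on one-item vs. multi-item orders) reduce to simple counting. A minimal sketch, assuming each order is just a list of item names; the orders and the helper name `most_common_item` are hypothetical.

```python
from collections import Counter

# Hypothetical orders for illustration; each order is a list of items
ORDERS = [
    ["soda"],
    ["soda"],
    ["milk"],
    ["Coke", "soda"],
    ["Coke", "detergent"],
    ["Coke", "milk", "window cleaner"],
]


def most_common_item(orders, one_item_orders):
    """Most common item on one-item orders (one_item_orders=True)
    or on multi-item orders (one_item_orders=False).

    Returns an (item, number_of_orders) pair, or None if no order qualifies.
    """
    counts = Counter(
        item
        for order in orders
        if (len(order) == 1) == one_item_orders
        for item in set(order)  # count each item at most once per order
    )
    return counts.most_common(1)[0] if counts else None


print(most_common_item(ORDERS, True))   # ('soda', 2)
print(most_common_item(ORDERS, False))  # ('Coke', 3)
```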
Association Rules
Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997].
Customers who purchase maintenance agreements are very likely to purchase large appliances.
When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners.
Association Rules
Actionable rules: contain high-quality, actionable information
Trivial rules: information already well known by those familiar with the business
Inexplicable rules: have no explanation and do not suggest action
POS Transactions
Co-occurrence of Products
                 Coke  Window cleaner  Milk  Soda  Detergent
Coke               4         1           1     2       2
Window cleaner     1         2           1     1       0
Milk               1         1           1     0       0
Soda               2         1           0     3       1
Detergent          2         0           0     1       2
Simple patterns:
1. Coke and soda are purchased together at least as often as any other two items (Coke and detergent also co-occur in 2 baskets).
2. Detergent is never purchased with milk or window cleaner.
3. Milk is never purchased with soda or detergent.
Items Purchased
Coke, soda
Milk, Coke, window cleaner
Coke, detergent
Coke, detergent, soda
Window cleaner, soda
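The co-occurrence matrix can be built directly from these five baskets. A minimal sketch in Python (item capitalization normalized; the function name `co_occurrence` is an illustrative choice): the diagonal counts baskets containing each item, and each off-diagonal cell counts baskets containing both items.

```python
from collections import Counter
from itertools import combinations

# The five POS baskets (capitalization normalized)
BASKETS = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def co_occurrence(baskets):
    """Diagonal: baskets containing each item; off-diagonal: baskets containing both items."""
    counts = Counter()
    for basket in baskets:
        for item in basket:
            counts[item, item] += 1
        for a, b in combinations(sorted(basket), 2):
            counts[a, b] += 1
            counts[b, a] += 1  # keep the matrix symmetric
    return counts


ITEMS = ["Coke", "window cleaner", "milk", "soda", "detergent"]
matrix = co_occurrence(BASKETS)  # pairs never seen read as 0 from a Counter
for row in ITEMS:
    print(f"{row:>15}", [matrix[row, col] for col in ITEMS])
```

Only pairs are counted here; the same idea extends to triples by using `combinations(sorted(basket), 3)`.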
POS Transactions
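Rule: if a customer purchases soda, then the customer also purchases Coke. 2 out of 3 soda purchases also include Coke, so the confidence is 67%.
Reverse rule: if a customer purchases Coke, then the customer also purchases soda. 2 out of 4 Coke purchases also include soda, so the confidence is 50%.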
Confidence = ratio of the number of transactions containing all the items in the rule to the number of transactions containing just the "if" items
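The confidence definition above translates directly into a count ratio. A minimal sketch over the same five baskets, returning exact fractions; the function name `confidence` and the choice to return None when no basket contains the "if" items are illustrative assumptions.

```python
from fractions import Fraction

# The five POS baskets (capitalization normalized)
BASKETS = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def confidence(baskets, if_items, then_items):
    """Transactions with all the items / transactions with just the 'if' items.

    Returns None when no transaction contains the 'if' items.
    """
    if_items, then_items = set(if_items), set(then_items)
    with_if = [b for b in baskets if if_items <= b]
    if not with_if:
        return None
    with_all = [b for b in with_if if then_items <= b]
    return Fraction(len(with_all), len(with_if))


print(confidence(BASKETS, {"soda"}, {"Coke"}))  # 2/3, about 67%
print(confidence(BASKETS, {"Coke"}, {"soda"}))  # 1/2, i.e. 50%
```

Note that confidence is not symmetric: the two directions of the same item pair give different values.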
Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place.
Lift is the ratio of the number of records that support the entire rule to the number that would be expected if there were no relationship between the products: $\text{lift} = P(\text{condition} \cap \text{result}) / (P(\text{condition})\,P(\text{result}))$. (Calculating lift: see p. 310.)
When lift > 1, the rule is better at predicting the result than guessing.
When lift < 1, the rule does worse than informed guessing, and the negative rule (the same condition with the result negated) is better than guessing.
Co-occurrence can involve 3, 4, or more dimensions (items).
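A minimal sketch of lift and the negative rule for single-item rules over the five POS baskets. The function name `lift` and the `negate_result` flag are illustrative assumptions; "negative rule" is taken here to mean the same condition with the result replaced by its absence.

```python
from fractions import Fraction

# The five POS baskets (capitalization normalized)
BASKETS = [
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def lift(baskets, condition, result, negate_result=False):
    """P(condition and result) / (P(condition) * P(result)) for single items.

    With negate_result=True the result is "item absent", giving the negative rule.
    Raises ZeroDivisionError if the condition or result never occurs.
    """
    n = len(baskets)

    def has_result(basket):
        return (result not in basket) if negate_result else (result in basket)

    p_cond = Fraction(sum(condition in b for b in baskets), n)
    p_result = Fraction(sum(has_result(b) for b in baskets), n)
    p_both = Fraction(sum(condition in b and has_result(b) for b in baskets), n)
    return p_both / (p_cond * p_result)


print(lift(BASKETS, "soda", "Coke"))                      # 5/6 (< 1)
print(lift(BASKETS, "soda", "Coke", negate_result=True))  # 5/3 (> 1)
```

Here "if soda then Coke" has lift 5/6, so its negative rule "if soda then not Coke" (lift 5/3) is the better predictor, matching the lift < 1 case described above.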
1. Choosing the right set of items
2. Generating rules by deciphering the counts in the co-occurrence matrix
3. Overcoming the practical limits imposed by thousands or tens of thousands of unique items
1. Generate co-occurrence matrix for single items: if Coke then soda
2. Generate co-occurrence matrix for two items: if Coke and Milk then soda
3. Generate co-occurrence matrix for three items: if Coke and Milk and Window Cleaner then soda
4. Etc.
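The steps above count 1-item sets, then 2-item sets, then 3-item sets. A minimal sketch of that level-wise counting is below; it adds one thing the slide does not describe, Apriori-style pruning (a larger set is counted only if all its smaller subsets met a minimum count), which is one common way to keep the number of counted combinations down. The function name and the `min_count`/`max_size` parameters are illustrative assumptions.

```python
from itertools import combinations

# The five POS baskets (capitalization normalized)
BASKETS = [frozenset(b) for b in (
    {"Coke", "soda"},
    {"milk", "Coke", "window cleaner"},
    {"Coke", "detergent"},
    {"Coke", "detergent", "soda"},
    {"window cleaner", "soda"},
)]


def frequent_itemsets(baskets, min_count, max_size):
    """Level-wise search: count 1-item sets, then 2-item sets, then 3-item sets...

    A size-(k+1) candidate is kept only if every size-k subset was frequent
    (Apriori-style pruning). Returns {itemset: number of baskets containing it}.
    """
    level = [frozenset([item]) for item in sorted({i for b in baskets for i in b})]
    frequent_all = {}
    for k in range(1, max_size + 1):
        counts = {s: sum(s <= b for b in baskets) for s in level}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        if not frequent:
            break
        frequent_all.update(frequent)
        candidates = set()
        for first, second in combinations(list(frequent), 2):
            union = first | second
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent_all


for itemset, count in frequent_itemsets(BASKETS, min_count=2, max_size=3).items():
    print(sorted(itemset), count)
```

With a minimum count of 2, only {Coke, soda} and {Coke, detergent} survive as pairs, and no triple is even counted, because {soda, detergent} occurs in just one basket.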
How many combinations are there with 3 different menu items? With 100 items on the menu, $\binom{100}{3} = \frac{100 \cdot 99 \cdot 98}{3!} = 161{,}700$!
Using product hierarchies (groupings) helps address this common issue.
Finally, note that the number of transactions in a given time period can also be huge (and hence expensive to analyze).
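A short sketch of why product hierarchies help. The 161,700 figure is exactly the number of 3-item combinations from 100 items; grouping those items into a hypothetical 10 categories cuts this to 120. The category names and the `roll_up` helper below are illustrative assumptions, not a real product hierarchy.

```python
from math import comb

# 161,700 is the number of ways to choose 3 items from 100
print(comb(100, 3))  # 161700

# Hypothetical: the same 100 items rolled up into 10 product categories
print(comb(10, 3))   # 120

# Hypothetical product hierarchy mapping items to categories
HIERARCHY = {
    "Coke": "soft drinks",
    "soda": "soft drinks",
    "milk": "dairy",
    "window cleaner": "cleaning supplies",
    "detergent": "cleaning supplies",
}


def roll_up(basket, hierarchy):
    """Replace each item by its category; unmapped items are kept as-is."""
    return {hierarchy.get(item, item) for item in basket}


print(sorted(roll_up({"Coke", "detergent", "soda"}, HIERARCHY)))
# ['cleaning supplies', 'soft drinks']
```

Rolling up trades detail for tractability: rules found at the category level can then be drilled down into only where they look interesting.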
General Observations
The banking case seems to provide well-defined and intelligible information of the form "account_1 and account_2, etc." or "activity_1 and activity_2, etc.", possibly indexed by time. As such, the rules found provide a guide to action: offer a product or service (cross-sell).
In the retailing case of items purchased together, the guidance is not so clear-cut because of the large number of rules. The soccer example illustrates the sequencing of events toward reaching a goal. Software for basketball applications was developed years ago. Web mining shares the same principles, without the passion usually associated with sports.
Challenges
A major difficulty is that a large number of the rules found may be trivial to anyone familiar with the business.
The computational complexity involved in calculating the results of market basket analysis is at least the square of the number of transaction item-lines (records of every item purchased). With data warehouses storing billions of transaction lines, this yields extremely high computational requirements.
Solutions
Differential market basket analysis can find interesting results and can also eliminate the problem of a potentially high volume of trivial results.
Special techniques involving filtering or aggregation of the transaction database are commonly used in analysis algorithms to increase performance and allow some level of interactivity, such as in business intelligence applications.
Thank You!