

ASSIGNMENT #1: MARKET BASKET ANALYSIS

GROUP DETAILS: 3

Shreyas Naik (15A3HP626)

Parul Walia (15A3HP628)

Statement of the problem:

To create a smart basket that lists the items the customer purchases frequently
(recommendations) and suggests products the customer has likely missed, based on their
association with the items already selected.

Description of the dataset:

• Each row in the dataset represents one transaction.
• All transactions belong to a single customer.
• In all, there are 9835 transactions.
• There are two variables, named “ID” and “Product”.
• The ID variable is numeric and captures the transaction number, running from ‘0’
to ‘9834’.
• The Product variable is of string type and holds the items purchased in a particular
transaction.
• The maximum number of items purchased in a single transaction is 32. The Product
variable can therefore be expanded into Product1, Product2, …, Product32.
• The average number of items per transaction is 4.409, with a standard deviation of
3.589.
• The median is 3 items. In all, the customer purchased 43367 items across the 9835
transactions.
• The number of unique items purchased is 169.
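
The summary figures above (transaction count, item totals, mean, median, standard deviation) can be reproduced from basket sizes. The following is a minimal Python sketch, not the authors' procedure: it uses only the four example transactions shown later in this report, the names `baskets` and `summary` are illustrative, and it assumes the reported standard deviation is the sample standard deviation.

```python
from statistics import mean, median, stdev

# The four example transactions from this report (the full data set has 9835).
baskets = [
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups"],
    ["tropical fruit", "yogurt", "coffee"],
    ["whole milk"],
    ["pip fruit", "yogurt", "cream cheese", "meat spreads"],
]

sizes = [len(b) for b in baskets]  # number of items in each transaction

summary = {
    "transactions": len(baskets),
    "total_items": sum(sizes),
    "unique_items": len({item for b in baskets for item in b}),
    "max_items": max(sizes),
    "mean_items": mean(sizes),
    "median_items": median(sizes),
    "std_items": stdev(sizes),  # assumption: sample (n - 1) standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Run on the full data set, the same dictionary would be expected to hold the values listed above (9835 transactions, 43367 items, 169 unique items, and so on).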

Methodology Adopted:

First, the flat data (one row per transaction) is converted into a stacked data set (one row
per purchased item) using the following Base SAS code.

/* Convert the flat data (one row per transaction) into stacked data (one row per item). */

libname grocery "C:\desktop\";

data grocery.transform;
    set bigbasket;
    /* Treat the 32 product columns as one array */
    array item (32) Product1-Product32;
    keep ID Item_number Product;
    do i = 1 to 32;
        /* Write one output row for every non-empty product slot */
        if item(i) ~= "" then do;
            Item_number = i;
            Product = item(i);
            output;
        end;
    end;
run;

The stacked data takes the form as below. (Example shown for four transactions)

Flat data:

ID   Product1         Product2              Product3       Product4
0    citrus fruit     semi-finished bread   margarine      ready soups
1    tropical fruit   yogurt                coffee
2    whole milk
3    pip fruit        yogurt                cream cheese   meat spreads

Stacked data:

ID   Item_number   Product
0    1             citrus fruit
0    2             semi-finished bread
0    3             margarine
0    4             ready soups
1    1             tropical fruit
1    2             yogurt
1    3             coffee
2    1             whole milk
3    1             pip fruit
3    2             yogurt
3    3             cream cheese
3    4             meat spreads
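
For readers without SAS, the same flat-to-stacked reshaping can be sketched in Python. This is an illustrative equivalent of the data step, not part of the original analysis; the function name `stack_transactions` and the dictionary-per-row input format are assumptions made for the example.

```python
def stack_transactions(rows, max_items=32):
    """Turn wide rows (ID, Product1..ProductN) into (ID, Item_number, Product) tuples."""
    stacked = []
    for row in rows:
        for i in range(1, max_items + 1):
            product = row.get(f"Product{i}", "")
            if product:  # skip empty slots, like the SAS test `item(i) ~= ""`
                stacked.append((row["ID"], i, product))
    return stacked


# The four example transactions in flat form.
flat = [
    {"ID": 0, "Product1": "citrus fruit", "Product2": "semi-finished bread",
     "Product3": "margarine", "Product4": "ready soups"},
    {"ID": 1, "Product1": "tropical fruit", "Product2": "yogurt", "Product3": "coffee"},
    {"ID": 2, "Product1": "whole milk"},
    {"ID": 3, "Product1": "pip fruit", "Product2": "yogurt",
     "Product3": "cream cheese", "Product4": "meat spreads"},
]

stacked = stack_transactions(flat)
for record in stacked:
    print(record)
```

The four flat rows produce twelve stacked rows, one per purchased item, in the same order as the stacked example above.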

This stacked data was used in E-Miner (SAS Enterprise Miner) to build the associations, with
the ID variable assigned the ID role and the Product variable assigned the Target role.
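
To illustrate what the association step works from, the sketch below regroups stacked rows into baskets by ID and counts how often each item and each pair of items occurs. It is a plain Python illustration of the raw counts behind association rules, not a description of how E-Miner computes them internally; all variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Stacked rows (ID, Item_number, Product) for the four example transactions.
stacked = [
    (0, 1, "citrus fruit"), (0, 2, "semi-finished bread"),
    (0, 3, "margarine"), (0, 4, "ready soups"),
    (1, 1, "tropical fruit"), (1, 2, "yogurt"), (1, 3, "coffee"),
    (2, 1, "whole milk"),
    (3, 1, "pip fruit"), (3, 2, "yogurt"),
    (3, 3, "cream cheese"), (3, 4, "meat spreads"),
]

# Regroup the rows into one set of products per transaction ID.
baskets = defaultdict(set)
for transaction_id, _item_number, product in stacked:
    baskets[transaction_id].add(product)

item_counts = Counter()  # number of transactions containing each item
pair_counts = Counter()  # number of transactions containing each pair of items
for items in baskets.values():
    item_counts.update(items)
    pair_counts.update(combinations(sorted(items), 2))

print(item_counts.most_common(3))
print(pair_counts[("coffee", "yogurt")])
```

Sorting each basket before forming pairs keeps every pair in one canonical order, so ("coffee", "yogurt") and ("yogurt", "coffee") are counted together.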

Analysis:

The frequencies of the top 30 and bottom 10 values of the Product variable in the stacked
data set are shown below.

Base: 43367 items, 9835 transactions

Top 30 items            Frequency   Percent (items)   Percent (transactions)
whole milk                   2513               5.8                     25.6
other vegetables             1903               4.4                     19.3
rolls/buns                   1809               4.2                     18.4
soda                         1715               4.0                     17.4
yogurt                       1372               3.2                     14.0
bottled water                1087               2.5                     11.1
root vegetables              1072               2.5                     10.9
tropical fruit               1032               2.4                     10.5
shopping bags                 969               2.2                      9.9
sausage                       924               2.1                      9.4
pastry                        875               2.0                      8.9
citrus fruit                  814               1.9                      8.3
bottled beer                  792               1.8                      8.1
newspapers                    785               1.8                      8.0
canned beer                   764               1.8                      7.8
pip fruit                     744               1.7                      7.6
fruit/vegetable juice         711               1.6                      7.2
whipped/sour cream            705               1.6                      7.2
brown bread                   638               1.5                      6.5
domestic eggs                 624               1.4                      6.3
frankfurter                   580               1.3                      5.9
margarine                     576               1.3                      5.9
coffee                        571               1.3                      5.8
pork                          567               1.3                      5.8
butter                        545               1.3                      5.5
curd                          524               1.2                      5.3
beef                          516               1.2                      5.2
napkins                       515               1.2                      5.2
chocolate                     488               1.1                      5.0
frozen vegetables             473               1.1                      4.8

Table 1. Frequency of top 30 items

Bottom 10 items         Frequency   Percent (items)
salad dressing                  8               0.0
whisky                          8               0.0
toilet cleaner                  7               0.0
baby cosmetics                  6               0.0
frozen chicken                  6               0.0
bags                            4               0.0
kitchen utensil                 4               0.0
preservation products           2               0.0
baby food                       1               0.0
sound storage medium            1               0.0

Table 2. Frequency of Bottom 10 items
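
The percentage columns in the frequency tables follow directly from the counts and the two bases (43367 items, 9835 transactions). The sketch below recomputes them for a few rows taken from Tables 1 and 2. The function name `frequency_table` is illustrative, and treating frequency divided by 9835 as the share of transactions assumes an item is recorded at most once per transaction.

```python
def frequency_table(item_counts, total_items, total_transactions):
    """Return (item, frequency, % of items, % of transactions), most frequent first."""
    rows = []
    for item, freq in sorted(item_counts.items(), key=lambda kv: -kv[1]):
        pct_items = round(100 * freq / total_items, 1)
        pct_transactions = round(100 * freq / total_transactions, 1)
        rows.append((item, freq, pct_items, pct_transactions))
    return rows


# A few counts copied from Tables 1 and 2.
counts = {
    "whole milk": 2513,
    "other vegetables": 1903,
    "rolls/buns": 1809,
    "baby food": 1,
}

table = frequency_table(counts, total_items=43367, total_transactions=9835)
for row in table:
    print(row)
```

For whole milk this gives 5.8% of items and 25.6% of transactions, matching the first row of Table 1.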

• The frequencies show that the customer is an avid buyer of whole milk.
• 25.6% of transactions contain whole milk.
• The customer buys beer regularly but almost never whisky (8 purchases). Milk products
such as yogurt, curd and butter are regular purchases.
• The customer is a non-vegetarian, with beef and pork as the favorite meat products. The
customer is most likely male, as ladies' cosmetic items are seldom bought.
• The customer is also likely unmarried, as baby food is among the bottom 10 purchases.
• Fresh fruits and vegetables are bought heavily, which suggests a preference for products
with fewer preservatives.
• The customer often buys shopping bags to carry the goods.

Association:

• 736 of the 9835 transactions contain both whole milk and other vegetables, which is a
support of 7.48%.
• The 29.29% confidence means that out of every 100 transactions in which the customer
bought whole milk, about 29 also contained other vegetables.
• The chance of the customer buying whole milk increases once other vegetables are bought
(other vegetables ==> whole milk, confidence = 38.68%, against 25.6% of all transactions
containing whole milk).
• Among the three-product associations, the customer is more likely to buy whole milk once
yogurt and other vegetables have been bought (confidence = 51.29%).

Relations  Lift  Support (%)  Confidence (%)  Transaction count   Rule
2          1.51         7.48           29.29                736   whole milk ==> other vegetables
2          1.51         7.48           38.68                736   other vegetables ==> whole milk
2          1.21         5.66           30.79                557   rolls/buns ==> whole milk
2          1.57         5.60           40.16                551   yogurt ==> whole milk
2          1.76         4.89           44.87                481   root vegetables ==> whole milk
2          2.25         4.74           43.47                466   root vegetables ==> other vegetables
2          1.58         4.23           40.31                416   tropical fruit ==> whole milk
2          1.76         3.22           44.96                317   whipped/sour cream ==> whole milk
2          1.85         3.00           47.28                295   domestic eggs ==> whole milk
2          2.08         2.89           40.28                284   whipped/sour cream ==> other vegetables
2          1.95         2.76           49.72                271   butter ==> whole milk
2          1.92         2.61           49.05                257   curd ==> whole milk
2          1.62         2.42           41.32                238   margarine ==> whole milk
2          1.59         2.13           40.50                209   beef ==> whole milk
2          1.66         2.04           42.49                201   frozen vegetables ==> whole milk
3          2.45         2.32           47.40                228   whole milk & root vegetables ==> other vegetables
3          1.91         2.32           48.93                228   root vegetables & other vegetables ==> whole milk
3          2.01         2.23           51.29                219   yogurt & other vegetables ==> whole milk

Table 3. Association (confidence >= 40% and support >= 2%, or support >= 5% and confidence >= 25%)
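
The support, confidence and lift figures for the whole milk and other vegetables rules can be checked by hand from the counts reported in this document (9835 transactions, 2513 with whole milk, 1903 with other vegetables, 736 with both). The sketch below applies the standard textbook definitions of the three measures; the function name `rule_metrics` is illustrative, and the assumption is that E-Miner uses these same definitions, which the matching results support.

```python
def rule_metrics(n_both, n_lhs, n_rhs, n_transactions):
    """Support, confidence and lift for the rule LHS ==> RHS, from raw counts."""
    support = n_both / n_transactions            # share of baskets with LHS and RHS
    confidence = n_both / n_lhs                  # share of LHS baskets that also have RHS
    lift = confidence / (n_rhs / n_transactions) # confidence relative to RHS base rate
    return support, confidence, lift


N = 9835                        # total transactions
MILK, VEG, BOTH = 2513, 1903, 736

# whole milk ==> other vegetables
s1, c1, l1 = rule_metrics(BOTH, MILK, VEG, N)
# other vegetables ==> whole milk
s2, c2, l2 = rule_metrics(BOTH, VEG, MILK, N)

print(f"milk ==> veg: support {100 * s1:.2f}%, confidence {100 * c1:.2f}%, lift {l1:.2f}")
print(f"veg ==> milk: support {100 * s2:.2f}%, confidence {100 * c2:.2f}%, lift {l2:.2f}")
```

Both directions share the same support and lift, since those measures are symmetric in the two items; only the confidence depends on which item is on the left-hand side.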

Recommendation:

• The recommendation list should contain the top 30 items purchased, as shown in the
frequency table above (Table 1).
• Milk products, vegetables and beer can be placed together for easy navigation.
• The ‘did you forget’ list should contain the goods below, based on the association rules
derived above.
• If whole milk is bought but other vegetables are not, suggest other vegetables
(s = 7.48%, c = 29.29%); the reverse also holds.
• Whole milk should also be added to the ‘did you forget’ list if the customer selects
yogurt, root vegetables, tropical fruit, sour cream, domestic eggs, curd or margarine
(confidence >= 40% and support >= 2%, or support >= 5% and confidence >= 25%).
• Similarly, set a threshold (confidence >= 40% and support >= 2%, OR support >= 5% and
confidence >= 25%) and add every R.H.S. product that has not been selected whenever its
L.H.S. products are selected.
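
The ‘did you forget’ logic described above can be sketched as a lookup over the association rules. The example below is one possible reading, not a specified implementation: it uses a handful of rules copied from Table 3, the names `RULES`, `passes_threshold` and `did_you_forget` are illustrative, and ordering the suggestions by confidence is an added assumption.

```python
# (L.H.S. items, R.H.S. item, support %, confidence %), copied from Table 3.
RULES = [
    ({"whole milk"}, "other vegetables", 7.48, 29.29),
    ({"other vegetables"}, "whole milk", 7.48, 38.68),
    ({"yogurt"}, "whole milk", 5.60, 40.16),
    ({"root vegetables"}, "other vegetables", 4.74, 43.47),
    ({"yogurt", "other vegetables"}, "whole milk", 2.23, 51.29),
]


def passes_threshold(support, confidence):
    """Threshold from the report: (c >= 40 and s >= 2) or (s >= 5 and c >= 25)."""
    return (confidence >= 40 and support >= 2) or (support >= 5 and confidence >= 25)


def did_you_forget(cart, rules=RULES):
    """Suggest R.H.S. items missing from the cart whose L.H.S. items are all in it."""
    cart = set(cart)
    best = {}  # suggested item -> highest confidence among the rules that fired
    for lhs, rhs, support, confidence in rules:
        if lhs <= cart and rhs not in cart and passes_threshold(support, confidence):
            best[rhs] = max(best.get(rhs, 0.0), confidence)
    # Assumption: show the most confident suggestion first.
    return sorted(best, key=best.get, reverse=True)


print(did_you_forget(["yogurt", "root vegetables"]))
print(did_you_forget(["whole milk", "other vegetables"]))
```

With yogurt and root vegetables in the cart, the sketch suggests other vegetables and whole milk; once both whole milk and other vegetables are already in the cart, none of these rules adds anything.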
