ML Assignment 02 - Aqsa Mushtaq
# importing module
import pandas as pd
# dataset
data = pd.read_csv("/content/drive/MyDrive/Market_Basket_Optimisation.csv")
# printing the shape of the dataset
data.shape
(7500, 20)
   shrimp          almonds    avocado  vegetables mix  green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice  ...
0  burgers         meatballs  eggs     NaN             NaN           NaN               NaN   NaN             NaN           NaN           ...
1  chutney         NaN        NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           ...
2  turkey          avocado    NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           ...
4  low fat yogurt  NaN        NaN      NaN             NaN           NaN               NaN   NaN             NaN           NaN           ...
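The column headers in the preview above are themselves grocery items (shrimp, almonds, avocado, ...), which suggests that `pd.read_csv` consumed the first transaction as a header row. A minimal sketch of the difference, assuming the file has no real header line; the three-line CSV text below is hypothetical and only stands in for the actual file:

```python
import io
import pandas as pd

# Hypothetical three-transaction file standing in for the real CSV.
raw = "shrimp,almonds,avocado\nburgers,meatballs,eggs\nchutney,,\n"

# Default behaviour: the first line is used as column names.
with_header = pd.read_csv(io.StringIO(raw))
# header=None: the first line is kept as an ordinary row.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_header.shape)     # (2, 3)
print(without_header.shape)  # (3, 3)
```

With `header=None` the columns are numbered 0, 1, 2, ... and every line of the file counts as a transaction.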
# importing module
import numpy as np
# Gather all items of each transaction into one flat list
transaction = []
for i in range(0, data.shape[0]):
    for j in range(0, data.shape[1]):
        transaction.append(data.values[i, j])
# converting to numpy array (mixed str/NaN values become strings, so NaN turns into "nan")
transaction = np.array(transaction)
# Transform it into a pandas DataFrame
df = pd.DataFrame(transaction, columns=["items"])
# Put 1 on each item to make a countable table, to be able to perform a group by
df["incident_count"] = 1
# Delete the "nan" items from the dataset
indexNames = df[df['items'] == "nan"].index
df.drop(indexNames, inplace=True)
# Making a new pandas DataFrame for visualizations
df_table = df.groupby("items").sum().sort_values("incident_count", ascending=False).reset_index()
# Initial visualizations
df_table.head(10).style.background_gradient(cmap='Greens')
   items      incident_count
1  eggs       1348
2  spaghetti  1306
4  chocolate  1230
6  milk       972
9  pancakes   713
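The same item counts can be obtained without the explicit double loop. The sketch below is an alternative, not the notebook's own method; the small `data` frame is a hypothetical stand-in for the real transactions, and the column names `items` and `incident_count` are reused from the code above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: one row per transaction, padded with NaN.
data = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
    ["turkey", "eggs", np.nan],
])

counts = (
    pd.Series(data.to_numpy().ravel())   # flatten every cell into one Series
      .value_counts()                    # count per item; NaN is skipped by default
      .rename_axis("items")
      .reset_index(name="incident_count")
)
print(counts)
```

Because `value_counts` ignores missing values by default, no separate step is needed to drop the NaN padding.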
[Figure: chart of items by incident_count with a color scale. Items visible: mineral water, spaghetti, green tea, frozen vegetables, cookies, escalope, low fat yogurt, shrimp, pancakes, tomatoes, turkey, chicken, whole wheat rice, french fries, milk.]
                   &      a      b      c      d      e      f      g      h    ...      p      r
0       False  False  False   True  False  False   True  False   True  False    ...  False   True
1       False  False   True   True  False  False   True  False  False  False    ...  False  False
2       False  False  False  False  False  False   True  False   True  False    ...  False  False
3       False  False   True  False  False  False  False  False  False  False    ...  False  False
4       False  False   True  False  False  False  False  False  False  False    ...  False  False
...       ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
149995  False  False   True  False  False  False  False  False  False  False    ...  False  False
149996  False  False   True  False  False  False  False  False  False  False    ...  False  False
149997  False  False   True  False  False  False  False  False  False  False    ...  False  False
149998  False  False   True  False  False  False  False  False  False  False    ...  False  False
149999  False  False   True  False  False  False  False  False  False  False    ...  False  False
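The single-character column names and the 150000 rows (7500 × 20) in the table above suggest that every cell was encoded as its own transaction and iterated character by character. This is an inference from the output, since the encoding code itself is not shown. A minimal sketch of the difference, using a small hypothetical `one_hot` helper written in plain pandas rather than whatever encoder the notebook used:

```python
import pandas as pd

def one_hot(transactions):
    """Hypothetical helper: one boolean column per distinct element."""
    items = sorted({item for t in transactions for item in t})
    return pd.DataFrame(
        [[item in set(t) for item in items] for t in transactions],
        columns=items,
    )

# Hypothetical transactions for illustration.
baskets = [["eggs", "milk"], ["tea"]]

per_item = one_hot(baskets)           # each basket is a list of item strings
per_char = one_hot(["eggs", "tea"])   # bare strings are iterated character by character

print(list(per_item.columns))   # ['eggs', 'milk', 'tea']
print(list(per_char.columns))   # ['a', 'e', 'g', 's', 't']
```

Passing one list of item names per transaction gives one column per item and one row per basket.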
https://colab.research.google.com/drive/1b2mSuZT01tjOr-aD6thEBhRUTWouswj5#scrollTo=AaLUrF-sMHhi&printMode=true 2/5
11/29/23, 11:21 PM ML Assignment 02 - Colaboratory
# select top 50 items
first50 = df_table["items"].head(50).values
# keep the rows labelled 0 through 50 (.loc slicing includes the end label, hence 51 rows)
dataset = dataset.loc[:50]
# shape of the dataset
dataset.shape
(51, 27)
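The shape `(51, 27)` shows that `dataset.loc[:50]` sliced rows, not items: a single slice passed to `.loc` applies to the index and includes the end label. A minimal sketch contrasting row slicing with column selection, assuming the intent was to keep only the most frequent items; the tiny `dataset` and `top_items` below are hypothetical stand-ins for the notebook's `dataset` and `first50`:

```python
import pandas as pd

# Hypothetical one-hot table: rows are transactions, columns are items.
dataset = pd.DataFrame(
    {"eggs": [True, False, True, False],
     "milk": [False, True, True, False],
     "tea":  [True, True, False, False]}
)
top_items = ["eggs", "milk"]             # stand-in for `first50`

by_rows = dataset.loc[:2]                # label slice on the index: rows 0, 1 and 2
by_columns = dataset.loc[:, top_items]   # every row, only the listed item columns

print(by_rows.shape)      # (3, 3)
print(by_columns.shape)   # (4, 2)
```

Column selection of this kind only works when the column labels are the item names themselves.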
57 0.058824 (e, u) 2
# printing the frequent itemsets with length 3
frequent_itemsets[ (frequent_itemsets['length'] == 3) ].head(3)
support itemsets length
88 0.019608 (e, b, a) 3
89 0.019608 (l, b, a) 3
90 0.019608 (m, b, a) 3
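The support of an itemset is the fraction of rows that contain every item in it. The value 0.019608 above equals 1/51, which is consistent with each of these itemsets occurring in exactly one of 51 rows. A minimal sketch of the calculation on a hypothetical one-hot table; the `support` helper is illustrative and not part of the notebook:

```python
import pandas as pd

# Hypothetical one-hot table: 4 transactions, 3 items.
onehot = pd.DataFrame(
    {"a": [True, True, False, True],
     "b": [True, False, False, True],
     "e": [True, False, True, False]}
)

def support(table, items):
    """Fraction of rows in which every column in `items` is True."""
    return table[list(items)].all(axis=1).mean()

print(support(onehot, ["a"]))            # 0.75
print(support(onehot, ["a", "b"]))       # 0.5
print(support(onehot, ["a", "b", "e"]))  # 0.25
```

Adding an item to an itemset can only keep its support the same or lower it, which is the property frequent-itemset mining relies on to prune candidates.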
# association_rules comes from mlxtend
from mlxtend.frequent_patterns import association_rules
# We set our metric to "lift" to decide whether antecedents & consequents are dependent or not
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules["antecedents_length"] = rules["antecedents"].apply(len)
rules["consequents_length"] = rules["consequents"].apply(len)
rules.sort_values("lift", ascending=False)
      antecedents  consequents   antecedent support  consequent support  support   confidence  lift       leverage  conviction  zhangs_metric  antecedents_
2445  (m, a)       (e, l, t)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000
2150  (e, l, a)    (t, b)        0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000
3886  (s, m, e)    (t, b, a)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000
3885  (s, a, e)    (t, m, b)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000
3884  (s, t, e)    (m, b, a)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000
...   ...          ...           ...                 ...                 ...       ...         ...        ...       ...         ...
539   (s, e)       (t)           0.058824            0.058824            0.019608  0.333333    5.666667   0.016148  1.411765    0.875000
# Sort values based on confidence
rules.sort_values("confidence",ascending=False)
      antecedents  consequents   antecedent support  consequent support  support   confidence  lift       leverage  conviction  zhangs_metric
2693  (s, l, t)    (e, b)        0.019608            0.039216            0.019608  1.0         25.5       0.018839  inf         0.98
2694  (e, l, t)    (s, b)        0.019608            0.039216            0.019608  1.0         25.5       0.018839  inf         0.98
...   ...          ...           ...                 ...                 ...       ...         ...        ...       ...         ...
763   (e)          (t, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00
2859  (e)          (n, t, c, h)  0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00
739   (e)          (m, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00
727   (e)          (l, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00
1611  (e)          (n, t, h)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00
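The metric columns can be checked by hand from the three support values. The sketch below recomputes them for rule 539, (s, e) -> (t), using the usual definitions of confidence, lift, leverage, conviction and Zhang's metric. The fractions 3/51 and 1/51 are an assumption: they are my reading of the printed decimals 0.058824 and 0.019608.

```python
# Supports read from rule 539: (s, e) -> (t). The exact fractions are assumed.
support_a = 3 / 51    # antecedent support, printed as 0.058824
support_c = 3 / 51    # consequent support, printed as 0.058824
support_ac = 1 / 51   # support of antecedent and consequent together, printed as 0.019608

confidence = support_ac / support_a
lift = confidence / support_c
leverage = support_ac - support_a * support_c
conviction = (1 - support_c) / (1 - confidence)   # undefined (reported as inf) when confidence is 1
zhangs_metric = leverage / max(
    support_ac * (1 - support_a),
    support_a * (support_c - support_ac),
)

print(f"confidence:    {confidence:.6f}")     # 0.333333
print(f"lift:          {lift:.6f}")           # 5.666667
print(f"leverage:      {leverage:.6f}")       # 0.016148
print(f"conviction:    {conviction:.6f}")     # 1.411765
print(f"zhangs_metric: {zhangs_metric:.6f}")  # 0.875000
```

The rules with confidence 1.0 show `inf` for conviction because the denominator `1 - confidence` is zero for them.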