Assign 4-Samana Tatheer 20U00323 .Ipynb - Colaboratory
import pandas as pd
df323 = pd.read_csv("/content/train.csv")
df323
        User_ID  Product_ID  Gender  Age    Occupation  City_Category  ...
0       1000001  P00069042   F       0-17   10          A              ...
1       1000001  P00248942   F       0-17   10          A              ...
2       1000001  P00087842   F       0-17   10          A              ...
3       1000001  P00085442   F       0-17   10          A              ...
...
550063  1006033  P00372445   M       51-55  13          B              ...
550064  1006035  P00375436   F       26-35  1           C              ...
550065  1006036  P00375436   F       26-35  15          B              ...
550067  1006039  P00371644   F       46-50  0           B              ...
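To experiment without the full file, a few rows from the output above can be loaded from an in-memory string. This is a minimal sketch: only the six columns visible in the export are included, and `io.StringIO` stands in for `/content/train.csv`, so the names `sample_csv` and `sample` are illustrative only.

```python
import io

import pandas as pd

# A few rows copied from the output above (only the columns visible in the export)
sample_csv = """User_ID,Product_ID,Gender,Age,Occupation,City_Category
1000001,P00069042,F,0-17,10,A
1000001,P00248942,F,0-17,10,A
1006033,P00372445,M,51-55,13,B
1006035,P00375436,F,26-35,1,C
"""

# read_csv accepts any file-like object, so StringIO replaces the file path here
sample = pd.read_csv(io.StringIO(sample_csv))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # User_ID and Occupation parse as integers; Gender, Age, City_Category as text
```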
Q2. Data Profiling: the profiling report summarizes the whole dataset, for example the number of variables it contains.
Collecting ydata_profiling
Downloading ydata_profiling-4.5.1-py2.py3-none-any.whl (357 kB)
https://colab.research.google.com/drive/146Saq0pZI_Bj-nEicmWsnkoA83FDynR_#scrollTo=konJxjGp1Al5&printMode=true 1/6
9/7/23, 1:49 PM Assign 4-Samana Tatheer 20U00323 .ipynb - Colaboratory
Overview
Dataset statistics
Number of variables 12
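The "Dataset statistics" block of the profiling report lists simple counts such as the number of variables. A rough pandas-only sketch of a few similar counts is shown below; it is not how ydata_profiling computes its report, the helper name `overview` is hypothetical, the labels are only chosen to resemble the report's, and the toy frame is made-up data.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Rough pandas-only version of a profiling 'Dataset statistics' table."""
    n_cells = df.shape[0] * df.shape[1]
    n_missing = int(df.isna().sum().sum())
    return {
        "Number of variables": df.shape[1],
        "Number of observations": df.shape[0],
        "Missing cells": n_missing,
        "Missing cells (%)": 100 * n_missing / n_cells if n_cells else 0.0,
        "Duplicate rows": int(df.duplicated().sum()),
    }


# Hypothetical toy frame; on the notebook data one would call overview(df323)
toy = pd.DataFrame(
    {
        "Gender": ["F", "F", "M", "F"],
        "Product_Category_2": [np.nan, 6.0, 14.0, 6.0],
        "Purchase": [8370, 15200, 1422, 15200],
    }
)
print(overview(toy))
```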
Alerts
Product_Category_2 is highly overall correlated with Product_Category_3 (High correlation)
Q3. Convert the variables into categorical and numerical data types
Making an array of the columns to convert:
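# cols: the array of column names built in the step above (its contents are not shown in this export)
df323[cols] = df323[cols].astype("category")  # store the listed columns as pandas categoricals
df323["Purchase"] = df323["Purchase"].astype(float)  # keep the target Purchase numeric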
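import numpy as np
# First and third quartiles of Purchase
Q1, Q3 = np.percentile(df323["Purchase"], [25, 75])
IQR = Q3 - Q1  # interquartile range
IQR
6231.0
# Tukey fences: boolean masks for values more than 1.5 * IQR outside the quartiles
upper = df323["Purchase"] > (Q3 + 1.5 * IQR)
lower = df323["Purchase"] < (Q1 - 1.5 * IQR)
# Mark the outlying purchases as missing (NaN) so they can be imputed later
df323.loc[upper | lower, "Purchase"] = np.nan
df323.isnull().sum()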
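The same fence rule (keep values in [Q1 - 1.5·IQR, Q3 + 1.5·IQR]) can be packaged as a reusable helper. This is a sketch with hypothetical function names `iqr_bounds` and `mask_outliers` and made-up purchase amounts; it uses `Series.where`, which keeps values where the condition holds and puts NaN elsewhere.

```python
import numpy as np
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper Tukey fences of a numeric series."""
    q1, q3 = np.percentile(s.dropna(), [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mask_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a copy of s with values outside the fences replaced by NaN."""
    low, high = iqr_bounds(s, k)
    # between() is inclusive on both ends by default
    return s.where(s.between(low, high))


# Hypothetical purchase amounts with one obvious outlier
purchases = pd.Series([100.0, 120.0, 130.0, 150.0, 160.0, 5000.0])
print(iqr_bounds(purchases))
print(mask_outliers(purchases).tolist())
```

Returning a new Series instead of assigning in place keeps the raw column available for comparison.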
df323b = df323.drop(["User_ID","Product_ID"], axis=1)
df323b.shape
(550068, 10)
df323b["Purchase"]=df323b["Purchase"].fillna(df323b["Purchase"].median())
df323b["Product_Category_2"]=df323b["Product_Category_2"].fillna(df323b["Product_Category_2"].mode()[0])
df323b["Product_Category_3"]=df323b["Product_Category_3"].fillna(df323b["Product_Category_3"].mode()[0])
df323b.isnull().sum()
Gender 0
Age 0
Occupation 0
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 0
Product_Category_1 0
Product_Category_2 0
Product_Category_3 0
Purchase 0
dtype: int64
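The imputation cell above fills `Purchase` with its median and the two sparse product-category columns with their mode. A minimal sketch of the same idea as one helper follows; the name `impute` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame, median_cols, mode_cols) -> pd.DataFrame:
    """Return a copy with NaNs filled by each column's median or mode."""
    out = df.copy()
    for col in median_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in mode_cols:
        # mode() returns all tied values in sorted order; [0] takes the smallest
        out[col] = out[col].fillna(out[col].mode()[0])
    return out


# Hypothetical frame with gaps in the same columns the notebook fills
toy = pd.DataFrame(
    {
        "Product_Category_2": [6.0, np.nan, 6.0, 14.0],
        "Product_Category_3": [np.nan, 16.0, 16.0, np.nan],
        "Purchase": [8370.0, np.nan, 1422.0, 1057.0],
    }
)
filled = impute(toy, ["Purchase"], ["Product_Category_2", "Product_Category_3"])
print(filled)
print(filled.isnull().sum())
```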
df323b.describe(include='all')
        Gender  Age  Occupation  City_Category  Stay_In_Current_City_Years  ...
unique  2       7    21.0        3              5                           ...
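In `describe(include='all')` output, the `unique` row is filled only for non-numeric (object or category) columns; numeric columns show NaN there. The sketch below, on hypothetical data, contrasts that row with `nunique()`, which counts distinct values for every column.

```python
import pandas as pd

# Hypothetical mixed-type frame
df = pd.DataFrame(
    {
        "Gender": pd.Categorical(["F", "M", "F", "F"]),
        "Age": pd.Categorical(["0-17", "26-35", "0-17", "51-55"]),
        "Purchase": [8370.0, 15200.0, 1422.0, 1057.0],
    }
)

summary = df.describe(include="all")
# 'unique' is reported for the categorical columns only; Purchase gets NaN
print(summary.loc["unique"])
# nunique() gives the distinct-value count for every column, numeric or not
print(df.nunique())
```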