
AMAZON SALES DATASET
Name – SAMYAK KHANDERAO
Roll No – A-46
PRN – 22610045
INTRO TO THE DATASET
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website.

• THE FOLLOWING OPERATIONS CAN BE PERFORMED ON THE DATASET:

• Dataset Walkthrough
• Understanding Dataset Hierarchy
• Data Preprocessing
• Exploratory Data Analysis
• Data Visualization

• THIS DATASET INCLUDES FEATURES LIKE

PRODUCT_ID, PRODUCT_NAME, USER_ID, USER_NAME, ACTUAL_COST, DISCOUNTED_COST …….
IMPORTING LIBRARIES
• NumPy :- NumPy is commonly used for numerical computations and array manipulations.

• Pandas :- Pandas is widely used for data cleaning, transformation, and analysis tasks.

• Matplotlib.pyplot :- It offers a wide range of customization options for creating publication-quality figures. It is also used for labeling the graphs.

• Seaborn :- Seaborn simplifies the process of creating complex visualizations such as scatter plots, histograms, and heatmaps.
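The four libraries above are usually imported under their conventional aliases. This is a minimal sketch; the aliases are common practice and are assumed here, not taken from the slides.

```python
import numpy as np               # numerical computations and array manipulation
import pandas as pd              # data cleaning, transformation, and analysis
import matplotlib.pyplot as plt  # figures, axis labels, and titles
import seaborn as sns            # statistical plots built on top of Matplotlib
```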
BASIC EDA OPERATIONS
• DataFrame.info()
• DataFrame.describe()
• DataFrame.isnull().sum()
• DataFrame.describe(include='object')
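A minimal sketch of these four calls on a tiny, made-up DataFrame. The rows and the column names `product_name`, `actual_price`, and `rating` are hypothetical and only illustrate what each call reports.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real dataset has 1K+ products.
df = pd.DataFrame({
    "product_name": pd.Series(["Cable", "Charger", "Headphones", "Cable"], dtype="object"),
    "actual_price": [399.0, 999.0, np.nan, 349.0],
    "rating": [4.2, 4.0, 3.9, np.nan],
})

df.info()                    # column dtypes and non-null counts
print(df.describe())         # count, mean, std, min, quartiles, max for numeric columns

missing = df.isnull().sum()  # number of missing values in each column
print(missing)

# count, unique, top, and freq for the text (object) columns
text_summary = df.describe(include="object")
print(text_summary)
```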
Histogram (univariate)
• Univariate Method:-

• Histogram:
Displays the distribution of a single variable by dividing its range into intervals (bins) and plotting the frequency or count of observations within each bin.
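The binning step can be sketched as follows. The rating values are made up for illustration, and the choice of four bins over the range 3.0 to 5.0 is an assumption, not something taken from the dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ratings; not taken from the real dataset.
ratings = np.array([3.1, 3.8, 4.0, 4.1, 4.2, 4.3, 4.4, 4.6, 4.9, 5.0])

# Divide the range 3.0 to 5.0 into 4 equal-width bins and count the values in each.
counts, bin_edges = np.histogram(ratings, bins=4, range=(3.0, 5.0))
print(bin_edges)
print(counts)

# Draw the same bins as a histogram; call plt.show() to display it.
plt.hist(ratings, bins=bin_edges, edgecolor="black")
plt.xlabel("rating")
plt.ylabel("count")
```

Each bar's height is the count for one bin, so the counts always add up to the number of observations.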
Visualize a scatter plot of your dataset with the maximum number of parameters
This plot uses only the first 100 rows of the data.

The rating column sets the hue (color) of the points.


sns.scatterplot(data=df.head(100),
x='actual_price',
y='discounted_price',
hue='rating', size=5)
Perform Bivariate Graphical EDA on the given dataset.
• Group the DataFrame by category and calculate the total number of users for each category

• category_user_counts = df.groupby('category')['number_of_users'].sum().reset_index()

• sns.barplot(data=category_user_counts,
x='category', y='number_of_users')
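To show what the grouping step produces, here is a minimal sketch on made-up rows. The category names and user counts are hypothetical; only the column names `category` and `number_of_users` come from the snippet above.

```python
import pandas as pd

# Hypothetical rows; values are for illustration only.
df = pd.DataFrame({
    "category": ["Electronics", "Home", "Electronics", "Home", "Toys"],
    "number_of_users": [120, 40, 80, 10, 5],
})

# Sum the users within each category, then turn the group labels back into a column.
category_user_counts = df.groupby("category")["number_of_users"].sum().reset_index()
print(category_user_counts)
```

The result has one row per category, which is the shape a bar plot with one bar per category expects.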
Scatterplot (bivariate)
This scatterplot shows how the actual price of the product varies with the discount percentage.

We can see that discount rates are higher for products priced below Rs 5000.
Skewness
Skewness: Skewness is a statistical measure that describes the asymmetry of the distribution of a dataset.
1] Skewness = 0: The distribution is symmetric, as in a normal distribution.
2] Skewness > 0: More weight in the right tail of the distribution.
3] Skewness < 0: More weight in the left tail of the distribution.
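The sign of the skewness can be checked with the pandas `Series.skew()` method. This is a minimal sketch on three made-up samples; the values are hypothetical and chosen only so that one sample is symmetric, one has a long right tail, and one has a long left tail.

```python
import pandas as pd

# Hypothetical samples, chosen only to illustrate the sign of the skewness.
symmetric = pd.Series([1, 2, 3, 4, 5])               # mirror image around 3
right_tailed = pd.Series([1, 1, 2, 2, 3, 20])        # one large value stretches the right tail
left_tailed = pd.Series([-20, -3, -2, -2, -1, -1])   # one small value stretches the left tail

skew_symmetric = symmetric.skew()
skew_right = right_tailed.skew()
skew_left = left_tailed.skew()
print(skew_symmetric, skew_right, skew_left)
```

The symmetric sample gives a skewness of zero, the sample with the long right tail gives a positive value, and the sample with the long left tail gives a negative value.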
Thank You
