
Case Study Report: Amazon Sales Report Analysis

Introduction

This case study aims to analyze the Amazon sales data to uncover insights into customer
demographics, product preferences, and sales performance. The analysis focuses on identifying
key trends and patterns that can inform business strategies and decision-making.

Data Overview

The data consists of various features related to Amazon sales, such as order ID, order date, ship
date, ship mode, customer ID, segment, country, city, state, postal code, region, product ID,
category, sub-category, product name, sales, quantity, discount, and profit.

Objectives

1. Identify the states with the highest number of sales.
2. Analyze the distribution of sales across different states.
3. Determine the most preferred product category and sub-category.
4. Understand customer segmentation and their purchasing behavior.
5. Explore shipping modes and their impact on sales and profit.

Methodology

The analysis involves the following steps:

1. Data Loading and Preparation:
o Importing necessary libraries (pandas, seaborn, matplotlib).
o Uploading the CSV file and loading it into a pandas DataFrame.
2. Exploratory Data Analysis (EDA):
o Descriptive statistics to understand the data distribution.
o Visualization to identify patterns and trends.
o Analysis of top states by sales volume.
3. Visualization and Insights:
o Plotting the distribution of states with the highest sales.
o Highlighting key observations and trends.

Code Implementation

Data Loading and Preparation:

from google.colab import files

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Upload the file (opens a file picker in Google Colab)
uploaded = files.upload()

# Load the uploaded CSV file into a DataFrame
file_name = list(uploaded.keys())[0]
df = pd.read_csv(file_name)

# Display the first few rows of the DataFrame
df.head()
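The upload code above only runs inside Google Colab. As a hedged sketch of the "Preparation" part of step 1, the function below reads the CSV from any file path or buffer and applies light cleaning before EDA. The tiny CSV is hypothetical, and every column name except 'ship-state' is an assumption; the capitalization fix assumes state names may appear in mixed case, which is worth checking against the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the uploaded CSV. Only 'ship-state'
# appears in the report; the other column names are assumptions.
csv_text = """Order ID,Category,Size,ship-state
1,T-shirt,M,MAHARASHTRA
2,Shirt,L,Maharashtra
3,T-shirt,M,karnataka
4,Kurta,S,
"""


def load_and_prepare(source) -> pd.DataFrame:
    """Read the sales CSV and apply light cleaning before EDA."""
    df = pd.read_csv(source)
    # Remove stray whitespace around column names.
    df.columns = df.columns.str.strip()
    # Drop rows with no shipping state, since they cannot be grouped by state.
    df = df.dropna(subset=["ship-state"]).copy()
    # Normalize capitalization so 'MAHARASHTRA' and 'Maharashtra' count together.
    df["ship-state"] = df["ship-state"].str.strip().str.title()
    return df


df = load_and_prepare(io.StringIO(csv_text))
print(df["ship-state"].value_counts())
```

Outside Colab, the same function can be called with a file path, for example load_and_prepare("amazon_sales.csv"), where the file name is only a placeholder.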

Exploratory Data Analysis:

# Descriptive statistics
df.describe()

# Top 10 states by number of orders (value_counts counts rows per state)
top_10_states = df['ship-state'].value_counts().head(10)

Visualization:

# Plot the number of orders for each of the top 10 states
plt.figure(figsize=(12, 6))
sns.countplot(
    data=df[df['ship-state'].isin(top_10_states.index)],
    x='ship-state',
    order=top_10_states.index,  # draw bars from most to fewest orders
)
plt.xlabel('ship-state')
plt.ylabel('count')
plt.title('Distribution of Orders Across the Top 10 States')
plt.xticks(rotation=45)
plt.show()
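The count plot measures how many orders each state placed, not how much money those orders brought in. For objective 2 (the distribution of sales across states), a small sketch like the one below computes both the share of orders per state and the total order value per state. The sample data is hypothetical, and 'Amount' is an assumed name for the order-value column.

```python
import pandas as pd

# Hypothetical sample; 'Amount' (order value) is an assumed column name.
df = pd.DataFrame({
    "ship-state": ["Maharashtra", "Maharashtra", "Karnataka",
                   "Kerala", "Maharashtra", "Karnataka"],
    "Amount": [500.0, 300.0, 450.0, 200.0, 100.0, 250.0],
})

# Share of orders per state; the fractions sum to 1.0.
order_share = df["ship-state"].value_counts(normalize=True)

# Total order value per state, largest first.
revenue_by_state = (
    df.groupby("ship-state")["Amount"].sum().sort_values(ascending=False)
)

print(order_share.round(3))
print(revenue_by_state.head(10))
```

If the state ranked first by order count differs from the one ranked first by revenue, the two views tell different stories, so it is worth reporting which one a "top states" claim is based on.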

Insights

State-wise Distribution of Sales

• Top 10 States: The analysis identifies the 10 states with the most orders. A bar plot of order
counts visualizes the distribution and shows that Maharashtra has the highest number of orders.
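Counting rows per state gives the number of orders, which can differ from the number of distinct buyers when customers order more than once. If the file has a customer identifier (the Data Overview lists a customer ID), the sketch below compares the two counts. The data is hypothetical and "Customer ID" is an assumed column name.

```python
import pandas as pd

# Hypothetical sample; "Customer ID" is an assumed column name based on the
# feature list in the Data Overview and may differ in the real file.
df = pd.DataFrame({
    "Customer ID": ["C1", "C1", "C2", "C3", "C4", "C5"],
    "ship-state": ["Maharashtra", "Maharashtra", "Maharashtra",
                   "Karnataka", "Karnataka", "Karnataka"],
})

# Orders per state: every row counts once.
orders_per_state = df["ship-state"].value_counts()

# Buyers per state: each customer is counted once per state.
buyers_per_state = df.groupby("ship-state")["Customer ID"].nunique()

summary = pd.DataFrame({"orders": orders_per_state, "buyers": buyers_per_state})
print(summary)
```

In this toy sample both states have three orders, but Karnataka has three distinct buyers while Maharashtra has two, so "most orders" and "most buyers" are not interchangeable claims.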

Customer Segmentation and Preferences

• Customer Base: The data reveals a significant customer base in Maharashtra.
• Product Preferences: T-shirts are in high demand, and size M is the most preferred choice among
buyers.
• Order Fulfillment: Most orders are fulfilled through Amazon, highlighting its role as a crucial
distribution channel.
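The code shown earlier only covers states, so the product and fulfillment findings above are not backed by a listed computation. As a hedged sketch, the helper below finds the most frequent value of a column and its share of orders. The sample rows are hypothetical, and "Category", "Size", and "Fulfilment" are assumed column names that may differ in the actual file.

```python
import pandas as pd

# Hypothetical sample; "Category", "Size" and "Fulfilment" are assumed
# column names and may differ in the actual sales file.
df = pd.DataFrame({
    "Category": ["T-shirt", "T-shirt", "Shirt", "T-shirt", "Kurta"],
    "Size": ["M", "M", "L", "S", "M"],
    "Fulfilment": ["Amazon", "Amazon", "Merchant", "Amazon", "Amazon"],
})


def most_common(frame: pd.DataFrame, column: str) -> tuple:
    """Return the most frequent value in a column and its share of rows."""
    shares = frame[column].value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])


for col in ["Category", "Size", "Fulfilment"]:
    value, share = most_common(df, col)
    print(f"{col}: {value} ({share:.0%} of orders)")
```

Reporting the share alongside the top value makes statements such as "T-shirts are highly demanded" easier to check.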

Conclusion

The data analysis reveals that the business has a significant customer base in Maharashtra,
mainly serves retailers, fulfills most orders through Amazon, experiences high demand for T-shirts,
and sees size M as the preferred choice among buyers. These insights can help the business tailor
its marketing strategies, optimize inventory management, and enhance customer satisfaction.

Recommendations

We can also apply a linear regression algorithm to the above sales report. To use linear
regression on an Amazon sales report, follow these general steps:

1. Collect Data:
o Gather historical sales data. This can include daily, weekly, or monthly sales
figures, depending on the granularity you need.
o Collect other relevant features that might influence sales, such as price,
advertising spend, promotions, seasonality, and competitor prices.
2. Preprocess Data:
o Clean the data by handling missing values, removing outliers, and ensuring
consistency.
o Encode categorical variables if necessary (e.g., product categories).
o Normalize or standardize numerical features to improve the performance of the
regression model.
3. Exploratory Data Analysis (EDA):
o Visualize the data to understand trends, patterns, and relationships between
variables.
o Use plots like histograms, scatter plots, and correlation matrices.
4. Split the Data:
o Divide the data into training and testing sets. A common split is 80% for training
and 20% for testing.
5. Build the Linear Regression Model:
o Use a machine learning library like scikit-learn in Python to create and train the
linear regression model.
6. Evaluate the Model:
o Assess the model's performance using metrics like Mean Absolute Error (MAE),
Mean Squared Error (MSE), and R-squared.
o Plot residuals to check for patterns that might indicate issues with the model.
7. Make Predictions:
o Use the trained model to make predictions on new data or to forecast future sales.
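The steps above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic data, not a model of the actual report: the features (price, ad_spend, is_promo), the target units_sold, and the linear relationship used to generate it are all hypothetical stand-ins for the drivers listed in step 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1 (synthetic stand-in): hypothetical daily sales drivers.
rng = np.random.default_rng(seed=0)
n = 200
data = pd.DataFrame({
    "price": rng.uniform(200, 800, n),
    "ad_spend": rng.uniform(0, 1000, n),
    "is_promo": rng.integers(0, 2, n),  # already encoded as 0/1
})
# Assumed linear relationship plus noise, used only to generate the target.
data["units_sold"] = (
    120 - 0.1 * data["price"] + 0.05 * data["ad_spend"]
    + 15 * data["is_promo"] + rng.normal(0, 2, n)
)

X = data[["price", "ad_spend", "is_promo"]]
y = data["units_sold"]

# Step 4: 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 2 and 5: standardize features, then fit ordinary least squares.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
residuals = y_test - y_pred  # plot against y_pred to look for patterns
print(f"MAE={mae:.2f}  MSE={mse:.2f}  R^2={r2:.3f}")

# Step 7: predict units sold for a new, hypothetical day.
new_day = pd.DataFrame({"price": [499.0], "ad_spend": [600.0], "is_promo": [1]})
print(model.predict(new_day))
```

For plain least squares, standardizing does not change the predictions; it mainly makes the fitted coefficients comparable across features and matters more if a regularized model such as ridge regression replaces LinearRegression. Categorical features like product category would need encoding (for example with pd.get_dummies) before being added to X.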

Output:

The output of the Amazon sales report analysis is the bar plot of the top 10 states, which shows
which states in India purchased the most products.
