
MRA Project ML 1

Abhishek Kapoor
DSBA AUG A20
Agenda [Table of Contents]
• Executive Summary of the data
• EDA and Inference
• Customer Segmentation using RFM analysis
• Inferences from RFM Analysis and identified segments
PROBLEM STATEMENT

An automobile parts manufacturing company has collected data of
transactions for 3 years. They do not have any in-house data science
team, thus they have hired you as their consultant. Your job is to use your
magical data science skills to provide them with suitable insights about
their data and their customers.

Auto Sales Data: Sales_Data.xlsx


About data
• The data provided is from an auto parts manufacturing company
• The data has 20 variables and 2747 records
• The company currently has 89 customers from 19 countries across the
globe
• They deal in 109 products across 7 different product lines
• The data is clean and has neither null values nor duplicate records
Executive SUMMARY
• This analysis is being carried out in order to gain insights about the
company’s performance over the years
• Prospects of improvement
• Customer analysis, and RFM based segmentation
• Identifying patterns and other analytical inferences to drive business
solutions
EDA (Info, general summary)
• Shape of data: (2747, 20)
• Continuous variables: 7
• Categorical variables: 12
• Date-Time variables: 1
• Null values: 0
• Duplicate records: 0
PYTHON
• EDA for the dataset is performed using Python
• Libraries used:
• Pandas
• Numpy
• Seaborn
• Matplotlib
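The figures on the EDA summary slide can be reproduced with a few pandas calls. The following is a minimal sketch, not the original notebook: the helper name `summarize` and the tiny demo frame are hypothetical, and the real data would be loaded from Sales_Data.xlsx instead.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic facts reported on the EDA summary slide (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns
    dates = df.select_dtypes(include="datetime").columns
    other = df.select_dtypes(exclude=["number", "datetime"]).columns
    return {
        "shape": df.shape,
        "continuous": len(numeric),
        "categorical": len(other),
        "date_time": len(dates),
        "null_values": int(df.isna().sum().sum()),
        "duplicate_records": int(df.duplicated().sum()),
    }

# Made-up rows standing in for the real file; with the real data one would use
# df = pd.read_excel("Sales_Data.xlsx") instead.
demo = pd.DataFrame({
    "ORDERNUMBER": [10100, 10101, 10102],
    "SALES": [2871.0, 2765.9, 3884.34],
    "ORDERDATE": pd.to_datetime(["2018-01-06", "2018-01-09", "2018-01-10"]),
    "PRODUCTLINE": ["Classic Cars", "Trains", "Ships"],
})
summary = summarize(demo)
print(summary)
```

Note that this sketch counts every numeric column as continuous, so an identifier such as ORDERNUMBER may need to be reclassified by hand.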
Outlier detection
1. The data shows a varied range of entries
across the variables
2. The largest number of outliers is observed in the Sales
variable
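The deck does not state how outliers were identified. A common choice, and the one box plots use, is the 1.5 × IQR rule; the sketch below assumes that rule and uses made-up sales values, with `iqr_outliers` being a hypothetical helper.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Illustrative values only, not taken from the dataset.
sales = pd.Series([2000, 2500, 2800, 3000, 3200, 3500, 3900, 14000], name="SALES")
outliers = iqr_outliers(sales)
print(outliers.tolist())
```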
UNIVARIATE ANALYSIS (HISTOGRAM)
BIVARIATE ANALYSIS (BASED ON SALES DATA): STATUS, PRODUCTLINE
BIVARIATE ANALYSIS (PRODUCTLINE)
COUNTRY vs SALES
PRODUCT LINES vs SALES

1. All 7 product lines show more or less
similar sales contribution
2. They also show an equal distribution in
terms of deal size
3. The Ships product line alone doesn't seem to
have any large-size business
requirements
4. Further analysis should be done on the
same to gain insights
COUNTRY-WISE SALES DISTRIBUTION
COUNTRY-WISE ORDERS DISTRIBUTION
SUMMARY

1. This scatter plot represents the
distribution of the product lines
with respect to Quantity Ordered and
Sales price
2. Classic Cars is visibly the
best-selling product line
3. It is followed by Vintage Cars
4. Further analysis should be done to
analyze patterns
SUMMARY:

1. USA is the most popular
market for the auto parts
manufacturing company
2. It is followed by Spain and
France
3. Ireland and the Philippines are
the least consuming
countries for the MNC
SALES OVER TIME
Summary
• USA is the primary market
• Classic Cars is the best-selling product line among the 7
product lines
• Sales follow a recurring pattern: November has recorded the
highest sales in each of the past three years
• Euro Shopping Channel is the most loyal customer
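The seasonal pattern noted above can be checked by totalling sales per calendar month. This is a rough sketch on made-up rows; SALES is named in the deck, while the ORDERDATE column name and the `sales_by_month` helper are assumptions.

```python
import pandas as pd

def sales_by_month(df: pd.DataFrame, date_col: str = "ORDERDATE",
                   sales_col: str = "SALES") -> pd.Series:
    """Total sales per calendar month (1-12), pooled across all years."""
    month = df[date_col].dt.month
    return df.groupby(month)[sales_col].sum()

# Illustrative rows only, not taken from the dataset.
demo = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(
        ["2018-10-05", "2018-11-10", "2019-03-01", "2019-10-20", "2019-11-12"]),
    "SALES": [100, 300, 120, 150, 350],
})
monthly = sales_by_month(demo)
peak_month = int(monthly.idxmax())
print(monthly.to_dict(), peak_month)
```

Grouping by year and month instead (for example with `df[date_col].dt.to_period("M")`) would show whether the peak recurs in every year rather than only in the pooled totals.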
SALES vs MSRP value

SUMMARY:

1. This graph compares the
average MSRP value with the
average Sales across all
product lines
2. The MSRP value for each order
exceeds the selling price of
that order several times over
PAIR PLOT
1. The pair plot shows that the data in
all variables are normally
distributed
2. There is a visible correlation
between Sales, MSRP and
Price Each
3. The other variables don't show any
signs of correlation
HEATMAP

1. The heatmap affirms our
observation from the pair plot
2. The Sales variable is highly
correlated with MSRP
and Price Each
3. There is also a partial
correlation
between Quantity Ordered
and Sales
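A heatmap of this kind is typically drawn from the pairwise correlation matrix of the numeric columns. The sketch below computes that matrix with pandas on made-up rows; the column names other than SALES and MSRP are assumptions, and the plotting call is left as a comment so the example runs without Seaborn.

```python
import pandas as pd

def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of the numeric columns only."""
    return df.select_dtypes(include="number").corr()

# Illustrative rows only; SALES is built as quantity * price for the demo.
demo = pd.DataFrame({
    "QUANTITYORDERED": [30, 34, 41, 45, 49],
    "PRICEEACH": [95.70, 81.35, 94.74, 83.26, 100.00],
    "MSRP": [95, 95, 118, 118, 147],
    "PRODUCTLINE": ["Motorcycles"] * 5,
})
demo["SALES"] = demo["QUANTITYORDERED"] * demo["PRICEEACH"]

corr = numeric_correlations(demo)
print(corr.round(2))
# With Seaborn installed, the matrix could be drawn with:
# sns.heatmap(corr, annot=True)
```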
SALES ACROSS COUNTRIES

1. The company has a
viable market in 19 countries
2. This graph shows the sales
across different countries
3. USA is the primary market of
the company, contributing
the most to its turnover
4. Some European countries such as
Spain and Switzerland show
a good amount of sales,
following USA
EDA SUMMARY [INFERENCES]
1. USA, Spain and France are the main markets for the company
2. Most Gold customers are in the USA market
3. Euro Shopping Channel, based in Spain, is the most loyal customer
4. It is followed by Mini Gifts Distributors Ltd. in USA
5. Ireland, the Philippines and Belgium are performing very poorly in terms of
sales
EDA SUMMARY (CONTINUED)
• 6. Rovelli Gifts, one of the high-sales customers, hasn't made any
purchases recently. Action should be taken to retain them.
• 7. Switzerland only deals in the Classic Cars product line.
• 8. The Classic Cars product line is the best selling.
• 9. The Trains product line is the least selling.
RFM analysis using KNIME
RFM
• RFM analysis is performed based on 3 variables:
• Recency: Datediff(max(last_orderdate) – last_orderdate), i.e. the time elapsed
between the latest order date in the data and the customer's last order date
• Frequency: count(ORDERNUMBER)
• Monetary: sum(SALES)
• 4 bins are created for each variable based on its E, H, M and L quartiles
• The customers are then categorized based on RFM scoring.
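The deck performs this step in KNIME. As a rough pandas equivalent of the three definitions above, the following sketch builds the RFM table and scores each variable by quartile. ORDERNUMBER and SALES are named in the deck; the CUSTOMERNAME and ORDERDATE column names, the helper names, the 1-4 numeric scores (in place of the E, H, M, L labels) and the demo rows are all assumptions.

```python
import pandas as pd

def rfm_table(df: pd.DataFrame, customer: str = "CUSTOMERNAME",
              date: str = "ORDERDATE", order: str = "ORDERNUMBER",
              sales: str = "SALES") -> pd.DataFrame:
    """One row per customer with Recency (days), Frequency and Monetary."""
    snapshot = df[date].max()  # latest order date is treated as "today"
    grouped = df.groupby(customer)
    return pd.DataFrame({
        "Recency": (snapshot - grouped[date].max()).dt.days,
        "Frequency": grouped[order].count(),
        "Monetary": grouped[sales].sum(),
    })

def quartile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Score 1 (worst) to 4 (best) by quartile; ranking first avoids tied bin edges."""
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, 4, labels=[1, 2, 3, 4]).astype(int)

# Illustrative orders only, not taken from the dataset.
orders = pd.DataFrame({
    "CUSTOMERNAME": ["A", "A", "A", "B", "B", "C", "D"],
    "ORDERNUMBER": [1, 2, 3, 4, 5, 6, 7],
    "ORDERDATE": pd.to_datetime(["2020-01-15", "2020-05-01", "2020-05-30",
                                 "2020-02-01", "2020-04-30",
                                 "2019-12-01", "2019-06-01"]),
    "SALES": [300, 700, 500, 200, 400, 900, 100],
})
rfm = rfm_table(orders)
rfm["R"] = quartile_score(rfm["Recency"], higher_is_better=False)  # recent = better
rfm["F"] = quartile_score(rfm["Frequency"])
rfm["M"] = quartile_score(rfm["Monetary"])
print(rfm)
```

With only four customers each quartile holds exactly one of them, and ties in Frequency are broken by order of appearance; on the full customer list the ties matter far less.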
3 Bin RFM segmentation
                         RECENCY
SEGMENT    FREQ  MON     H    M    L   TOTAL
ACTIVE     H     H      11    7    2      20
ACTIVE     H     M       0    2    0       2
ACTIVE     H     L       0    0    0       0
AT RISK    M     H       1    1    0       2
AT RISK    M     M       9   19    8      36
AT RISK    M     L       0    1    1       2
INACTIVE   L     H       0    0    0       0
INACTIVE   L     M       1    4    1       6
INACTIVE   L     L       1   10   10      21
TOTAL                   23   44   22      89
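A cross-tabulation like the one above can be produced from per-customer H/M/L labels with `pd.crosstab`. This is a sketch on five made-up customers; the column names FREQ, MON and RECENCY mirror the table headers, and the helper name is hypothetical.

```python
import pandas as pd

def rfm_crosstab(labels: pd.DataFrame) -> pd.DataFrame:
    """Count customers per (FREQ, MON) row and RECENCY column, with totals."""
    return pd.crosstab(
        [labels["FREQ"], labels["MON"]],
        labels["RECENCY"],
        margins=True,
        margins_name="TOTAL",
    )

# Illustrative labels only, not the 89 customers from the deck.
labels = pd.DataFrame({
    "RECENCY": ["H", "H", "M", "L", "M"],
    "FREQ":    ["H", "H", "M", "L", "M"],
    "MON":     ["H", "M", "M", "L", "M"],
})
table = rfm_crosstab(labels)
print(table)
```

The rows and columns come out in alphabetical order (H, L, M); converting the label columns to ordered categoricals would restore the H, M, L order used in the deck.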
4 SEGMENT RFM ANALYSIS
• The calculated RFM values are grouped into 4 segments:
Platinum, Gold, Silver and Bronze
• Parameters and assumptions:
• It is assumed that the maximum last order date is the present date and Recency
is calculated based on that.
• Monetary is the sum of Sales
• Frequency is the count of order transaction IDs.
Inferences from RFM Analysis and identified segments
4 BIN RFM SEGMENTATION

BADGE      COUNT
PLATINUM   17
GOLD       15
SILVER     29
BRONZE     28
TOTAL      89
SAMPLE OF RFM CUSTOMER SEGMENTED DATA
NATURE OF SEGMENTATION

1. The graph shows the distribution of the average
Recency, Frequency and Monetary metrics
for the 4 segments created
2. Platinum customers are the ones who have
purchased recently, buy more
frequently and generate the most revenue
for the company
3. Gold and Silver consist of less recent
customers, with medium revenue and
frequency compared to Platinum
4. Bronze customers are the ones who have
churned or are on the verge of churning and
haven't generated much revenue for the
company
Who are your best customers?
(Based on RFM scores)

A FEW OF THE PLATINUM CUSTOMERS
1. Euro Shopping Channel
2. Mini Gifts Distributors Ltd.
3. Australian Collectors, Co.
4. Muscle Machine Inc
5. Dragon Souveniers, Ltd.
TOP PERFORMERS (GOLD)
Which customers are on the verge of
churning? (any 5 customers)
1. Land of Toys Inc.
2. AV Stores, Co.
3. Rovelli Gifts
4. Online Diecast Creations Co.
5. Corrida Auto Replicas, Ltd
Who are your lost customers? (give at least 5)

1. CAF Imports
2. West Coast Collectables Co.
3. Cambridge Collectables Co.
4. Double Decker Gift Stores, Ltd
5. Bavarian Collectables Imports, Co.
Who are your loyal customers? (give at
least 5)
High Recency and Frequency:
1. Euro Shopping Channel
2. Mini Gifts Distributors Ltd.
3. La Rochelle Gifts
4. The Sharp Gifts Warehouse
5. Souveniers And Things Co.
PREDICTIONS (Month, Sale)

• June 2020: 373 K
• July 2020: 401 K
• August 2020: 470 K
• September 2020: 431 K
• October 2020: 636 K
• November 2020: 1.18 M
• December 2020: 435 K
• January 2021: 449 K
• February 2021: 462 K
• March 2021: 414 K
• April 2021: 435 K
• May 2021: 461 K
• June 2021: 453 K
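As a quick consistency check, the monthly predictions above can be totalled and compared with the turnover figures quoted on the Annual Turnover slide. The sketch uses only numbers from the deck, in thousands; the variable names are illustrative, and combining the June to December 2020 forecast with the 1.74 M actual figure assumes that figure does not already include June 2020.

```python
# Predicted monthly sales in thousands (K), copied from the list above.
predicted_k = {
    "2020-06": 373, "2020-07": 401, "2020-08": 470, "2020-09": 431,
    "2020-10": 636, "2020-11": 1180, "2020-12": 435,
    "2021-01": 449, "2021-02": 462, "2021-03": 414,
    "2021-04": 435, "2021-05": 461, "2021-06": 453,
}

def total_k(year: str) -> int:
    """Sum the predicted months that fall in the given year."""
    return sum(v for month, v in predicted_k.items() if month.startswith(year))

h1_2021_k = total_k("2021")        # January to June 2021
rest_2020_k = total_k("2020")      # June to December 2020
actual_2020_to_june_k = 1740       # "2020 Turnover till June: 1.74 M"
full_2020_k = actual_2020_to_june_k + rest_2020_k
peak_month = max(predicted_k, key=predicted_k.get)

print(h1_2021_k, rest_2020_k, full_2020_k, peak_month)
```

The 2021 total of 2,674 K matches the 2.67 M quoted for 2021 till June, and the combined 2020 figure of about 5.67 M is in line with the 5.66 M prediction, given that the monthly values are rounded.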
Annual Turnover

Actual:
2018: 3.35 M
2019: 4.67 M (28.26%)

Predictions:
2020: 5.66 M (17.49%)
2021 (till June): 2.67 M (34.83%)

Turnover percentage calculation for 2021:
2020 turnover till June: 1.74 M
Predicted turnover 2021 till June: 2.67 M
Percentage increase (relative to the 2021 figure) = (2.67 - 1.74)/2.67
= 34.83%
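The percentage above divides the change by the later period's turnover. The short sketch below reproduces that calculation and, for comparison, the more conventional growth rate measured against the earlier period; the function names are illustrative, and the inputs are the figures from this slide.

```python
def change_vs_later(earlier: float, later: float) -> float:
    """Change as a percentage of the later value (the formula used on the slide)."""
    return (later - earlier) / later * 100

def growth_vs_earlier(earlier: float, later: float) -> float:
    """Change as a percentage of the earlier value (conventional growth rate)."""
    return (later - earlier) / earlier * 100

slide_2021 = round(change_vs_later(1.74, 2.67), 2)      # first half 2020 vs 2021
slide_2020 = round(change_vs_later(4.67, 5.66), 2)      # 2019 vs predicted 2020
growth_2021 = round(growth_vs_earlier(1.74, 2.67), 2)

print(slide_2021, slide_2020, growth_2021)
```

The first two values reproduce the 34.83% and 17.49% shown on the slide. Measured against the first-half 2020 turnover instead, the same change corresponds to growth of about 53.45%, so it is worth stating which base is meant when quoting the figure.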
FORECAST inference
• The time series forecast is based on the sales data
obtained over the past 3 years
• The forecast peaks at a turnover of 1.18 M in November
2020, the highest monthly figure in the past 3 years
• The remaining months lie roughly between 373 K and 636 K, most of them
between 400 K and 470 K
BUSINESS RECOMMENDATION
1. Focus on retaining Platinum and Gold customers
2. Initiate loyalty programs and campaigns to strengthen customer relationships
3. In addition to the US market, invest and do more business in Spain, France and
Australia, as they show promising sales performance
4. For venturing into new markets, the Classic Cars and Vintage Cars product
lines will be the most effective
5. The model has predicted a healthy growth of 34.83% in gross sales for the 2021
market. Hence, take the necessary measures to handle production and sales
accordingly
LINKS AND REFERENCE
• Tableau: MRA project Tableau visualization
• RFM segmentation: RFM Analysis new.xlsx
• Python notebook: MRA project.ipynb
• KNIME: MRS Knime.svg
THANK YOU
