
MRA Project - Milestone 1
MAMINUL ISLAM
ISLAMMAMINUL44@GMAIL.COM
Summary
➢ Agenda & Executive Summary of the data.
➢ Exploratory Analysis and Inferences.
➢ Customer Segmentation using RFM analysis.
➢ Inferences from RFM Analysis and identified segments.
Problem Statement:

An automobile parts manufacturing company has collected data of transactions for 3 years.
They do not have any in-house data science team, thus they have hired you as their consultant.
Your job is to use your magical data science skills to provide them with suitable insights about
their data and their customers.

Auto Sales Data: Sales_Data.xlsx


Data Dictionary:
ORDERNUMBER : Order Number
QUANTITYORDERED : Quantity ordered
PRICEEACH : Price of Each item
ORDERLINENUMBER : order line
SALES : Sales amount
ORDERDATE : Order Date
DAYS_SINCE_LASTORDER : Days since last order
STATUS : Status of order like Shipped or not
PRODUCTLINE : Product line – CATEGORY
MSRP : Manufacturer's Suggested Retail Price
PRODUCTCODE : Code of Product
CUSTOMERNAME : customer
PHONE : Phone of the customer
ADDRESSLINE1 : Address of customer
CITY : City of customer
POSTALCODE : Postal Code of customer
COUNTRY : Country customer
CONTACTLASTNAME : Contact person customer
CONTACTFIRSTNAME : Contact person customer
DEALSIZE : Size of the deal based on Quantity and Item Price
➢ Upload and explore the data in Tableau.
➢ Shape of data set: 2747 rows, 19 columns.
➢ Number of variables: 6 numeric, 1 datetime, 12 categorical.
➢ Zero ‘0’ null values in the data.
➢ Added a new column, Monetary, calculated as Quantity * Price (QUANTITYORDERED * PRICEEACH).
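The deck performs these steps in Tableau. As an equivalent illustration only, a minimal pandas sketch of the same checks and of the Monetary column might look like the following. It uses three order lines copied from the sample rows shown later in this deck rather than the full Sales_Data.xlsx, so the shape it reports is not the 2747 x 19 of the real file.

```python
import pandas as pd

# Three order lines copied from the sample rows in this deck (not the full data set).
orders = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121, 10134],
    "QUANTITYORDERED": [30, 34, 41],
    "PRICEEACH": [95.70, 81.35, 94.74],
    "SALES": [2871.0, 2765.9, 3884.34],
})

# Basic profile: shape and total number of null values.
n_rows, n_cols = orders.shape
null_count = int(orders.isna().sum().sum())

# Monetary = Quantity * Price, rounded to two decimals.
orders["Monetary"] = (orders["QUANTITYORDERED"] * orders["PRICEEACH"]).round(2)

print(n_rows, n_cols, null_count)
print(orders[["ORDERNUMBER", "SALES", "Monetary"]])
```

For these three lines Monetary equals SALES; on the full data the deck reports that the two columns are close but not identical.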
The EDA was done in Tableau, with the workflow published on Tableau Public:
Exploratory Analysis and Inferences tableau Public Link
EDA – Univariate Analysis

Figures: Sales (Univariate), Monetary (Univariate)

The numeric variables Sales and Monetary have nearly the same slightly right-skewed, bell-shaped distribution, yet their values are not identical. So all further analysis uses the calculated price per order, which is Monetary.
EDA – Univariate Analysis

Figures: Quantity Ordered (Univariate), Priceeach (Univariate)

The Priceeach distribution mostly matches that of Sales, while Quantity Ordered concentrates at very high or very low values.
EDA – Univariate Analysis

Figures: Days Since Last Order (Univariate), MSRP (Univariate)

Days since last order represents how frequently a customer places the next order. Most customers fall in a range of 800-2800 days since their last order.
EDA – Bivariate Analysis

Figure: Country & Orders (Bivariate)

EDA – Trend of the Sales

Sales trend over different time periods:

➢ Yearly sales decreased in 2020 compared with 2018, with 2019 having the highest sales.
➢ Quarterly and monthly sales show an increasing trend with seasonality: sales increase in Q4 and then decrease.
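The trend charts come from Tableau. A rough pandas equivalent of the yearly and quarterly roll-ups is sketched below. It assumes ORDERDATE is stored as dd/mm/yyyy text, as in the sample rows of this deck, and uses only five of those rows (all from 2018), so it shows the mechanics and not the reported trend.

```python
import pandas as pd

# Five order lines taken from the sample rows in this deck (all from 2018).
orders = pd.DataFrame({
    "ORDERDATE": ["24/02/2018", "07/05/2018", "01/07/2018", "25/08/2018", "28/10/2018"],
    "SALES": [2871.0, 2765.9, 3884.34, 3746.7, 3479.76],
})

# Assumption: dates are day-first strings; parse them explicitly.
orders["ORDERDATE"] = pd.to_datetime(orders["ORDERDATE"], format="%d/%m/%Y")

# Total sales per calendar year and per calendar quarter.
yearly = orders.groupby(orders["ORDERDATE"].dt.year)["SALES"].sum()
quarterly = orders.groupby(orders["ORDERDATE"].dt.to_period("Q"))["SALES"].sum()

print(yearly)
print(quarterly)
```

Grouping on `dt.to_period("M")` instead would give the monthly series.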
EDA – Day Trends of the Sales

Figure: Day Trends

Day trends of Sales, Priceeach and Quantity Ordered by customer, together with the average MSRP.
EDA – Weekly Trend of the Sales & MSRP

Figure: Weekly Sales

Weekly Sales and MSRP, shown with the volume peaks and lows.
EDA – Sales across different Productlines

Figure: Sales across different Productlines

Classic Cars is the product line with the highest sales, followed by Vintage Cars. Trains has the lowest sales of all product lines.
EDA – Orders status & Sales

Figure: Orders status & Sales

Most of the sales revenue comes from shipped orders, followed by cancelled and on-hold orders. The company should check this, and disputed orders are also present in notable numbers.
EDA – Sales/Customers

Sales per customer, in decreasing order of sales:


EDA – Sales & Order Status/Country

The USA has the most shipped orders and high sales, with 3 orders on hold and 1 cancelled. It is followed by France and then the remaining countries in order of decreasing sales.
EDA – Sales & Order status across customers

Euro Shopping Channel is the customer with the highest sales and the most orders shipped, followed by Mini Gifts Distributors Ltd. Euro Shopping Channel also has orders in Cancelled and Disputed status, which is another point for the company to check.
RFM Segmentation

➢ R-Score : Recency measures how recently a customer ordered. It is calculated as the difference in days between the current date and the order date.

RECENCY in Days = Current Date - ORDERDATE

➢ F-Score : Frequency measures how often a customer places orders. It is taken from the variable DAYS_SINCE_LASTORDER in the Excel sheet.

➢ M-Score : Monetary. Sales could be used as Monetary, but this project uses the product of price and quantity:

Monetary = QUANTITYORDERED * PRICEEACH
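Recency must come out as a positive day count, so the order date is subtracted from the current date. A small pandas sketch of that calculation follows. The deck does not state which "current date" was used; the reference date 2021-10-17 below is an assumption, back-computed from the sample rows later in this deck (an order dated 24/02/2018 is listed with Recency 1331).

```python
import pandas as pd

# Assumption: reference ("current") date inferred from the sample rows, not stated in the deck.
REFERENCE_DATE = pd.Timestamp("2021-10-17")

# Order dates from the five sample rows, stored day-first as in the source file.
orders = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121, 10134, 10145, 10168],
    "ORDERDATE": ["24/02/2018", "07/05/2018", "01/07/2018", "25/08/2018", "28/10/2018"],
})
orders["ORDERDATE"] = pd.to_datetime(orders["ORDERDATE"], format="%d/%m/%Y")

# Recency in days = current date - order date (positive for past orders).
orders["Recency"] = (REFERENCE_DATE - orders["ORDERDATE"]).dt.days

print(orders)
```

With that reference date the computed values match the Recency column of the sample rows (1331, 1259, 1204, 1149, 1085).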


RFM Segmentation

➢ KNIME was used to generate the RFM figures and the corresponding bins.

➢ Created 3 bins each for R, F & M with the distribution of values below:

Bin     Share of values captured      Recency   Frequency   Monetary
        (sorted in ascending order)
Bin 1   0 - 0.25                      H         L           L
Bin 2   0.25 - 0.75                   M         M           M
Bin 3   0.75 - 1.00                   L         H           H

➢ The final output will have these additional values:

Recency, Monetary, Frequency,
R - Score (Values - L, M, H),
M - Score (Values - L, M, H),
F - Score (Values - L, M, H)
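The binning itself was done in KNIME. As an illustration only, the same three-way split can be sketched with `pandas.qcut`, assuming cut points at the 25th and 75th percentiles (the boundaries this deck quotes for the at-risk and lost segments). The input values and the helper name `score` are hypothetical. Note that the labels are reversed for Recency: a small number of days since the last order is the High score.

```python
import pandas as pd

def score(values: pd.Series, labels: list) -> pd.Series:
    """Split values into three quantile bins: bottom 25%, middle 50%, top 25%."""
    return pd.qcut(values, q=[0, 0.25, 0.75, 1.0], labels=labels)

# Hypothetical recency (days) and monetary values for eight order lines.
recency = pd.Series([100, 200, 300, 400, 500, 600, 700, 800])
monetary = pd.Series([2871.0, 2765.9, 3884.34, 3746.7, 3479.76, 1500.0, 5200.0, 4100.0])

# Ascending recency: lowest quartile is the most recent, so it is labelled H.
r_score = score(recency, ["H", "M", "L"])
# Ascending monetary: lowest quartile is labelled L, highest H.
m_score = score(monetary, ["L", "M", "H"])

print(pd.DataFrame({"Recency": recency, "R_Score": r_score,
                    "Monetary": monetary, "M_Score": m_score}))
```

The Frequency score would use the same helper with the `["L", "M", "H"]` label order.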
RFM Segmentation

The RFM segmentation was done in Tableau, with the workflow published on Tableau Public:

RFM Analysis tableau Public Link


RFM Analysis

The output Excel file from Tableau is used in Excel to get the actual RFM scores on sales:

ORDERNUMBER | QUANTITYORDERED | PRICEEACH | ORDERLINENUMBER | SALES | ORDERDATE | DAYS_SINCE_LASTORDER | STATUS | PRODUCTLINE | MSRP | PRODUCTCODE | CUSTOMERNAME | PHONE | ADDRESSLINE1 | CITY | POSTALCODE | COUNTRY | CONTACTLASTNAME | CONTACTFIRSTNAME | DEALSIZE | Recency | Frequency | Monetary
10107 | 30 | 95.70 | 2 | 2871 | 24/02/2018 | 828 | Shipped | Motorcycles | 95 | S10_1678 | Land of Toys Inc. | 2125557818 | 897 Long Airport Avenue | NYC | 10022 | USA | Yu | Kwai | Small | 1331 | 1337 | 2871
10121 | 34 | 81.35 | 5 | 2765.9 | 07/05/2018 | 757 | Shipped | Motorcycles | 95 | S10_1678 | Reims Collectables | 26.47.1555 | 59 rue de l'Abbaye | Reims | 51100 | France | Henriot | Paul | Small | 1259 | 1260 | 2765.9
10134 | 41 | 94.74 | 2 | 3884.34 | 01/07/2018 | 703 | Shipped | Motorcycles | 95 | S10_1678 | Lyon Souveniers | +33 1 46 62 7555 | 27 rue du Colonel Pierre Avia | Paris | 75508 | France | Da Cunha | Daniel | Medium | 1204 | 1204 | 3884.34
10145 | 45 | 83.26 | 6 | 3746.7 | 25/08/2018 | 649 | Shipped | Motorcycles | 95 | S10_1678 | Toys4GrownUps.com | 6265557265 | 78934 Hillside Dr. | Pasadena | 90003 | USA | Young | Julie | Medium | 1149 | 1155 | 3746.7
10168 | 36 | 96.66 | 1 | 3479.76 | 28/10/2018 | 586 | Shipped | Motorcycles | 95 | S10_1678 | Technics Stores Inc. | 6505556809 | 9408 Furth Circle | Burlingame | 94217 | USA | Hirano | Juri | Medium | 1085 | 1085 | 3479.76
RFM Analysis

Bin 1 : Very active customers with high orders & Sales values.
Bin 2 : At-risk customers with good orders & Sales values.
Bin 3 : Lost customers which could have provided high revenues with one-time orders.

Recency   Frequency   Monetary    Segment
H         H / M / L   H / M / L   Bin 1
M         H / M / L   H / M / L   Bin 2
L         H / M / L   H / M / L   Bin 3
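The bin definitions can be read as a lookup on the Recency score alone: H, M and L map to Bin 1, Bin 2 and Bin 3, whatever the Frequency and Monetary scores are. This reading is an interpretation of the deck, supported by the recency ranges it quotes for the at-risk and lost segments. A minimal sketch of that mapping, with hypothetical customers and scores:

```python
import pandas as pd

# Assumed mapping: the segment follows the Recency score only.
SEGMENTS = {"H": "Bin 1 (active)", "M": "Bin 2 (at risk)", "L": "Bin 3 (lost)"}

# Hypothetical customers with already-computed R, F and M scores.
customers = pd.DataFrame({
    "CUSTOMER": ["A", "B", "C", "D"],
    "R_Score": ["H", "M", "M", "L"],
    "F_Score": ["H", "L", "M", "H"],
    "M_Score": ["M", "H", "L", "H"],
})

# Assign each customer to a segment and count customers per segment.
customers["Segment"] = customers["R_Score"].map(SEGMENTS)
segment_counts = customers["Segment"].value_counts()

print(customers)
print(segment_counts)
```

Within a segment, the Frequency and Monetary scores can then be used to rank customers, for example to pick out the loyal, high-value customers inside Bin 1.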
RFM Inferences

Bin 1 consists of the top 25% of customers, termed active customers. Among them, the customers who are loyal and bring in monetary benefits are shown below:
RFM Inferences

Bin 1 also contains the top 25% most loyal customers, again termed active customers. Among them, the customers who are loyal and bring in monetary benefits are shown below:
RFM Inferences

Bin 2 is the set of at-risk customers (recency between 25% and 75%), which means they are at risk of churning.
RFM Inferences

Bin 3 is the set of lost customers (recency above 75%), which means they are already lost and may not return.
RFM Inferences

➢ Most customers belong to Bin 2 and are in a very critical situation: there is a high risk that this segment switches to another supplier.
➢ Bin 1 holds the ‘active’ and ‘loyal’ customers, who bring in decent revenue.
➢ For Bin 2, the company can offer more services to build loyalty, as these customers account for good revenue.
➢ The at-risk customers bring in the most monetary benefit but haven’t purchased recently.
➢ Most of the loyal customers are in Japan and France.
➢ The USA has the most at-risk customers, followed by France, Germany and Finland. Handling shipping issues better for these countries would be good for business.
Thanks
