
CUSTOMER SEGMENTATION E-COMMERCE

using RFM ANALYSIS

Prepared by: HABRICH NORA

Supervised by: Mr. LOTFI NAJDI
Program: Finance and Decision Engineering
Academic year: 2023/2024
Table of contents
Introduction
Libraries
Load Dataset
Data cleaning
Recode variables
Calculate RFM
Visualize results

16/01/2023
Introduction
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset, which contains actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.
It is easier to sell more to customers who have already purchased something from your company than to find new customers. Thus, it is worth examining customer data to find out when and how much they spend. One tool for doing so is RFM analysis, which consists of three components:
RECENCY - how recently did a customer buy something
FREQUENCY - how often does a customer buy something
MONETARY VALUE - what is the value of the purchased items
The ideal customer bought something recently, buys often, and spends a large amount of money. The resulting segments can be ordered from most valuable (highest recency, frequency, and value) to least valuable (lowest recency, frequency, and value). One caveat: identifying the most valuable RFM segments can capitalize on chance relationships in the data used for this analysis.
LIBRARIES :
The following libraries were loaded for the data analysis.

library(data.table)
library(dplyr)
library(ggplot2)
#library(stringr)
#library(DT)
library(tidyr)
library(knitr)
library(rmarkdown)

LOAD DATASET:
First, let's load and examine the dataset.

df_data <- fread('../input/data.csv')


glimpse(df_data)

## Observations: 541,909
## Variables: 8
## $ InvoiceNo <chr> "536365", "536365", "536365", "536365", "536365", ...
## $ StockCode <chr> "85123A", "71053", "84406B", "84029G", "84029E", "...
## $ Description <chr> "WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL...
## $ Quantity <int> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2...
## $ InvoiceDate <chr> "12/1/2010 8:26", "12/1/2010 8:26", "12/1/2010 8:2...
## $ UnitPrice <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1....
## $ CustomerID <int> 17850, 17850, 17850, 17850, 17850, 17850, 17850, 1...
## $ Country <chr> "United Kingdom", "United Kingdom", "United Kingdo...

DATA CLEANING:
Remove all rows with a non-positive Quantity or UnitPrice. We also need to remove rows with a missing (NA) CustomerID.

df_data <- df_data %>%
  mutate(Quantity = replace(Quantity, Quantity <= 0, NA),
         UnitPrice = replace(UnitPrice, UnitPrice <= 0, NA))

df_data <- df_data %>%
  drop_na()
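
As a minimal sketch, the same cleaning rule can be tried on a small hypothetical table (the five rows below are invented for illustration). It uses `filter()` as an alternative to the replace-then-`drop_na()` approach above. Note one difference: `drop_na()` without arguments drops rows with an NA in any column, whereas this filter only checks the three named columns.

```r
library(dplyr)

# Hypothetical toy data, not taken from the real dataset
toy <- data.frame(
  InvoiceNo  = c("A1", "A2", "A3", "A4", "A5"),
  Quantity   = c(6L, -1L, 3L, 2L, 0L),
  UnitPrice  = c(2.55, 3.39, 0, 1.85, 4.25),
  CustomerID = c(17850L, 17850L, 13047L, NA, 12583L)
)

# Keep only rows with positive quantity, positive price, and a known customer
toy_clean <- toy %>%
  filter(Quantity > 0, UnitPrice > 0, !is.na(CustomerID))

print(toy_clean)
```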

Recode variables
We should do some recoding and convert character variables to factors.

df_data <- df_data %>%
  mutate(InvoiceNo = as.factor(InvoiceNo),
         StockCode = as.factor(StockCode),
         InvoiceDate = as.Date(InvoiceDate, '%m/%d/%Y %H:%M'),
         CustomerID = as.factor(CustomerID),
         Country = as.factor(Country))

df_data <- df_data %>%
  mutate(total_dolar = Quantity * UnitPrice)
glimpse(df_data)

## Observations: 397,884
## Variables: 9
## $ InvoiceNo <fctr> 536365, 536365, 536365, 536365, 536365, 536365, 5...
## $ StockCode <fctr> 85123A, 71053, 84406B, 84029G, 84029E, 22752, 217...
## $ Description <chr> "WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL...
## $ Quantity <int> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2...
## $ InvoiceDate <date> 2010-12-01, 2010-12-01, 2010-12-01, 2010-12-01, 2...
## $ UnitPrice <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1....
## $ CustomerID <fctr> 17850, 17850, 17850, 17850, 17850, 17850, 17850, ...
## $ Country <fctr> United Kingdom, United Kingdom, United Kingdom, U...
## $ total_dolar <dbl> 15.30, 20.34, 22.00, 20.34, 20.34, 15.30, 25.50, 1...
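
The recoding can be checked by hand on the first row of the output. This is only a small sketch: it parses one raw timestamp with the same format string and recomputes the first `total_dolar` value from the Quantity and UnitPrice shown above.

```r
# Parse one raw timestamp in the same format as the InvoiceDate column
raw <- "12/1/2010 8:26"
parsed <- as.Date(raw, format = "%m/%d/%Y %H:%M")
print(parsed)   # a Date object; the time of day is discarded
class(parsed)

# total_dolar for the first row: Quantity 6 at UnitPrice 2.55
first_total <- 6 * 2.55
print(first_total)
```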

Calculate RFM
To implement the RFM analysis, we need to process the data set further with the following steps:
1. Find the most recent purchase date for each customer ID and calculate the number of days from it to today or some other reference date, to get the Recency data.
2. Count the number of transactions of each customer, to get the Frequency data.
3. Sum the amount of money each customer spent and divide it by Frequency, to get the average amount per transaction, that is, the Monetary data.

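df_RFM <- df_data %>%
  group_by(CustomerID) %>%
  summarise(recency = as.numeric(as.Date("2012-01-01") - max(InvoiceDate)),
            frequenci = n_distinct(InvoiceNo),
            # the slide cuts this line off after "n_distinct(Inv"; completing it with
            # InvoiceNo is an assumption based on step 3 (total spend divided by Frequency)
            monitery = sum(total_dolar) / n_distinct(InvoiceNo))
summary(df_RFM)
kable(head(df_RFM))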

## CustomerID recency frequenci monitery
## 12346 : 1 Min. : 23.0 Min. : 1.000 Min. : 3.45
## 12347 : 1 1st Qu.: 40.0 1st Qu.: 1.000 1st Qu.: 178.62
## 12348 : 1 Median : 73.0 Median : 2.000 Median : 293.90
## 12349 : 1 Mean :115.1 Mean : 4.272 Mean : 419.17
## 12350 : 1 3rd Qu.:164.8 3rd Qu.: 5.000 3rd Qu.: 430.11
## 12352 : 1 Max. :396.0 Max. :209.000 Max. :84236.25
## (Other):4332

> kable(head(df_RFM))

CustomerID  recency  frequenci  monitery
12346       348      1          77183.6000
12347       25       7          615.7143
12348       98       4          449.3100
12349       41       1          1757.5500
12350       333      1          334.4000
12352       59       8          313.2550
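
To make the three measures concrete, here is a minimal sketch on a hypothetical four-row table (customers, invoices, and amounts are invented). It assumes, as in step 3 above, that the monetary value is the total spend divided by the number of distinct invoices.

```r
library(dplyr)

# Hypothetical toy transactions (values invented for illustration)
toy <- data.frame(
  CustomerID  = c("A", "A", "A", "B"),
  InvoiceNo   = c("1001", "1001", "1002", "1003"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-01", "2011-12-01", "2011-06-15")),
  total_dolar = c(10, 20, 30, 50)
)

ref_date <- as.Date("2012-01-01")  # same reference date as in the slides

toy_RFM <- toy %>%
  group_by(CustomerID) %>%
  summarise(
    recency   = as.numeric(ref_date - max(InvoiceDate)),  # days since last purchase
    frequenci = n_distinct(InvoiceNo),                    # number of distinct invoices
    monitery  = sum(total_dolar) / n_distinct(InvoiceNo)  # average spend per invoice
  )

print(toy_RFM)
```

Customer A has two invoice lines on invoice 1001, so counting distinct invoices rather than rows keeps the frequency at 2 instead of 3.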

Recency – How recently did the customer purchase?

> hist(df_RFM$recency)

Frequency – How often do they purchase?

> hist(df_RFM$frequenci, breaks = 50)

Monetary Value – How much do they spend?

> hist(df_RFM$monitery, breaks = 50)

Because the monetary values are heavily right-skewed, we apply a log transformation to bring them closer to a normal distribution.

```{r}
df_RFM$monitery <- log(df_RFM$monitery)
hist(df_RFM$monitery)
```
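
A small sketch of the effect, using only the minimum, quartiles, and maximum of `monitery` copied from the `summary(df_RFM)` output (the names `min`, `q1`, and so on are just labels chosen here):

```r
# Monetary quartiles and extremes copied from the summary(df_RFM) output
monitery <- c(min = 3.45, q1 = 178.62, median = 293.90, q3 = 430.11, max = 84236.25)

# On the raw scale the maximum is hundreds of times the median
monitery["max"] / monitery["median"]

# After the log transform the values sit on a comparable scale
log_monitery <- log(monitery)
round(log_monitery, 2)
```

The log transform preserves the ordering of customers and only compresses the long right tail.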

Clustering
df_RFM2 <- df_RFM
row.names(df_RFM2) <- df_RFM2$CustomerID
## Warning: Setting row names on a tibble is deprecated.

df_RFM2$CustomerID <- NULL

df_RFM2 <- scale(df_RFM2)


summary(df_RFM2)

## recency frequenci monitery
## Min. :-0.9205 Min. :-0.42505 Min. :-5.8832
## 1st Qu.:-0.7505 1st Qu.:-0.42505 1st Qu.:-0.6153
## Median :-0.4205 Median :-0.29514 Median : 0.0493
## Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.4968 3rd Qu.: 0.09457 3rd Qu.: 0.5576
## Max. : 2.8091 Max. :26.59496 Max. : 7.6012
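
For reference, a minimal sketch of what `scale()` does to one column, using five recency values taken from the `head(df_RFM)` table. Each value is centered on the column mean and divided by the column standard deviation, so that no single variable dominates the Euclidean distances computed by `dist()`.

```r
# Recency values of five customers from the head(df_RFM) table
recency <- c(25, 41, 59, 98, 348)

# scale() centers each column on its mean and divides by its standard deviation
scaled <- scale(recency)

# The same result computed by hand
by_hand <- (recency - mean(recency)) / sd(recency)

all.equal(as.numeric(scaled), by_hand)
```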

d <- dist(df_RFM2)
c <- hclust(d, method = 'ward.D2')

plot(c)

Cut the tree into 8 clusters
members <- cutree(c,k = 8)
members[1:5]

## 12346 12347 12348 12349 12350
## 1 2 2 1 3

> table(members)

## members
## 1 2 3 4 5 6 7 8
## 255 1878 368 404 1092 319 2 20
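
The clustering and cutting steps can be illustrated on a tiny example. This is a sketch on hypothetical 2-D points (invented so that two groups are obvious), using the same `ward.D2` linkage and `cutree()` call as above.

```r
# Hypothetical 2-D points forming two well-separated groups
pts <- rbind(
  a = c(0.0, 0.1),
  b = c(0.2, 0.0),
  c = c(0.1, 0.2),
  d = c(5.0, 5.1),
  e = c(5.2, 4.9)
)

tree <- hclust(dist(pts), method = "ward.D2")  # same linkage as in the slides
groups <- cutree(tree, k = 2)                   # cut the tree into two clusters

groups
table(groups)
```

`cutree()` returns one cluster label per row, named by the row names, which is why setting the customer IDs as row names makes the `members` vector readable.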

> aggregate(df_RFM[,2:4], by=list(members), mean)

## Group.1 recency frequenci monitery
## 1 1 64.56078 5.729412 7.148281
## 2 2 90.12886 3.185304 5.944782
## 3 3 323.27989 1.220109 5.920025
## 4 4 266.67327 1.556931 4.774894
## 5 5 68.29304 2.924908 4.974036
## 6 6 36.06897 16.028213 5.711702
## 7 7 23.50000 205.000000 5.828276
## 8 8 28.55000 64.700000 6.678910
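
Because `monitery` was log-transformed earlier, the cluster means in this table are on the log scale. A minimal sketch of reading them back on the original scale, using the means copied from the output above: taking `exp()` of a mean of logs gives the geometric mean, not the arithmetic mean, of the original values.

```r
# Cluster means of log(monitery), copied from the aggregate() output
log_mean <- c(`1` = 7.148281, `2` = 5.944782, `3` = 5.920025, `4` = 4.774894,
              `5` = 4.974036, `6` = 5.711702, `7` = 5.828276, `8` = 6.678910)

# exp() of a mean of logs is the geometric mean on the original scale
geo_mean <- exp(log_mean)
round(geo_mean)

# Clusters ordered from highest to lowest typical spend per invoice
names(sort(geo_mean, decreasing = TRUE))
```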

Businesses that lack the monetary aspect, like viewership, readership, or surfing-oriented products, could use
Engagement parameters instead of Monetary factors. This results in using RFE – a variation of RFM.
Furthermore, this Engagement parameter could be defined as a composite value based on metrics such as
bounce rate, visit duration, number of pages visited, time spent per page, etc.
RFM factors illustrate these facts:
• the more recent the purchase, the more responsive the customer is to promotions
• the more frequently the customer buys, the more engaged and satisfied they are
• monetary value differentiates heavy spenders from low-value purchasers
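
One possible way to build the composite Engagement value mentioned above is sketched below. Everything in it is an assumption made for illustration: the column names, the four invented visitors, and the equal weighting of the standardized metrics.

```r
# Hypothetical engagement metrics for four visitors (all values invented)
visits <- data.frame(
  visitor        = c("v1", "v2", "v3", "v4"),
  bounce_rate    = c(0.80, 0.20, 0.50, 0.35),  # lower is better
  visit_duration = c(30, 600, 120, 300),        # seconds
  pages_visited  = c(1, 12, 3, 8)
)

# Standardize each metric; flip the sign of bounce rate so higher always means more engaged
z <- scale(visits[, c("bounce_rate", "visit_duration", "pages_visited")])
z[, "bounce_rate"] <- -z[, "bounce_rate"]

# Equal-weight average of the standardized metrics (the weights are an assumption)
visits$engagement <- rowMeans(z)

visits[order(-visits$engagement), c("visitor", "engagement")]
```

The resulting `engagement` column could then take the place of `monitery` in the clustering steps.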

THANK YOU FOR
YOUR KIND
ATTENTION
