Group Assignment Group 2
GROUP ASSIGNMENT
GROUP MEMBERS
S2145827 KHAR SHIN YIN
S2132380 LEE SHIN EE
17088226 OOI SHI YUAN
S2102170 SIM LIN ZHENG
S2121801 YU YUEN HERN
Content Table
1 Introduction
2 Our Dataset
2.1 Dataset Description
3 Business Understanding
3.1 Analysis Goal
3.2 Analysis Data
4 Methodology
4.1 SEMMA Description
4.2 SEMMA Process
5 Results
5.1 Sample
5.1.1 Metadata
5.1.2 Reclassification of the Role and Level of the Variables
5.2 Explore
5.2.1 Summary Statistics
5.2.2 Univariate Analysis
5.2.3 Bivariate Analysis – Variable Association
5.2.4 Multivariate Analysis
5.2.5 Interesting Visualization
6 Conclusion
7 Appendix
1 Introduction
Marketing analytics is the study of customer data to evaluate and devise marketing
activities, and it has been widely adopted by businesses across the globe (SAS, 2022).
In-depth analysis of customer data helps businesses understand what drives consumer
action, enhance their marketing strategies, and maximise the return on their marketing
investments.
Kation recently proposed migrating its existing telco platform to a more
advanced platform to improve the user and customer experience. To optimise the platform
migration, Kation came up with an initiative called “Right Planning”, which
aims to migrate every customer from an old rate plan to a newer rate plan at a lower cost with
better benefits. With “Right Planning”, Kation can improve the customer experience through new
rate plans with better benefits and offers while removing outdated rate plans and
standardising the rate plan information stored in the new platform.
Before launching “Right Planning” officially to Kation’s seven million customers, the
Base Management Team in Kation launched a pilot campaign with a small group of customers
to assess the effectiveness of the campaign proposed.
2 Our Dataset
The dataset for this study is a set of secondary data from a campaign that ran between
5th October 2022 and 11th October 2022. It was provided by the base management team of
Kation, one of the largest telecommunication companies in Malaysia. There is a total
of 7272 records, and the target base is a group of prepaid Malaysian customers with silent status.
The dataset contains the target base's demographic information, usage, and revenue activity
before and after the campaign. Table 2.1 shows the variable descriptions.
WQD7005 Group Assignment
Variable                 Description
ID                       Customer ID
TENURE                   Customer duration with Kation since registration date
AGE                      Customer age
GENDER                   Customer gender
NATIONALITY              Customer nationality
STATE                    Customer hometown (state)
STATUS_BEFORE            Customer status before the campaign launched
STATUS_AFTER             Customer status after the campaign ended
OFFER_TAKER              Indicator for customers who opted in to the migration plan
OFFER_TAKE_UP_DT         Date on which the customer opted in to the migration plan
DATA_PURC_BEFORE         Indicator for customers who purchased data before the campaign launched
DATA_PURC_AFTER          Indicator for customers who purchased data after the campaign ended
DATA_CHRG_BEFORE         Total amount of data charged before the campaign launched
DATA_CHRG_AFTER          Total amount of data charged after the campaign ended
DATA_USG_BEFORE          Data usage before the campaign launched
DATA_USG_AFTER           Data usage after the campaign ended
VOICE_USG_BEFORE         Voice usage before the campaign launched
VOICE_USG_AFTER          Voice usage after the campaign ended
RLD_IND_BEFORE           Indicator for customers who reloaded before the campaign launched
RLD_IND_AFTER            Indicator for customers who reloaded after the campaign ended
RLD_AMT_BEFORE           Total reload amount before the campaign launched
RLD_AMT_AFTER            Total reload amount after the campaign ended
ARPU_BEFORE              ARPU before the campaign launched
CPA_RVN_BEFORE           Total value-added service revenue before the campaign launched
CPA_RVN_AFTER            Total value-added service revenue after the campaign ended
ARPU_AFTER               ARPU after the campaign ended
ACTIVITY_DAYS_AFTER      Silent days after the campaign ended
ACTIVITY_STATUS_AFTER    Customer activity status after the campaign ended
3 Business Understanding
The goal of this study is to assess the performance of the pilot before the migration is
implemented for the entire customer base, and to identify whether customers prefer the old or
the new rate plans.
The data was extracted from a recently launched campaign and provided by one
of the largest telecommunications companies in Malaysia. The dataset is private and
confidential (not publicly available). The target variable of the dataset is derived from the
customers who opted in to the new rate plans via migration. The analysis goal is achievable when
data analysis is carried out:
4 Methodology
SEMMA (an acronym for Sample, Explore, Modify, Model, and Assess) is adopted
as the data mining methodology for the analysis. SEMMA was carried out using the SAS
Enterprise Miner tool.
There are 5 major steps in SEMMA: Sample, Explore, Modify, Model and Assess. All
of these steps are necessary in conducting a data mining project and are available in SAS
Enterprise Miner.
In this project, the first two steps, Sample and Explore, were the focus to kick-start the
analysis. Figure 4.1 shows the SEMMA process.
Sample
• Data is imported as a source file and then converted to a SAS file. Data
information such as data types and values is identified. The dataset
chosen is large enough to contain significant information and suitable to
process.
Model
• Data is split into train and test sets, then fed into several models for
model training. Multiple models are chosen to predict whether customers will
opt in to the new rate plan after the campaign is launched, based on their usage
and revenue behavior.
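The train and test split described for the Model step can be sketched in plain Python. This is only an illustration: the report performs the step inside SAS Enterprise Miner, and the function name `train_test_split`, the 30% test fraction and the seed are assumptions, not values taken from the study.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into train and test lists.

    test_fraction and seed are illustrative defaults (assumptions).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical customer IDs standing in for the 7272 campaign records.
customer_ids = list(range(7272))
train, test = train_test_split(customer_ids)
print(len(train), len(test))
```

Fixing the seed means every model is trained and evaluated on the same partition, which keeps comparisons between models fair.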
5 Results
5.1 Sample
The collected data was saved in Excel format. The Excel file was imported and saved as a SAS file
for data exploration. The procedures for creating a new project, diagram, library and data
source, and for converting the Excel file to a SAS file, are attached in the Appendix (5.1 Sample,
sections i to iv). The dataset is a set of campaign data with customer-level information,
including customers’ demographic information, usage, and revenue activity before and after
the campaign, with a total of 7272 records and 27 variables.
5.1.1 Metadata
SAS Enterprise Miner offers basic and advanced settings for defining the variables' data
types. Figure 5.1 shows the column metadata and Figure 5.2 shows the data types under the default
basic settings. Meanwhile, Figure 5.3 shows the column metadata and Figure 5.4 shows the
data types under the default advanced settings.
Based on Figure 5.1 and Figure 5.2, we observed that, under the default basic settings, the
variables are classified as nominal or interval and are all assigned the input role. The system
automatically detects the measurement level from the possible values within each variable. By
default, numeric data was classified as interval and character data was classified as nominal.
However, the basic settings were not suitable because most of the resulting data types were
still inaccurate.
After selecting the advanced settings (based on Figure 5.3 and Figure 5.4), the data
types were redefined as binary, interval, nominal and unary. The system also automatically
assigned each variable either the input or the rejected role. However, there were still errors:
some data types did not reflect the correct data type for the dataset. Hence, manual adjustment
was needed to modify the data types before proceeding to the next stage, Explore.
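The two detection behaviours described above can be approximated with a small Python sketch. This is a simplification written for illustration and not SAS Enterprise Miner's actual algorithm: the function names and the rule "one distinct value is unary, two is binary" are assumptions, and all values are assumed to arrive as strings.

```python
def is_number(text):
    """Return True if the string can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def basic_level(values):
    """Basic rule from the text: numeric -> interval, character -> nominal."""
    return "interval" if all(is_number(v) for v in values) else "nominal"


def advanced_level(values):
    """Assumed refinement: also look at the number of distinct values."""
    distinct = set(values)
    if len(distinct) == 1:
        return "unary"
    if len(distinct) == 2:
        return "binary"
    return basic_level(values)


# Made-up sample columns for illustration.
print(basic_level(["12", "30", "45"]))           # numeric strings
print(advanced_level(["Y", "N", "Y"]))           # two distinct values
print(advanced_level(["MALAYSIAN"] * 3))         # one distinct value
print(advanced_level(["KL", "SABAH", "PERAK"]))  # many character values
```

A rule like this shows why manual review is still needed: a numeric column containing a single stray character is pushed to nominal, and a constant column is flagged unary regardless of its business meaning.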
Reclassification was carried out to enable the creation of suitable charts for these variables.
Figure 5.5 shows the data type modifications after manual adjustment and Figure 5.6 shows the
final data type summary.
Figure 5.5: Comparison of role and measurement level between advanced settings and
manual reclassification
Based on Figure 5.5, the rejected variables were converted to input variables, because all the
data should be included until analysis has been carried out to justify deleting a variable.
The target variable was added manually. The unary and binary variables that should have been
nominal were converted, and the remaining variables were left unchanged.
5.2 Explore
Data was explored to identify relationships and anomalies via univariate, bivariate and
multivariate analysis with several graphs.
After the dataset was accessed and examined, summary statistics were generated. The goal of the
summary statistics is to give an overview of the data, such as the minimum and maximum
values, mean, missing values and standard deviation. Figure 5.7 shows the summary statistics
for interval variables and Figure 5.8 shows the summary statistics for class variables.
Based on Figure 5.7, there are 7 interval variables, none of which has any missing values,
and the summary statistics appear normal except for AGE. AGE has a minimum value of -9999,
which is abnormal because a human age should fall roughly between 0 and 100 and can never be
negative. This occurred because customers’ age information was missing due to system or
human error and was replaced with -9999. These abnormal data points
explain the high skewness and large standard deviation.
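The effect of the -9999 placeholder on the summary statistics can be reproduced with a short Python sketch. The ages below are made up for illustration, and the helper `summarise` is a hypothetical name; only the sentinel value -9999 comes from the report.

```python
import statistics

SENTINEL = -9999  # placeholder used for missing ages, as described in the text


def summarise(values):
    """Return a few of the statistics shown in a summary statistics table."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Made-up ages; two customers carry the sentinel instead of a real age.
ages = [25, 31, 40, -9999, 28, 36, -9999, 52]
valid_ages = [a for a in ages if a != SENTINEL]
n_missing = len(ages) - len(valid_ages)

print("with sentinel:   ", summarise(ages))
print("without sentinel:", summarise(valid_ages))
print("ages recoded as missing:", n_missing)
```

Counting the sentinel rows separately turns them into an explicit missing-value count instead of letting them distort the mean and standard deviation.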
Based on Figure 5.8, there were no missing values among the
class variables. However, some of the nominal variables, namely ARPU_BEFORE,
ARPU_AFTER, CPA_RVN_BEFORE, CPA_RVN_AFTER, DATA_CHRG_BEFORE,
DATA_CHRG_AFTER, RLD_AMT_BEFORE, and RLD_AMT_AFTER, contained an
extremely high number of levels, which was abnormal.
Referring to Figure 5.9, these variables contained numeric values instead of class
values, which means they were numeric data that had been wrongly classified as character
data. Moreover, blank values were found within these variables. These blank values were
represented by ‘?’ instead of ‘0’, which caused the variables to be wrongly tagged
as class variables.
Univariate analysis is the simplest form of statistical data analysis and explores each variable
in the dataset on its own. It does not deal with causes or relationships; it only looks for
patterns within each variable. Graphs such as histograms, box plots and pie charts are well
suited to univariate analysis for checking distributions, outliers, noisy data and missing values.
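The outlier check that a box plot performs visually can be sketched numerically with the common 1.5 × IQR rule. This is an assumption about the rule, since the report does not state which outlier definition the SAS Enterprise Miner plots use, and the tenure values below are made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up tenure values in months; one long-standing customer stands out.
tenure_months = [3, 5, 6, 8, 9, 10, 12, 14, 15, 120]
print(iqr_outliers(tenure_months))
```

A flagged value is only a candidate: as with long-tenure prepaid customers, a statistical outlier can still be a valid observation.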
Firstly, histograms were used to visualize the distribution and missing values of the 7
interval variables. Figure 5.9 shows an overview of all the graphs and Table 5.1 summarizes the
patterns, analysis, and abnormalities.
[Table 5.1: histogram findings for the interval variables. Only the outlier notes are recoverable: "No outliers detected" for one variable and "Outliers detected" for four variables.]
No. Findings
1 ACTIVITY_DAYS_AFTER
• Outliers were detected for all the variables (justified as in Table 5.1).
• TENURE outliers were acceptable, as there are prepaid customers who stay loyal.
For categorical variables (nominal, ordinal and binary), pie charts and bar plots are
used. These graphs break the distribution of a categorical variable down into numerical
proportions. A pie chart presents the proportion of each category within a nominal
variable in a very straightforward manner, especially when there are only a few categories.
Meanwhile, a bar plot is better at visualizing a variable with a high number of possible
values.
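The proportions behind such charts, and the choice between the two chart types, can be sketched as follows. The cut-off of 5 levels in `suggest_chart` is an assumption chosen for illustration; the report does not state the threshold it used, and the sample values are made up.

```python
from collections import Counter


def proportions(values):
    """Return the share of each category, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return {level: count / total for level, count in counts.most_common()}


def suggest_chart(values, max_pie_levels=5):
    """Suggest a pie chart for few categories and a bar plot otherwise."""
    return "pie" if len(set(values)) <= max_pie_levels else "bar"


# Made-up values for a low-cardinality and a high-cardinality variable.
gender = ["M", "M", "F", "M"]
state = ["KL", "SABAH", "PERAK", "JOHOR", "KEDAH", "PENANG", "SARAWAK"]

print(proportions(gender), suggest_chart(gender))
print(suggest_chart(state))
```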
14 nominal variables were plotted using pie charts and 6 nominal variables were plotted
using bar plots. Figure 5.11 shows an overview of the variables in pie charts, with findings
summarized in Table 5.3. Meanwhile, Figure 5.12 shows an overview of the variables in bar
charts, with findings summarized in Table 5.4.
No. Findings
1 Correct data type: Is nominal variable
• The statuses “DURING & AFTER CAMP” and “BEFORE & AFTER CAMP”
represent the customers who opted in to the new rate plan and remained active after
the campaign.
• The pre-view count was more than half of the total count, indicating that the campaign
was quite successful.
Bivariate analysis identifies the relationship between two variables. Charts such as scatter plots,
bar plots and box plots can be extremely helpful in finding simple insights. Table 5.5 displays
the findings on pairs of variables in the dataset.
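For two categorical variables, the strength of association can be summarised with Cramér's V, the statistic selected in the Chi-Square plot in the Appendix. The sketch below is a plain-Python illustration of the standard formula, V = sqrt(chi² / (n · (min(r, c) − 1))), and not the SAS Enterprise Miner implementation; the sample values are made up.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Made-up labels: offer take-up that follows, or ignores, a second variable.
offer_taker = ["Y", "Y", "N", "N"]
print(cramers_v(offer_taker, ["A", "A", "B", "B"]))  # fully associated
print(cramers_v(offer_taker, ["A", "B", "A", "B"]))  # no association
```

V ranges from 0 (no association) to 1 (one variable fully determines the other), which makes values comparable across variable pairs with different numbers of levels.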
Figure 5.14: Findings of status before and after for offer takers
Figure 5.15: Findings of voice usage before and after for offer takers
Figure 5.16: Findings of data usage before and after for offer takers
Other than bar and scatter plots, multivariate analysis can also be done using a correlation
matrix. Correlation is measured on a scale from -1 to 1 and is displayed in shades of blue and
red, where the deeper the colour, the stronger the correlation. For any pair of variables
with a correlation value greater than 0.9, one of the two variables is removed to
avoid collinearity. Figure 5.17 and Figure 5.18 show the correlation matrix and
correlation table.
Based on the correlation matrix and correlation table in Figure 5.17 and
Figure 5.18, no pair of variables has a correlation value greater than 0.9. Therefore, no
variables were removed.
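The 0.9 screening rule can be sketched in Python as a check over every pair of numeric columns. This is an illustration only: it uses Pearson correlation and compares the absolute value against the threshold, both of which are assumptions about how the rule is applied, and the column names and values are synthetic rather than taken from the dataset.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """List the column pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Synthetic columns: col_b is nearly a multiple of col_a, col_c is unrelated.
columns = {
    "col_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "col_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(highly_correlated_pairs(columns))
```

An empty result corresponds to the outcome reported above, where no variable needed to be removed.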
The revenue impact of campaigns is always the top concern for the base management team.
Campaign efforts should stay within plan and budget. Hence, pre- and post-campaign analysis
of revenue is a mandatory step in evaluating the effectiveness and success of campaigns. In this
study, Figure 5.20 displays the findings on ARPU before and after the campaign for the offer
takers.
Figure 5.20: Findings of ARPU before and after for offer takers
Based on Figure 5.20, two arrow shapes were formed, one for offer takers (Y) and one for
non-offer takers (N) under OFFER_TAKER. However, no findings could be concluded because both
variables, ARPU_BEFORE and ARPU_AFTER, contain incomplete data. The values are not in
ascending order because they are tagged as nominal variables. Findings will be drawn in the
conclusion after the data has been pre-processed.
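Once ARPU_BEFORE and ARPU_AFTER are numeric, the pre- and post-campaign comparison could be computed along the lines of the sketch below. The rows are made up for illustration and are not results from the study; the function name `mean_arpu_change` is hypothetical, and only the three column names come from the dataset description.

```python
from collections import defaultdict
from statistics import mean


def mean_arpu_change(rows):
    """Average ARPU_AFTER minus ARPU_BEFORE for each OFFER_TAKER group."""
    changes = defaultdict(list)
    for row in rows:
        changes[row["OFFER_TAKER"]].append(row["ARPU_AFTER"] - row["ARPU_BEFORE"])
    return {group: mean(deltas) for group, deltas in changes.items()}


# Made-up rows; real values would come from the cleaned campaign data.
rows = [
    {"OFFER_TAKER": "Y", "ARPU_BEFORE": 10.0, "ARPU_AFTER": 16.0},
    {"OFFER_TAKER": "Y", "ARPU_BEFORE": 20.0, "ARPU_AFTER": 24.0},
    {"OFFER_TAKER": "N", "ARPU_BEFORE": 15.0, "ARPU_AFTER": 14.0},
    {"OFFER_TAKER": "N", "ARPU_BEFORE": 5.0, "ARPU_AFTER": 6.0},
]
print(mean_arpu_change(rows))
```

Comparing the change for offer takers against the change for non-offer takers separates the campaign effect from movement that affected all customers.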
6 Conclusion
Initially, the dataset contained 7 interval variables and 20 nominal variables. The result of
manually revising the metadata is shown in Table 6.1.
During data exploration, 4 types of data error were found in the dataset: incomplete, noisy,
inconsistent and intentional data. The findings are summarised in
Table 6.2.
Based on the visualizations displayed in Section 5.2 Explore, both objective 1 and
objective 2 are achieved. The objectives are supported by the key findings listed in Table 6.3.
2 (To identify the profile of campaign takers)
• Campaign takers showed a higher opt-in rate among males than among females.
• Campaign takers were mostly aged between 22 and 36 years old, with a tenure of more than
1 year.
• Most of the campaign takers were from Klang Valley, followed by Sabah and Sarawak.
7 Appendix
The procedures for conducting the first two steps of SEMMA in SAS Enterprise Miner, Sample
and Explore, are attached below.
5.1 Sample
i. Create a new Enterprise Miner Project
iv. Create a library for the imported data file / SAS file
5.2 Explore
i. Generate a Statistics Table
• Set the Summarize property at the property side bar to Yes before running the “File
Import” node.
• Right click “File Import” node and select Results... to display the Statistics Table.
• Connect the “File Import” node to the “Graph Explore” node. Then, right click the
“Graph Explore” node and select Run.
• After the “Graph Explore” node has been successfully run, select Plot… icon as
highlighted below and choose your desired graphs.
• Specify the variable that you wish to visualise and click Finish.
• Select Run > Results... > View > Plots > Chi-Square Plot
• The Chi-Square plot result window will be shown. Select Cramer’s V from the
drop-down box in the top left corner of the window.
vi. To understand the correlation among the variables, a Variable Clustering node is added
• Select Add Node > Explore > Variable Clustering
• When viewing the results on variable correlation, a correlation matrix will be
shown.
• Select Run > Results... > View > Model > Variable Correlation
• To view the values used in deriving the correlation matrix, click on the Table icon
on the top left of the result screen. A table should be displayed showing the
correlation values.