Chapter 4 - A Primer On Machine Learning For Marketing Analytics

A Primer on Machine Learning for Marketing Analytics
CONTENT
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Steps In Building Machine Learning Models
• Differences Between Machine Learning And Marketing Research Models
Section I: Introduction to Machine Learning
Supervised Learning
• Learning in which paired input-output data are the core ingredient is called
supervised learning.
• Supervised learning methods are the most convenient of the machine learning
methods.
• A few popular supervised learning algorithms are:
o Logistic regression is a fast, reliable algorithm for predicting a binary response such as
Yes or No.
o Gradient Boosted Decision Trees (GBDT) are used for structured data
since they provide high accuracy with relatively little training time. For example,
Facebook uses the GBDT algorithm.
o Deep learning algorithms excel at predicting labels for unstructured data such as
images and text.
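As an illustration of predicting a Yes/No response from input-output pairs, here is a minimal sketch of logistic regression fitted by gradient descent. The data (site visits and purchase flags), the learning rate and the number of epochs are all assumptions made for the example, not values from the chapter.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical input-output pairs: site visits last month -> purchased (1) or not (0).
visits = [1, 2, 3, 4, 5, 6, 7, 8]
purchased = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(visits, purchased)

def predict(x):
    """Return the binary response as Yes or No."""
    return "Yes" if sigmoid(w * x + b) >= 0.5 else "No"

print(predict(1), predict(8))
```

In practice one would normally rely on an established library rather than a hand-written loop; the sketch only shows where the labelled outputs enter the learning process.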
Unsupervised Learning
Unsupervised learning comprises methods where input data is abundant, but no
desired output is available.
Research in unsupervised learning is only beginning to gather steam.
Two popular methods of unsupervised learning are:
• Principal Components Analysis (PCA) is a technique that can be used to construct a handful
of custom features from a large number of inputs. PCA is often used as an intermediate step in
supervised learning to reduce the number of input features; the resulting components are then
fed as inputs to another algorithm.
• Clustering methods are often used to group and segment customer data.
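To make customer segmentation concrete, the following is a minimal sketch of k-means, one common clustering method. The chapter does not name a specific clustering algorithm, so k-means is chosen here purely for illustration; the customer data and the starting centroids are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep the centroid of an empty cluster where it was.
                new_centroids.append(old)
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical customers: (store visits per month, purchases per month).
customers = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]

# Start from two of the observed customers; there are no output labels to learn from.
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
print(segments)
```

Note that the algorithm only ever sees the inputs: the segments come from the structure of the data, not from a desired output.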
Reinforcement Learning
Reinforcement learning methods rely on input data that has no clear
outputs, only a signal that grades the model output.
Here the algorithm has to sift through several possible outputs;
these methods are therefore computationally intensive.
It is also difficult to build algorithms that are able to
learn the signal in noisy environments.
A common application of reinforcement learning is ad content
optimization.
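One simple way to picture ad content optimization is an epsilon-greedy bandit, sketched below. This is an illustrative technique chosen for the example, not necessarily the method the chapter has in mind. The click-through rates, the number of rounds and the exploration rate are hypothetical, and clicks are simulated rather than observed.

```python
import random

def run_ad_bandit(true_click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice among ad variants, learning only from click feedback."""
    rng = random.Random(seed)
    shows = [0] * len(true_click_rates)        # times each ad was shown
    estimates = [0.0] * len(true_click_rates)  # running click-rate estimate per ad
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(len(true_click_rates))  # explore a random ad
        else:
            ad = estimates.index(max(estimates))       # exploit the best ad so far
        # The grading signal: 1 if the simulated user clicks, else 0.
        reward = 1 if rng.random() < true_click_rates[ad] else 0
        shows[ad] += 1
        estimates[ad] += (reward - estimates[ad]) / shows[ad]  # incremental mean
    return shows, estimates

# Hypothetical click-through rates for three ad creatives (unknown to the algorithm).
shows, estimates = run_ad_bandit([0.02, 0.05, 0.30])
best_ad = estimates.index(max(estimates))
print(best_ad, shows)
```

The algorithm is never told which ad is correct; it only receives a noisy click signal after each choice, which is why many rounds are needed.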
Section II: Building Machine Learning Models
STEPS IN BUILDING MACHINE LEARNING MODELS
Our objective in developing a machine learning model is to use a set of
inputs to predict an observed response.
We start with input data that contains previously observed values, and
we wish to figure out a rule that governs the relationship between the inputs
and the output.
We want this rule to hold on data that the model has never seen before.
Once we are sure that the model works on unseen data, we then need to
deploy it.
There is a sequence of steps that one follows in the development of a
machine learning model.
STEPS INVOLVED IN MODEL DEVELOPMENT ARE:

Step 1: Metric Choice and Data Splitting


We choose a metric by which we will judge the performance of the model.
For instance,
o For numeric outcomes, it is standard practice to choose the Root Mean Squared Error (RMSE)
as the metric, that is, the square root of the mean of the squared differences between the
model-predicted values and the actually observed values.
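The RMSE definition above can be written as a short function. The sales figures are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean squared difference between predictions and observations."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical weekly sales (actual) and model predictions.
actual_sales = [10, 20, 30, 40]
predicted_sales = [12, 18, 33, 37]

# Squared errors are 4, 4, 9 and 9, so the mean is 6.5 and the RMSE is sqrt(6.5).
print(rmse(actual_sales, predicted_sales))
```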
We then split our data into three parts: training, validation and test.
The training data is used to learn the relationship between the input and output.
The validation data is used to gauge how well the model performs so that the
training process can be improved.
The test set is used as the final checkpoint to select the desired model.
For instance,
o To split the data, we borrow a common ratio used in practice for the train-validation-test split:
70-10-20. That is, we randomly sample 70% of our data and assign it to be the training data. Then
we randomly sample a further 10% of the full data set from the remainder and assign it to be the
validation data. The remaining 20% is our test data.
To be precise,
in the development of a machine learning model we divide the data into training, validation
and test sets. Training and validation are used to choose the best settings for a specific model
architecture, while the test set is used to decide among different model architectures.
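The 70-10-20 split can be sketched as a single shuffle followed by two cuts. The function name, the seed and the row identifiers are assumptions made for the example.

```python
import random

def train_validation_test_split(rows, seed=42):
    """Randomly split rows 70-10-20 into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_validation = len(shuffled) * 10 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains, about 20%
    return train, validation, test

# Hypothetical data set of 100 row identifiers.
rows = list(range(100))
train, validation, test = train_validation_test_split(rows)
print(len(train), len(validation), len(test))
```

Shuffling once and slicing guarantees that the three sets do not overlap, which sampling each set independently would not.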
Step 2: Pre-Processing
Once the data is split into training, validation and test sets, we verify the sanity of the
input data by checking for missing values, the numerical range of the predictors,
and possible relationships between inputs.
If missing values are detected, we either discard the affected samples or impute the missing
values using a suitable method.
Next, we check the numerical range of the predictors by looking at their means and
quantiles, and rescale predictors whose ranges differ widely.
This not only helps the optimizer converge faster during model training but also
aids interpretation.
Finally, we check for possible correlation between the inputs, since
a set of highly correlated features can hurt the performance of the model.
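The three pre-processing checks can be sketched with the standard library alone. Mean imputation is only one of several suitable imputation methods, and the predictor names and values below are hypothetical.

```python
import math
import statistics

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predictors with one missing ad-spend value.
ad_spend = [10.0, None, 30.0, 50.0]
impressions = [110.0, 290.0, 310.0, 490.0]

ad_spend = impute_with_mean(ad_spend)  # missing value -> mean of 10, 30 and 50
spend_quartiles = statistics.quantiles(ad_spend, n=4)
correlation = pearson(ad_spend, impressions)

print(statistics.mean(ad_spend), spend_quartiles)
print(correlation)  # close to 1: these two inputs are highly correlated
```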
Step 3: Model Training and Hyperparameter Tuning
A key step in the model training process is tuning the hyperparameters of the
chosen model.
As the model choice goes beyond simple linear models, we have to make a set of
choices about model settings that are specific to the data set.
These choices are fixed before learning from the training data
begins and are called hyperparameters.
An important thing to note is that the hyperparameters cannot be inferred directly
from the training data alone.
What is learned from the training data are the input-specific parameters, such as
regression coefficients.
An easy example makes the distinction clear:
While driving a car, deciding which gear to drive in is a hyperparameter
choice, while the decisions made by the car, such as how much fuel
to inject given the choice of gear, are input-specific parameters. The car cannot
decide on its own which gear to drive in, but once the gear is set it can adapt
itself to optimize its internal machinery for this gear. Note that
during the model training process, we wiggle the input-specific
parameters until we reach the lowest level of RMSE.
The leap we make here is that rather than making a conscious decision on a
gear based on our judgement, we allow the structure of the data to dictate the choice
of gear as well. So, we make a choice of gear, allow the model to fit to the
training data and measure RMSE on the validation data. We iterate over the four
available gears and choose as the best gear the one with the lowest RMSE.
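The gear analogy can be sketched as a loop over four candidate hyperparameter values. As an assumed stand-in for a real model, the sketch uses one-variable ridge regression: the penalty plays the role of the gear (hyperparameter) and the slope is the input-specific parameter learned from training data. All data values are hypothetical.

```python
import math

def fit_slope(xs, ys, penalty):
    """Learn the slope of y = w * x by ridge regression for a fixed penalty."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def rmse(xs, ys, w):
    """RMSE of the predictions w * x against the observed ys."""
    return math.sqrt(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical training and validation data.
train_x, train_y = [1, 2, 3, 4], [2.5, 4.4, 6.8, 8.6]
valid_x, valid_y = [5, 6], [10.5, 12.9]

gears = [0.0, 1.0, 10.0, 100.0]  # four candidate hyperparameter values
results = {}
for penalty in gears:
    w = fit_slope(train_x, train_y, penalty)      # parameter learned from training data
    results[penalty] = rmse(valid_x, valid_y, w)  # graded on validation data

best_gear = min(results, key=results.get)
print(best_gear, results[best_gear])
```

The penalty is never estimated from the training data itself; it is fixed before each fit and judged only by the validation RMSE.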
During the model training process, we have two broad sets of knobs. One set of knobs
(hyperparameters) controls how complex we wish the specific algorithm employed to
be (a sort of ‘top-level control’), and
the second set of knobs (parameters) controls the input-specific parameters of the model
at the chosen level of complexity.
To choose between several candidate models that are fit to the data, we use
the test set to pick a final model among these competing models.
Returning to the example of driving the car, the choice of which gear to drive in
is made using the validation set, and the car then tunes its parameters for that gear
based on the training data.
We also have to decide which car to drive, which is done based on the
test set.
We first have to choose which specific models we wish
to fit to the data, although hyperparameter tuning forms the bulk of the
model training process.
We then iterate over the model choices, fit each to the
training data and choose the model that has the lowest RMSE on the
test data.
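Choosing among model architectures by test-set RMSE might look like the sketch below. The two candidate models (a constant prediction and a straight line) and all data values are assumptions chosen to keep the example small.

```python
import math
import statistics

def fit_constant(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line with an intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, xs, ys):
    """RMSE of a fitted model on the given data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical training and test data.
train_x, train_y = [1, 2, 3, 4, 5, 6], [12, 14, 15, 18, 20, 21]
test_x, test_y = [7, 8], [23, 25]

candidates = {"constant": fit_constant, "line": fit_line}
test_rmse = {name: rmse(fit(train_x, train_y), test_x, test_y)
             for name, fit in candidates.items()}
chosen = min(test_rmse, key=test_rmse.get)
print(chosen, test_rmse)
```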
Step 4: Model Deployment
While much attention and research is focused on developing new model
architectures, deploying trained models in decision-making
scenarios is also a difficult process.
By the time the model processes new inputs and
produces outputs, there may be structural differences between the data
it was trained on and the live data, which can derail the model's predictions.
The deployment process therefore goes through a beta testing phase before a
model is put into production.
In the beta testing phase, live data is fed to the model, but its predictions
are only logged and analyzed.
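A beta testing phase of this kind could be sketched as follows: predictions on live inputs are recorded but not acted upon, and inputs outside the range seen in training are flagged. The range check is just one assumed, simple indicator of structural differences; the model, the inputs and the field names are hypothetical.

```python
def beta_test(model, live_inputs, training_range):
    """Feed live inputs to the model and only log the predictions for later analysis."""
    low, high = training_range
    log = []
    for x in live_inputs:
        log.append({
            "input": x,
            "prediction": model(x),
            # Flag inputs outside the range seen in training: a simple sign that
            # the live data may differ structurally from the training data.
            "outside_training_range": not (low <= x <= high),
        })
    return log

def model(ad_spend):
    """Hypothetical trained model: predicted sales from ad spend."""
    return 50.0 + 2.0 * ad_spend

log = beta_test(model, live_inputs=[12.0, 35.0, 90.0], training_range=(10.0, 60.0))
flagged = [entry for entry in log if entry["outside_training_range"]]
print(len(log), len(flagged))
```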
DIFFERENCES BETWEEN MACHINE LEARNING AND MARKETING RESEARCH MODELS
As already discussed, machine learning models are tuned and trained to minimize
predictive error, with far less focus on the explanatory power of the
model. This is markedly different from the typical marketing
perspective on modeling.
In marketing research, the analyst starts with a set of hypotheses,
which are then tested by collecting relevant data and measuring the
variables required to test them. In
machine learning, by contrast, the analyst starts by collecting as much data as
possible and chooses the model, however complex it might be, that best
fits this data.
Section III: Machine Learning vs Marketing Research
The extent to which the model fits the data is important in both
machine learning and marketing research.
Assessing the impact of specific changes in a feature is a
core concern in marketing research, while prediction is a core
concern in machine learning. That is why linear models are
dominant in marketing research.
The focus on prediction is the main reason why machine learning
models tend to become ‘black boxes’.
In the traditional marketing research process, the statistical models
usually employed are designed with small samples in mind.
Marketing research methods hinge on a large body of research in
statistics, whereas machine learning models tend to be data
hungry.
Collecting and optimally using large data sets is a hallmark of
machine learning.
In machine learning, the focus is much less on ensuring the
generalizability of findings.
Thus, although the two approaches differ widely in how they
model data, in practice it is often worthwhile to
explore both options for a given data set.
Even when data sets become too large and pose difficulties for
small-sample methods, we can fit linear models on appropriately
drawn random samples from the data. This can lead to insights that
can then be incorporated into the training of machine learning models.
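Fitting a linear model on a random sample of a large data set can be sketched as below. The data are simulated under an assumed relationship (sales = 3 + 2 × ad spend + noise), and the population and sample sizes are arbitrary choices for the example.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

rng = random.Random(0)

# Hypothetical large data set: (ad spend, sales) pairs.
population = []
for _ in range(50_000):
    spend = rng.uniform(0, 10)
    population.append((spend, 3.0 + 2.0 * spend + rng.gauss(0, 1)))

# Draw a simple random sample and fit the linear model on it alone.
sample = rng.sample(population, 1000)
xs, ys = zip(*sample)
intercept, slope = fit_line(xs, ys)
print(round(intercept, 2), round(slope, 2))
```

The fitted slope has a direct reading (the expected change in sales per unit of ad spend), which is the kind of insight a linear model offers alongside a less interpretable machine learning model.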
Over the last decade, there has been a veritable explosion of data
available to the marketer and an associated increase in the methods
available to analyze this data.
Most of these methods have their roots in the machine learning
approach and are optimized for prediction on large amounts of data.
While machine learning methods deviate in scale and intent from the methods
traditionally used by the marketer, they provide a rich array
of potent tools to analyze large data sets.