PAMS 22Fall Smart Marketing With RRM 4 Parameter Estimation


Smart Marketing

with Random Rewards


SYSBS
2022-Fall
Zhou Ruikai
SMART Marketing

In previous sessions, we built a decision analysis framework through demand modeling and
business modeling.
The framework allows us to conduct SMART marketing.
• We are marketing to maximize a well-defined objective function, which means we have a specific
goal and measurable progress.
• The maximization is conducted in an attainable action space and a timely-adaptive automated
manner.
• The maximization is rooted in reality through theory-guided, data-based demand estimation.
Consumer Psychology

To estimate demand, we discussed prevalent psychology theories to highlight the key parts
of consumer behavior.
The basic elements of decision-making behavior regarding random rewards include:
• utility maximization – each reward is assigned a utility for each consumer
• expected utility – the utility of the random rewards is the mathematical expectation of the utility of all
possible rewards
• risk preference – there is an individual preference/aversion towards the variance of potential
realizations of utility
• framing effect – the preference/aversion towards variance is dependent on gain/loss framing
Consumer Psychology (2)

When we are dealing with repeated random rewards, influence of past experience with the
same random reward should be considered.
• subjective probability: the expected utility calculation uses subjective probability, which uses
objective information/cues as prior, but is also updated through experience
• Law of Small Numbers: recent experience matters more
• (Gamblers’ Fallacy: for very recent experience, if it is framed as “now”, we expect regression from it; if
framed as “past”, we learn from it)
• salience: memorable experience matters more
• spurious correlation: experience that resembles the current situation matters more
Qualitative Implications

We can derive some qualitative implications from the previous psychology theories:
• Pricing depends on the utility of the rewards.
• Pricing and probability design also depend on risk preference, which is partly given and partly
changed by framing factors.
• We can smooth the fluctuation of subjective probability by providing probability information,
bundles, a rarity system, and "blocking" cues (events, pity timers); these all strengthen the power of the
prior and weaken the effect of experience.
• We can increase utility of rewards by increasing subjective probability of “wins”, using special
effects highlighting “wins”, near-misses obscuring “losses”, and rituals enhancing illusion of control.
The Limitations of Qualitative Insights

Some qualitative insights are unconditional –


• Special effects, near-misses and rituals are in general applicable
Others, however, are not precise enough to inform decision making.
• Should we appeal to risk-seeking or risk-averse consumers?
• What should be our pricing strategy?
• How much should we discount a bundle?

We need to quantify our model to provide answers.


Parameter Estimation
1. Parameterization

2. Identification
1. Parameterization

In previous sessions, we have provided a framework for quantification.


• The key to such a framework is reducing decisions (and their conditions) down to parameters.

The final SMART decision can be expressed as:


• We will choose business parameter(s) 𝑎 = 𝑎0 when condition parameter(s) 𝑏 = 𝑏0
• Or in general, 𝑎 = 𝑓(𝑏) = 𝑎𝑟𝑔𝑚𝑎𝑥𝑎∈𝐴 𝜋(𝑎, 𝑏)
Examples of Parameterization

𝑎 = 𝑓(𝑏) = 𝑎𝑟𝑔𝑚𝑎𝑥𝑎∈𝐴 𝜋(𝑎, 𝑏)
• We will price our random reward at …, when the risk preference of the consumers is at … level
• We will use gain/loss framing, when the risk preference of the consumers is at … level (such that
profit increases/decreases when it is slightly higher)
• We will bundle 10 random rewards at discount …, when the recency effect of probability learning is
at … level.

Such conditional decision making ensures timeliness and keeps decisions grounded in reality.


Parametric Model – Supply Side

We have talked about some business concerns that form a supply-side model.

• Basic Model – the simplest one:

(𝑝𝑟𝑖𝑐𝑒 − 𝑐𝑜𝑠𝑡) × 𝑠𝑎𝑙𝑒𝑠
• Extension 1 – considering inventory:
(𝑝𝑟𝑖𝑐𝑒 − 𝑐𝑜𝑠𝑡) × 𝑠𝑎𝑙𝑒𝑠 − 𝑐𝑜𝑠𝑡𝑠𝑡𝑜𝑐𝑘𝑜𝑢𝑡(𝑠𝑎𝑙𝑒𝑠)
• Extension 2 – considering repeated purchase:
Σ𝑡 𝜃^𝑡 (𝑝𝑟𝑖𝑐𝑒𝑡 − 𝑐𝑜𝑠𝑡) × 𝑠𝑎𝑙𝑒𝑠𝑡
• Extension 3 – considering product line:
Σ𝑗 (𝑝𝑟𝑖𝑐𝑒𝑗 − 𝑐𝑜𝑠𝑡𝑗) × 𝑠𝑎𝑙𝑒𝑠𝑗(𝑝𝑟𝑖𝑐𝑒−𝑗)
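These four objective functions translate directly into code. Below is a minimal sketch in Python; the function names, the stockout-cost callback, and all numbers in the note afterwards are illustrative assumptions, not part of the course material.

```python
# Sketches of the four supply-side profit models above.
# The stockout-cost function and the discount factor theta are caller-supplied.

def profit_basic(price, cost, sales):
    """Basic model: (price - cost) * sales."""
    return (price - cost) * sales

def profit_inventory(price, cost, sales, cost_stockout):
    """Extension 1: subtract a stockout cost that depends on sales."""
    return (price - cost) * sales - cost_stockout(sales)

def profit_repeated(prices, cost, sales, theta):
    """Extension 2: discounted sum over periods t of (price_t - cost) * sales_t."""
    return sum(theta ** t * (p - cost) * s
               for t, (p, s) in enumerate(zip(prices, sales)))

def profit_line(prices, costs, sales_fns):
    """Extension 3: sum over products j; sales_j depends on the other prices."""
    total = 0.0
    for j, (p, c, sales_j) in enumerate(zip(prices, costs, sales_fns)):
        other_prices = prices[:j] + prices[j + 1:]
        total += (p - c) * sales_j(other_prices)
    return total
```

For example, `profit_basic(10, 4, 100)` gives 600, and adding a hypothetical stockout cost of 0.5 per unit sold lowers it to 550.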
Parametric Model – Demand Side

Building on psychology, we can develop a demand-side parametric model


𝑑𝑒𝑚𝑎𝑛𝑑 = Σ𝑖 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 is maxed, 0 otherwise

𝑢𝑖 = Σ𝑘 𝑝𝑘 ⋅ 𝑢𝑡𝑖𝑙𝑖𝑡𝑦𝑖𝑘 − (𝛽1 + 𝛽2 ⋅ 𝑓𝑟𝑎𝑚𝑖𝑛𝑔) ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒
• Extension – considering learning subjective probability from repeated purchases:

𝑢𝑖𝑡 = Σ𝑘 𝑝𝑖𝑘𝑡 ⋅ 𝑢𝑡𝑖𝑙𝑖𝑡𝑦𝑖𝑘 − (𝛽1 + 𝛽2 ⋅ 𝑓𝑟𝑎𝑚𝑖𝑛𝑔) ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒

𝑝𝑖𝑘𝑡 = 𝑤𝑖𝑡 ⋅ 𝑝𝑖𝑘𝑡−1 + (1 − 𝑤𝑖𝑡) ⋅ 𝑟𝑒𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛𝑖𝑡−1𝑘 / 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖𝑡−1

𝑤𝑖𝑡 = 𝑤1 + 𝑤2 ⋅ 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖𝑡−1 + 𝑤3 ⋅ 𝑠𝑎𝑙𝑖𝑒𝑛𝑐𝑒𝑡
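The demand-side equations above can be sketched as two small functions: one computing 𝑢𝑖𝑡, one applying the subjective-probability update. All parameter values passed in (the betas, w1–w3, salience) are illustrative assumptions.

```python
# Minimal sketch of the demand-side model with subjective-probability learning.

def utility_it(probs, payoffs, beta1, beta2, framing, variance):
    """u_it = sum_k p_ikt * utility_ik - (beta1 + beta2 * framing) * variance."""
    expected = sum(p * x for p, x in zip(probs, payoffs))
    return expected - (beta1 + beta2 * framing) * variance

def update_probs(probs, realization, decision, w1, w2, w3, salience):
    """p_ikt = w_it * p_ik(t-1) + (1 - w_it) * realization_k / decision,
    where w_it = w1 + w2 * decision + w3 * salience."""
    if decision == 0:
        return probs  # no purchase last period: no realization to learn from
    w = w1 + w2 * decision + w3 * salience
    return [w * p + (1 - w) * r for p, r in zip(probs, realization)]
```

A higher weight w means the prior dominates; the "blocking" cues discussed earlier can be read as pushing w toward 1.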
2. Three Kinds of Parameters

Focal (decision) parameters: the parameters we need an optimal decision on; we try to
express them as functions of the other parameters.
• price, bundle, certainty product… anything controllable
• Beware of computational resources
Estimating parameters: the parameters that data must indirectly reveal.
• risk preference, utility, recency effect of learning…
• Beware of the identification problem
Pre-specified parameters: the parameters that serve as background information.
• product cost, inventory, competitive environment
• Anything taken directly from data, or fixed for simplicity
1. The Identification Problem

Identification concerns whether the parameters can be solved from the data.

Consider the equation

𝑝𝑟𝑖𝑐𝑒 − 𝑐𝑜𝑠𝑡 = 𝑚𝑎𝑟𝑘𝑢𝑝


• We can tell 𝑚𝑎𝑟𝑘𝑢𝑝 when we can specify the values of 𝑝𝑟𝑖𝑐𝑒 and 𝑐𝑜𝑠𝑡.
• This is called just-identification.
Under-Identification
When we use data to estimate parameters, the problem often becomes complicated.

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠, 0 otherwise

If we assign a value to 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠, and observe 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 and 𝑝𝑟𝑖𝑐𝑒 from the data, we
will get
𝑢𝑖 > 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1

𝑢𝑖 < 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0
• We cannot solve 𝑢𝑖 out; this is called under-identification.
• Observing more consumers does not solve the problem.
Over-Identification

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠, 0 otherwise

If, besides assigning a value to 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠 and observing 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 and 𝑝𝑟𝑖𝑐𝑒
from the data, we also assume 𝒖𝒊 = 𝒖 to be the same across consumers, we get

𝑢 > 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1

𝑢 < 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0
• Our assumed equations may well contradict each other; this is called over-identification.
• More consumer observations only amplify the problem.
2. Solving Parameters with Identification Problem

Estimation is about solving parameters from data.


• However, oftentimes our solved parameters are vague, or self-contradictory.

A common practice for such problem includes three steps:


Firstly, reduce the parameters, especially those with individual labels, so that our parameters are
(at worst) over-identified rather than under-identified.
𝑢 > 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1

𝑢 < 𝑢𝑎𝑙𝑡𝑒𝑟 + 𝑝𝑟𝑖𝑐𝑒 if 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0
(There will be a contradiction if we find that one consumer buys at a higher price while another does
not buy at a lower price, or vice versa.)
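The contradiction described in the parenthetical can be checked mechanically: under the assumption 𝑢𝑖 = 𝑢 (and setting 𝑢𝑎𝑙𝑡𝑒𝑟 = 0, as the slides do later), each purchase puts a lower bound on 𝑢 and each non-purchase an upper bound. A minimal sketch with three hypothetical observations:

```python
# With a single common u (u_i = u), each observation bounds u:
# a purchase implies u > u_alter + price, a non-purchase implies u < u_alter + price.

def bounds_for_u(observations, u_alter=0.0):
    """observations: list of (price, decision). Returns (lower, upper) bounds on u."""
    lower, upper = float("-inf"), float("inf")
    for price, decision in observations:
        if decision == 1:
            lower = max(lower, u_alter + price)   # buy: u must exceed this
        else:
            upper = min(upper, u_alter + price)   # pass: u must fall below this
    return lower, upper

data = [(3, 1), (6, 1), (4, 0)]   # buy at $3, buy at $6, pass at $4
lo, hi = bounds_for_u(data)
contradiction = lo >= hi          # over-identification: the bounds cross
```

Here the purchase at $6 forces u > 6 while the pass at $4 forces u < 4, so the bounds cross.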
Parameters as Random Variables

Secondly, instead of treating different solved parameters as self-contradictory, we treat the
parameters as random, so that all those "contradictions" become random variations.
• Consumer 1 buys at price $3
• Consumer 2 buys at price $6
• Consumer 3 does not buy at price $4
The observations seem self-contradictory if we assume a single 𝑢𝑖 = 𝑢.
If we assume the utility follows certain probability distribution, 𝑢~𝑈, and set 𝑢𝑎𝑙𝑡𝑒𝑟 at $0, then
the “contradictions” are understood as:
• Consumer 1 has 𝑢1 > 3, which has probability 1 − 𝑈(3)
• Consumer 2 has 𝑢2 > 6, which has probability 1 − 𝑈(6)
• Consumer 3 has 𝑢3 < 4, which has probability 𝑈(4)
The Most Likely Parameter(s)

Consumer 1 has 𝑢1 > 3, which has probability 1 − 𝑈(3)


Consumer 2 has 𝑢2 > 6, which has probability 1 − 𝑈(6)
Consumer 3 has 𝑢3 < 4, which has probability 𝑈(4)

Finally, after we assign parameters to characterize 𝑼 = 𝑈(𝑎1, 𝑎2, …), we can say that the
likelihood of observing such consumer behavior is

(1 − 𝑈(3; 𝑎1, 𝑎2, …)) ⋅ (1 − 𝑈(6; 𝑎1, 𝑎2, …)) ⋅ 𝑈(4; 𝑎1, 𝑎2, …)

Parameter estimation is therefore a mission to find the 𝑎1, 𝑎2, … that maximize this likelihood.
The resulting 𝑈(𝑎1, 𝑎2, …) is then our estimate for the distribution of 𝑢. This is called Maximum
Likelihood Estimation (MLE).
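The MLE step can be sketched with a crude grid search over the mean of U. This assumes U = Normal(μ, σ) with σ fixed at 2 purely for illustration; the fixed σ, the grid range, and the step size are all assumptions, and a real application would estimate σ too and use a proper optimizer.

```python
import math

# Grid-search MLE for the slide's three observations:
# buy at $3, buy at $6, pass at $4, with U = Normal(mu, sigma).

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def likelihood(mu, sigma=2.0):
    """(1 - U(3)) * (1 - U(6)) * U(4)."""
    return ((1 - norm_cdf(3, mu, sigma))
            * (1 - norm_cdf(6, mu, sigma))
            * norm_cdf(4, mu, sigma))

# Crude grid search for the likelihood-maximizing mu over [0, 10].
grid = [m / 100 for m in range(0, 1001)]
mu_hat = max(grid, key=likelihood)
```

The estimate lands between the "buy" prices and the "pass" price: the likelihood balances making the two purchases probable against making the non-purchase probable.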
3. Error

Our estimation mission therefore has two major steps:


1. Write the distribution of our estimating parameters, and therefore the likelihood function for
observations.
2. Find a way to maximize the likelihood, and use the estimates from the maximization.

Before we figure out how to conduct the steps, we need to understand why our parameters
are random variables.
The World with Contradictions

Let's look back at the scenario where we cannot treat our estimating parameter 𝑢 as a
non-random variable.
• Consumer 1 buys at price $3
• Consumer 2 buys at price $6
• Consumer 3 does not buy at price $4

Why do we have contradictions?


The Observable World

The contradiction is rooted in our model.

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0, 0 otherwise

When we fix 𝑢𝑖 and 𝑢𝑎𝑙𝑡𝑒𝑟, the purchase decision is solely dependent on the price.
• However, price is apparently not the only determinant of the purchase decision; we model it this way
only because price is observable while utilities are not.
The Unobservable Rationales

The utility element in our choice model captures the unobservable factors behind purchases (or
non-purchases).
• Consumer 1 buys at price $3, because he calculates the expected resell price of the random reward to
be $4. But we don’t observe such calculation.
• Consumer 2 buys at price $6, because other than expectation calculation, he is excited about the
uncertainty. But we don’t observe such excitement.
• Consumer 3 does not buy at price $4, because he is pessimistic about his luck today, leading him to
lower the subjective probability of grand prizes in the random reward. But we don’t observe such
subjective probability.
The Underlying World

With these potential purchase rationales borne in mind, look back at our model:

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0, 0 otherwise

All the unobservables (the possible calculation of expectation, the excitement about uncertainty,
and the subjective belief about the probability) are folded into 𝑢𝑖.
There may also be some factors that we are not aware of.

𝑢𝑖 = 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 + 𝑒𝑥𝑐𝑖𝑡𝑒𝑚𝑒𝑛𝑡 + 𝑏𝑒𝑙𝑖𝑒𝑓 + ⋯ + 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝑓𝑎𝑐𝑡𝑜𝑟𝑠


Partitioning the Underlying World

𝑢𝑖 = 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝑒𝑥𝑐𝑖𝑡𝑒𝑚𝑒𝑛𝑡𝑖 + 𝑏𝑒𝑙𝑖𝑒𝑓𝑖 + ⋯ + 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝑓𝑎𝑐𝑡𝑜𝑟𝑠

We can further separate out the observable parts of the unobservable factors.
• 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛: the expected value of the random reward
• 𝑒𝑥𝑐𝑖𝑡𝑒𝑚𝑒𝑛𝑡: the variance of the random reward
• 𝑏𝑒𝑙𝑖𝑒𝑓: past realizations of the random reward

𝑢𝑖 = 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑏𝑙𝑒 𝑝𝑎𝑟𝑡𝑠𝑖 + 𝑢𝑛𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑏𝑙𝑒 𝑝𝑎𝑟𝑡𝑠 + 𝑢𝑛𝑘𝑛𝑜𝑤𝑛𝑠


Prediction and Error

𝑢𝑖 = 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑏𝑙𝑒 𝑝𝑎𝑟𝑡𝑠𝑖 + 𝑢𝑛𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑏𝑙𝑒 𝑝𝑎𝑟𝑡𝑠 + 𝑢𝑛𝑘𝑛𝑜𝑤𝑛𝑠


Since we have no idea what those unobservable parts look like, we can group them with the
unknowns.
Both the unobservable parts and the unknowns are unpredictable from our existing information (data),
so we can describe them as the prediction error when we try to predict utility.

𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑒𝑟𝑟𝑜𝑟
Central Limit Theorem

The Central Limit Theorem states that, when we add a lot of random variables together, the result
approaches a normal distribution.

𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑒𝑟𝑟𝑜𝑟

The definition of "error" matches this picture of summing up a lot of random variables, so
the error is often assumed to be normally distributed.

𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑛𝑜𝑟𝑚𝑎𝑙𝑙𝑦 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑑 𝑒𝑟𝑟𝑜𝑟
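This claim is easy to illustrate numerically: sums of many independent shocks look roughly normal. The number of terms and draws below are arbitrary choices for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

def summed_error(n_terms):
    """One 'error' draw: the sum of n_terms independent Uniform(-1, 1) shocks."""
    return sum(random.uniform(-1, 1) for _ in range(n_terms))

draws = [summed_error(30) for _ in range(10_000)]

mean = statistics.mean(draws)        # should be near 0
var = statistics.pvariance(draws)    # should be near 30 * (1/3) = 10
# For a normal sample, roughly 68% lies within one standard deviation of the mean.
within_1sd = sum(abs(x - mean) <= var ** 0.5 for x in draws) / len(draws)
```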


Linearization

𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑛𝑜𝑟𝑚𝑎𝑙𝑙𝑦 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑑 𝑒𝑟𝑟𝑜𝑟

How about the 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛 part?


• It’s a function of all the observable information (data variables).
A common approximation of such a function is to write the prediction in an additive form
of the observed data variables.

𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 = 𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒


4. Writing Likelihood Function

𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑛𝑜𝑟𝑚𝑎𝑙𝑙𝑦 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑑 𝑒𝑟𝑟𝑜𝑟

𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 = 𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + ⋯

The linear approximation of the prediction and the normality assumption on the error help us
write the distribution of utility:
𝑢 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝑚𝑒𝑎𝑛(𝑒𝑟𝑟𝑜𝑟) [𝛼], 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒(𝑒𝑟𝑟𝑜𝑟) [𝜎])

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0, 0 otherwise
Our estimation mission is therefore finding the optimal (𝛽1 , 𝛽2 , 𝛽3 , 𝛼, 𝜎) that can maximize the
likelihood function, given data on 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 and 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒.
Link Function

𝑢𝑖 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝛼, 𝜎)

Notice that 𝑢 is a hidden factor (a latent variable), while the observable result that 𝑢 causes is
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖, a binary variable.
To estimate the parameters we need, we first write 𝑢 as a function of 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛.

𝑢𝑖 = 𝐿(𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖) = 𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + ⋯ + 𝑒𝑟𝑟𝑜𝑟


⇒ 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 𝐿−1 (𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + ⋯ + 𝑒𝑟𝑟𝑜𝑟)

𝐿(⋅) is called the link function.


Probit Regression

𝑢𝑖 = 𝐿(𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖) = 𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + ⋯ + 𝑒𝑟𝑟𝑜𝑟

In our current example, we assume that the probability of 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 comes from the cumulative
distribution function (c.d.f.) of the normally distributed 𝑢.
• Consumer 1 is buying, so he has 𝑢1 > 3, which has probability 1 − 𝑈(3)
• Consumer 3 is not buying, so he has 𝑢3 < 4, which has probability 𝑈(4)…

Therefore, the link function is the inverse of the c.d.f. of the normal distribution. If we
match 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 and our observed data this way, it is called probit regression.
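The probit likelihood just described can be sketched as a log-likelihood function: the probability that 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 is the normal c.d.f. of (𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 − 𝑝𝑟𝑖𝑐𝑒𝑖)/𝜎. The data layout and coefficient names mirror the slides, but the helper itself is an illustrative assumption.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def probit_log_likelihood(data, b1, b2, b3, alpha, sigma):
    """data: list of (expectation, variance, experience, price, decision).
    prediction_i = b1*expectation + b2*variance + b3*experience + alpha;
    P(decision_i = 1) = Phi((prediction_i - price_i) / sigma)."""
    ll = 0.0
    for expectation, variance, experience, price, decision in data:
        pred = b1 * expectation + b2 * variance + b3 * experience + alpha
        p_buy = norm_cdf((pred - price) / sigma)
        ll += math.log(p_buy if decision == 1 else 1 - p_buy)
    return ll
```

Maximizing this log-likelihood over (𝛽1, 𝛽2, 𝛽3, 𝛼, 𝜎) is exactly the estimation mission stated above; as in any probit model, the scale 𝜎 is only identified relative to the coefficients, so in practice it is usually normalized to 1.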
5. Identification: Revisit

𝑢 = 𝐿(𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖) ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝛼, 𝜎)

𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1 if 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0, 0 otherwise

Do we still have an identification problem when we redefine our estimation task as estimating the
distribution properties of our estimating parameter?
Identification: Revisit

𝑢 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝛼, 𝜎)
Do we still have an identification problem?
Yes, we can consider the following two cases:

Case 1: there is no variation in 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛.


• For example, the company only has one kind of random reward on sale and never changes its
composition.
• We cannot estimate 𝛽1 or 𝛽2 this way.
Collinearity

𝑢 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝛼, 𝜎)
Do we still have an identification problem?

Case 2: 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 and 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 often change together


• We therefore cannot cleanly separate the effect of 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 from the effect of 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, so 𝛽1 and
𝛽2 may be confounded with each other.
• To identify 𝛽1 and 𝛽2, we must be able to change 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 while holding 𝒗𝒂𝒓𝒊𝒂𝒏𝒄𝒆 constant, or vice versa.
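Case 2 can be made concrete: if expectation and variance always move together (below, they are equal in every row), then different (𝛽1, 𝛽2) pairs with the same sum produce identical predictions, so no amount of such data can separate them. The numbers are made up for illustration.

```python
# If expectation == variance in every observation, (b1, b2) and
# (b1 - d, b2 + d) yield identical predictions: the data cannot tell them apart.

def prediction(expectation, variance, b1, b2):
    return b1 * expectation + b2 * variance

rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]   # expectation == variance throughout

preds_a = [prediction(e, v, b1=0.75, b2=0.25) for e, v in rows]
preds_b = [prediction(e, v, b1=0.5, b2=0.5) for e, v in rows]
identical = preds_a == preds_b   # only b1 + b2 matters in this data
```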
Test Marketing

Case 1: there is no variation in 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛.


Case 2: 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 and 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 often change together

The identification problem tells us that, if we want to understand our consumers better, we
need to conduct our test marketing in an experimental way.
• Explore new products that are clear extensions of old ones
• Tease out the effects of different dimensions (e.g., expectation and variance) of our products
Exogenous Shocks

Sometimes, we may not have the resource to conduct a test marketing.


To identify our parameters, our task would then be to spot the exogenous shocks in our data.

Exogenous shock – an event that impacts only some variables in the model, but not others.
Case 1: there is no variation in 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛?
• Exogenous shock – a free distribution of some of the rewards
Case 2: 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 and 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 often change together?
• Exogenous shock – a promotional event that rewards all purchases
We make a few key points in this session:

1. To better inform our decision, we need to quantify our model with parameters.
2. Parameter values are handled in three ways: left free as focal decisions, pre-specified, or estimated.
3. To use data to estimate parameters, we may encounter the problems of over- and under-identification.
4. We treat the estimating parameters as random, and use data to estimate the distribution
properties – a popular way to doing this is by writing and maximizing the likelihood function of
our data observations.
5. Even with randomized parameters, we need to ensure data variation and “holding constant”. A
rigorous way is to conduct test marketing. If that’s unavailable, exogenous shocks need to be
spotted.
Short Essay
Following last week’s essay, identify two (partially) contradictory viewpoints of your focal
policy/phenomenon. (500 words+)
• Write the contradiction in terms of parameter(s) to be estimated.
• Discuss known factors and the observable parts of them for estimating the parameter(s).

Thematic Analysis (4th Part of the Final Essay)


Following the random reward case in your 1st part, discuss
• The determinants of the pricing of the random reward.
• The observable (regardless of the observation effort needed) parts of the determinants, as an outside
observer.
To be continued...
