Professional Documents
Culture Documents
PAMS 22Fall Smart Marketing With RRM 4 Parameter Estimation
In previous sessions, we built a decision analysis framework through demand modeling and
business modeling.
The framework allows us to conduct SMART marketing.
• We are marketing to maximize a well-defined objective function, which means we have a specific
goal and measurable progress.
• The maximization is conducted in an attainable action space and in a timely, adaptive,
automated manner.
• The maximization is rooted in reality through theory-guided, data-based demand estimation.
Consumer Psychology
To estimate the demand, we discuss prevalent psychology theories to highlight the key
aspects of consumer behavior.
The basic elements of decision-making behavior regarding random rewards include:
• utility maximization – each reward is assigned a utility for each consumer
• expected utility – the utility of the random rewards is the mathematical expectation of the utility of all
possible rewards
• risk preference – there is an individual preference/aversion towards the variance of potential
realizations of utility
• framing effect – the preference/aversion towards variance is dependent on gain/loss framing
Consumer Psychology (2)
When we are dealing with repeated random rewards, the influence of past experience with the
same random reward should be considered.
• subjective probability: the expected utility calculation uses subjective probability, which uses
objective information/cues as prior, but is also updated through experience
• Law of Small Numbers: recent experience matters more
• (Gamblers’ Fallacy: for very recent experience, if it is framed as “now”, we expect regression from it; if
framed as “past”, we learn from it)
• salience: memorable experience matters more
• spurious correlation: experience that resembles the current situation matters more
Qualitative Implications
We can derive some qualitative implications from the previous psychology theories:
• Pricing depends on the utility of the rewards.
• Pricing and probability design also depend on risk preference, which is partly given and partly
changed by framing factors.
• We can smooth the fluctuation of subjective probability by providing probability information,
bundles, rarity systems, and “blocking” cues (events, pity timers) – they all strengthen the power of
the prior and weaken the effect of experience.
• We can increase utility of rewards by increasing subjective probability of “wins”, using special
effects highlighting “wins”, near-misses obscuring “losses”, and rituals enhancing illusion of control.
The Limitations of Qualitative Insights
1. Parameterization
2. Identification
𝑎 = 𝑓(𝑏) = argmax_{𝑎∈𝐴} 𝜋(𝑎, 𝑏)
• We will price our random reward at …, when the risk preference of the consumers is at … level
• We will use gain/loss framing, when the risk preference of the consumers is at … level (such that
profit increases/decreases when it’s slightly higher)
• We will bundle 10 random rewards at discount …, when the recency effect of probability learning is
at … level.
We have talked about some business concerns that form a supply-side model.
Focal (decision) parameters: The parameters we need an optimal decision on; we try to
express them as functions of other parameters
• price, bundle, certainty product…anything controllable
• Beware of computational resources
Estimating parameters: The parameters whose values data must indirectly reveal
• risk preference, utility, recency effect of learning…
• Beware of the identification problem
Pre-specified parameters: The parameters that are background information
• product cost, inventory, competitive environment
• Anything taken directly from data or fixed for simplicity
1. The Identification Problem
If we assign a value to the 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠, and observe 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 and 𝑝𝑟𝑖𝑐𝑒 from data, we
will get
𝑢𝑖 > 𝑢𝑎𝑙𝑡𝑒𝑟 − 𝑝𝑟𝑖𝑐𝑒, 𝑖𝑓 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1
𝑢𝑖 < 𝑢𝑎𝑙𝑡𝑒𝑟 − 𝑝𝑟𝑖𝑐𝑒, 𝑖𝑓 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0
• We cannot solve for 𝑢𝑖 , which is called under-identification.
• Observing more consumers does not solve the problem.
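A minimal numerical sketch of this point (all values are hypothetical, chosen purely for illustration): a single observed purchase only bounds 𝑢𝑖 from one side, so arbitrarily many values remain consistent with the data.

```python
# Under-identification sketch: one observed purchase at a given price only
# tells us that u_i clears a threshold (an inequality), never a point value.
# All numbers here are hypothetical, chosen purely for illustration.
u_alter, price = 5.0, 3.0
threshold = u_alter - price  # decision_i = 1 implies u_i > u_alter - price

candidates = [2.5, 3.0, 10.0, 100.0]
consistent = [u for u in candidates if u > threshold]
# Every candidate above the threshold remains consistent with the observation,
# and observing more consumers at the same price adds no new constraint on u_i.
print(consistent)
```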
Over-Identification
If, besides assigning a value to the 𝑢𝑡𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒𝑠 and observing 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 and 𝑝𝑟𝑖𝑐𝑒
from data, we also assume 𝒖𝒊 = 𝒖 to be the same across consumers, we get inequalities that
contradict each other: a consumer buying at $6 implies 𝑢 > 6, while a consumer not buying at $4
implies 𝑢 < 4. No single value of 𝑢 satisfies both – this is over-identification.
Finally, after we assign parameters to characterize 𝑼 = 𝑈(𝑎1 , 𝑎2 , …), we can say that the
likelihood for us to observe such consumer behavior is
(1 − 𝑈(3; 𝑎1 , 𝑎2 , …)) ⋅ (1 − 𝑈(6; 𝑎1 , 𝑎2 , …)) ⋅ 𝑈(4; 𝑎1 , 𝑎2 , …)
Parameter estimation is therefore a mission to find the 𝑎1 , 𝑎2 , … that maximize this likelihood.
The resulting 𝑈(𝑎1 , 𝑎2 , …) is then our estimate for the distribution of 𝑢. This is called Maximum
Likelihood Estimation (MLE).
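As a sketch of MLE in this example, suppose we parameterize 𝑈 as the c.d.f. of a Normal(μ, 1) distribution (fixing the scale at 1 is a normalization assumed here purely for illustration; μ plays the role of 𝑎1). The three-consumer likelihood above can then be maximized numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# The data behind the likelihood above: buys at $3 and $6, no buy at $4.
prices = np.array([3.0, 6.0, 4.0])
decisions = np.array([1, 1, 0])

def neg_log_likelihood(params):
    mu = params[0]
    U = norm.cdf(prices, loc=mu, scale=1.0)     # U(p) = P(u < p), assumed normal
    probs = np.where(decisions == 1, 1 - U, U)  # buy: 1 - U(p); no buy: U(p)
    return -np.sum(np.log(probs))               # minimize the negative log-likelihood

res = minimize(neg_log_likelihood, x0=[4.0], method="Nelder-Mead")
mu_hat = res.x[0]
```

Note how the estimate balances the contradictory observations (a buy at $6 but no buy at $4): no μ makes the likelihood equal to 1, but some μ makes the observed pattern most probable.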
3. Error
Before we figure out how to conduct the steps, we need to understand why our parameters
are random variables.
The World with Contradictions
Let’s look back at the scenario where we cannot treat our estimating parameter 𝑢 as a non-
random variable.
• Consumer 1 buys at price $3
• Consumer 2 buys at price $6
• Consumer 3 does not buy at price $4
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 < 0
When we fix 𝑢𝑖 and 𝑢𝑎𝑙𝑡𝑒𝑟 , the purchase decision depends solely on the price.
• However, price is clearly not the only determinant of purchase decisions; we model it this way
only because price is observable while utilities are not.
The Unobservable Rationales
The utility element in our choice model captures the unobservable factors behind purchases (or
non-purchases).
• Consumer 1 buys at price $3, because he calculates the expected resell price of the random reward to
be $4. But we don’t observe such calculation.
• Consumer 2 buys at price $6, because other than expectation calculation, he is excited about the
uncertainty. But we don’t observe such excitement.
• Consumer 3 does not buy at price $4, because he is pessimistic about his luck today, leading him to
lower the subjective probability of grand prizes in the random reward. But we don’t observe such
subjective probability.
The Underlying World
With these potential purchase rationales in mind, look back at our model
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 < 0
All the unobservables – the possible calculation of expectation, the excitement about
uncertainty, and the subjective belief about the probability – are folded into 𝑢𝑖 .
There may also be some factors that we are not aware of.
We can further separate out the observable parts of the unobservable factors.
• 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛: expected values of random reward
• 𝑒𝑥𝑐𝑖𝑡𝑒𝑚𝑒𝑛𝑡: variance of the random reward
• 𝑏𝑒𝑙𝑖𝑒𝑓: past realizations of random reward
𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑒𝑟𝑟𝑜𝑟
Central Limit Theorem
The Central Limit Theorem states that, when we add many independent random variables
together, the sum will approach a normal distribution.
𝑢𝑖 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛𝑖 + 𝑒𝑟𝑟𝑜𝑟
The definition of “error” matches this picture of summing up many small random variables, so
it’s often assumed to be normally distributed.
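A quick simulation of this claim (the uniform shocks and their count are arbitrary choices for illustration): summing many small independent shocks yields an approximately normal error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Each consumer's "error" is the sum of 50 small independent uniform shocks.
n_factors, n_consumers = 50, 100_000
errors = rng.uniform(-1, 1, size=(n_consumers, n_factors)).sum(axis=1)

# CLT prediction: approximately Normal(0, n_factors * 1/3), since a
# uniform(-1, 1) variable has mean 0 and variance 1/3.
predicted_sd = (n_factors / 3) ** 0.5  # about 4.08

# For a normal distribution, ~68.27% of draws fall within one sd of the mean.
within_one_sd = np.mean(np.abs(errors - errors.mean()) < errors.std())
```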
The linear approximation of the prediction and the normality assumption on the error let us
write down the distribution of utility.
𝑢𝑖 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒𝑖 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒𝑖 + 𝑚𝑒𝑎𝑛(𝑒𝑟𝑟𝑜𝑟)[𝛼], 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒(𝑒𝑟𝑟𝑜𝑟)[𝜎])
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 < 0
Our estimation mission is therefore finding the optimal (𝛽1 , 𝛽2 , 𝛽3 , 𝛼, 𝜎) that can maximize the
likelihood function, given data on 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 and 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒.
Link Function
𝑢𝑖 ~ 𝑈(𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒, …)
= 𝑁𝑜𝑟𝑚𝑎𝑙(𝛽1 ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛𝑖 + 𝛽2 ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒𝑖 + 𝛽3 ⋅ 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒𝑖 + 𝛼, 𝜎)
Notice that 𝑢 is a hidden factor (a latent variable), while the observable result that 𝑢 causes is
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 , a binary variable.
To estimate the parameters we need, we first need to connect the latent 𝑢 to the observed 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛.
In our current example, we assume that the probability of each 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 is given by the cumulative
distribution function (c.d.f.) of the normally-distributed 𝑢.
• Consumer 1 is buying, so he has 𝑢1 > 3, which has probability 1 − 𝑈(3)
• Consumer 3 is not buying, so he has 𝑢3 < 4, which has probability 𝑈(4)…
Therefore, the link function is the inverse of the c.d.f. of the normal distribution. Matching
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 to our observed data this way is called probit regression.
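A sketch of probit estimation on simulated data (the features, sample size, and "true" coefficients below are all made up for illustration; the error scale σ is normalized to 1, the standard probit scale restriction):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 2000
# Hypothetical product features observed in the data
expectation = rng.uniform(2, 8, n)
variance = rng.uniform(0, 4, n)
price = rng.uniform(1, 9, n)

# Simulate the latent-utility model from the slides:
# u_i = b1*expectation + b2*variance + alpha + error, error ~ Normal(0, 1)
u = 1.0 * expectation + 0.3 * variance + 0.5 + rng.normal(size=n)
decision = (u > price).astype(int)

def neg_ll(params):
    b1, b2, alpha = params
    z = b1 * expectation + b2 * variance + alpha - price  # mean utility - price
    p_buy = norm.cdf(z)                                   # probit link
    p = np.where(decision == 1, p_buy, 1 - p_buy)
    return -np.sum(np.log(np.clip(p, 1e-12, 1.0)))

res = minimize(neg_ll, x0=[0.0, 0.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 5000})
b1_hat, b2_hat, alpha_hat = res.x
```

The recovered (b1_hat, b2_hat, alpha_hat) should sit close to the simulated (1.0, 0.3, 0.5); with real data, these coefficients are exactly the (𝛽1, 𝛽2, 𝛼) we set out to estimate.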
5. Identification: Revisit
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 1, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 > 0
𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑖 = 0, 𝑖𝑓 𝑢𝑖 − 𝑝𝑟𝑖𝑐𝑒 < 0
Do we still have an identification problem when we redefine our estimation task as estimating
the distributional properties of our estimating parameter?
Identification: Revisit
The identification problem tells us that, if we want to understand our consumers better, we
need to conduct our test marketing in an experimental way.
• Explore new products that are clear extensions of old ones
• Tease out the effect of different dimensions (e.g. expectation and variance) of our products
Exogenous Shocks
Exogenous shock – an event that impacts only some variables in the model but not others
Case 1: what if there is no variation in 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛?
• Exogenous shock – a free distribution of some of the rewards
Case 2: 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 and 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 often change together
• Exogenous shock – a promotional event that rewards all purchases
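Case 2 can be illustrated with a simple rank check (the numbers are hypothetical): when expectation and variance always move in lockstep, their coefficients cannot be separated, and a shock that moves only expectation restores identification.

```python
import numpy as np

# Without a shock: variance is perfectly tied to expectation across products,
# so the two regressors are collinear (hypothetical numbers for illustration).
expectation = np.array([2.0, 4.0, 6.0, 8.0])
variance = 0.5 * expectation
X = np.column_stack([expectation, variance, np.ones(4)])
rank_before = np.linalg.matrix_rank(X)  # 2: beta1 and beta2 not separable

# Exogenous shock: a promotion adding a guaranteed bonus to some products
# raises their expectation while leaving variance unchanged.
bonus = np.array([0.0, 1.0, 0.0, 1.0])
X_shocked = np.column_stack([expectation + bonus, variance, np.ones(4)])
rank_after = np.linalg.matrix_rank(X_shocked)  # 3: coefficients identified
```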
We make a few key points in this session:
1. To better inform our decision, we need to quantify our model with parameters.
2. Parameter values are assigned in three ways: left free (as decision variables to optimize), pre-specified, and estimated.
3. To use data to estimate parameters, we may encounter the problem of over- and under-
identification.
4. We treat the estimating parameters as random, and use data to estimate their distributional
properties – a popular way of doing this is by writing and maximizing the likelihood function of
our data observations.
5. Even with randomized parameters, we need to ensure data variation and “holding constant”. A
rigorous way is to conduct test marketing. If that’s unavailable, exogenous shocks need to be
spotted.
Short Essay
Following last week’s essay, identify two (partially) contradictory viewpoints on your focal
policy/phenomenon. (500+ words)
• Write the contradiction in terms of parameter(s) to be estimated.
• Discuss known factors and the observable parts of them for estimating the parameter(s).