
Smart Marketing with Random Rewards
SYSBS, 2022 Fall
Zhou Ruikai

SMART Marketing

Previously, we looked into the breakdown of SMART marketing.

• We connect SMART with the decision analysis framework. The framework is built through parameterization, incorporating three kinds of parameters (decision, estimating, and pre-specified).
• Specific – Parameters are included only when they matter for our specific goal.
• Attainable – Before the analysis, we need a decision problem in mind, and therefore the space of the (unspecified) decision parameters.
• Realistic – Both pre-specified and estimating parameters should be grounded in reality.
• Timely and Measurable – Parameters should adapt to changes in
  – situation: some parameters need to be data-based, changing as the data change;
  – progress: decision and prediction outcomes should be used to adjust the framework.
SMART Marketing (Illustration)

[Diagram: Reality is divided into controllable parts (the Decision, which forms the Action Space), controlled and unchanging parts (Pre-specified), and changing parts (Estimate); these feed the Model, which links the Decision to the Outcome and the Objective.]
Identification

Identification concerns how we can solve for our parameters from the information in the data.
• Our data will not match our model perfectly, because we can only observe limited information.
• We separate the observable and the unobservable (and unknown) parts of our parameters into predictions and errors.
• Identification (accuracy) therefore depends on whether we can constrain the error, i.e., find a pure relationship between parameters and observations, through test marketing or extraneous shocks.
Likelihood Function

When we have the relevant data, the mission becomes adjusting the model by comparing data and predictions, which involves three steps.
First, write down the prediction.
• We model the predictions (usually with linear modeling) and treat the errors as randomly distributed (usually normally distributed).

𝑢 = 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛 + 𝑒𝑟𝑟𝑜𝑟
⇒ 𝑢 = 𝛽 ⋅ 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑏𝑙𝑒𝑠 + 𝑁𝑜𝑟𝑚𝑎𝑙(𝛼, 𝜎)
⇒ 𝑢 = 𝛽₁ ⋅ 𝑝𝑟𝑖𝑐𝑒 + 𝛽₂ ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 + ⋯ + 𝑁𝑜𝑟𝑚𝑎𝑙(𝛼, 𝜎)
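As a sketch, the prediction-plus-error structure can be simulated in a few lines; all coefficient values below are hypothetical, chosen purely for illustration:

```python
import random

random.seed(0)

# Hypothetical parameter values, for illustration only
beta1, beta2 = -0.8, 0.5   # sensitivities to price and expectation
alpha, sigma = 0.0, 1.0    # error term distributed Normal(alpha, sigma)

def prediction(price, expectation):
    """The deterministic part of the model: beta * observables."""
    return beta1 * price + beta2 * expectation

def utility(price, expectation):
    """u = prediction + error, with a normally distributed error term."""
    return prediction(price, expectation) + random.gauss(alpha, sigma)

u = utility(price=2.0, expectation=1.0)
```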
Link Function

Second, compare the prediction with the observation.

• We evaluate model accuracy by comparing predictions with observable outcomes, bearing in mind the existence of error. Sometimes we need a link function between the predictions and the outcomes, when the outcome clearly is not distributed in the way we model the predictions.

𝑢ᵢ = 𝐿(𝑑ᵢ) = 𝛽₁ ⋅ 𝑝𝑟𝑖𝑐𝑒 + 𝛽₂ ⋅ 𝑒𝑥𝑝𝑒𝑐𝑡𝑎𝑡𝑖𝑜𝑛 + ⋯ + 𝑁𝑜𝑟𝑚𝑎𝑙(𝑚𝑒𝑎𝑛, 𝑒𝑟𝑟𝑜𝑟)

𝐿(⋅) is called the link function.
𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 = Πᵢ 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦(𝑑ᵢ | 𝐿⁻¹(𝑢ᵢ))
Maximum Likelihood Estimation

Finally, depending on the result of the comparison, adjust the model.

• We adjust the model to better predict the distribution of our parameters using observable information.

𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 = Πᵢ 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦(𝑑ᵢ | 𝐿⁻¹(𝑢ᵢ); 𝛽₁, 𝛽₂, …)

(𝛽₁, 𝛽₂, …) = 𝑎𝑟𝑔𝑚𝑎𝑥 𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑(⋅)
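A minimal sketch of the argmax step, using toy data and a coarse grid search in place of the gradient-based optimizers real software uses; the data and parameter range below are made up for illustration:

```python
import math

# Toy data: observations roughly generated from d = beta * x + Normal(0, 1)
xs = [1.0, 2.0, 3.0, 4.0]
ds = [2.1, 3.9, 6.2, 7.8]

def log_likelihood(beta, sigma=1.0):
    """Sum over i of log Normal(d_i | beta * x_i, sigma)."""
    total = 0.0
    for x, d in zip(xs, ds):
        err = d - beta * x
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - err ** 2 / (2 * sigma ** 2)
    return total

# argmax over a coarse grid of candidate beta values
candidates = [b / 100 for b in range(100, 301)]
beta_hat = max(candidates, key=log_likelihood)
```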
Estimation and Decision

A side note: after we have obtained the parameter estimates, the resulting task is also a maximization problem.

𝑝𝑟𝑜𝑓𝑖𝑡 = (𝑝𝑟𝑖𝑐𝑒 − 𝑐𝑜𝑠𝑡) ⋅ 𝑠𝑎𝑙𝑒𝑠
𝑠𝑎𝑙𝑒𝑠 = 𝑑𝑒𝑚𝑎𝑛𝑑 = Σᵢ 𝑑ᵢ
𝑑ᵢ = 𝐿⁻¹(𝑢ᵢ; 𝛽₁, 𝛽₂, …)

The remaining question, therefore, is how we can perform the maximization.
Optimization
1. Optimization Algorithms
2. Application
The Problem(s)

Both demand estimation and decision making need to solve a maximization problem.
• In demand estimation, we choose the distributional properties of parameters (the actual estimating parameters) to maximize the likelihood function.
• In decision making, we choose the decision parameters to maximize the objective function.

However, maximization may not be computationally easy.


1. Mathematical Deduction

The most intuitive way to find the maximum is to deduce it mathematically.

𝑓(𝑥) = −𝑥²
𝑓′(𝑥) = −2𝑥, which is > 0 if 𝑥 < 0, = 0 if 𝑥 = 0, and < 0 if 𝑥 > 0
The Limitation of Mathematical Deduction

The computational burden of deduction grows rapidly as more functions are involved, especially with complicated structures.
• Sometimes the derivative has no closed form.

𝑢ᵢₜ = Σₖ 𝑝ᵢₖₜ ⋅ 𝑢𝑡𝑖𝑙𝑖𝑡𝑦ᵢₖ − (𝛽₁ + 𝛽₂ ⋅ 𝑓𝑟𝑎𝑚𝑖𝑛𝑔) ⋅ 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒
𝑝ᵢₖₜ = 𝑤ᵢₜ ⋅ 𝑝ᵢₖ,ₜ₋₁ + (1 − 𝑤ᵢₜ) ⋅ 𝑟𝑒𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛ᵢ,ₜ₋₁(𝑘) / 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛ᵢ,ₜ₋₁
𝑤ᵢₜ = 𝑤₁ + 𝑤₂ ⋅ 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛ᵢ,ₜ₋₁ + 𝑤₃ ⋅ 𝑠𝑎𝑙𝑖𝑒𝑛𝑐𝑒ₜ

• It is also practically difficult in a business decision environment. Practitioners may not have the mathematical sophistication, and the model structure may be subject to frequent changes.
2. Solving the Problem Numerically

When we cannot solve the maximization problem analytically, a numerical solution can be satisfactory.
By the term numerical, we mean that we obtain the solution only in the form of numbers.

𝑓(𝑥) = −𝑥² + 1

This is an example of the analytical form of our objective function.
The computer stores such a function numerically, which means that it can output the value of 𝑓(𝑥₀) when the input is 𝑥₀.
• 𝑓(𝑥) = 1 when 𝑥 = 0
• 𝑓(𝑥) = 0 when 𝑥 = 1 …
Brute-Force

The intuitive way to obtain a numerical maximum is to exhaust every possibility.
• In terms of maximization, this means searching the whole domain of definition to find the maximum value.
• This is equivalent to plotting the function.
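A brute-force search over a finite grid, for the example function 𝑓(𝑥) = −𝑥² + 1 used earlier:

```python
def f(x):
    return -x ** 2 + 1

# Exhaust a finite grid of inputs and keep the best output seen
grid = [i / 100 for i in range(-200, 201)]   # 401 points on [-2, 2]
best_x = max(grid, key=f)
best_value = f(best_x)
```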
Accuracy

However, the computer can only handle the function in a "one-in, one-out" way.
• We can only explore finitely many points, one by one.

There is imprecision, which depends on how many points we set out to explore to approximate the final result.
Local Maximum

Remember, each evaluation of the function demands computational resources.

• There is a tradeoff between precision and efficiency.
• To reduce the computational burden, rationales and algorithms are applied. They allow us to explore more precisely within a given time.

A common way to approximate a maximization problem is to solve for a local maximum instead of the global maximum.
• Global maximum – the maximal value within the whole region of interest
• Local maximum – the value of a point that is maximal within a certain region
Root-Finding

A common way to find local maxima (or minima) is to find roots of the first derivative.
3. Root-Finding Algorithm

Beyond brute force, we can add some basic rationales to improve our root-finding process.
Think about the following treasure-hunting example.
Root-Finding Algorithm Rationales

We can have three rationales:


• Always run towards the target direction, if possible.
• When the target is close, check more.
• When the path switches direction a lot, check more.
Newton-Raphson Method

The three rationales of treasure hunting translate into root finding.
• Always run towards the target – when 𝑓(𝑥) is larger than 0, we should guess a smaller 𝑥 next time if 𝑓(𝑥) is increasing.
• When the destination is close, check more – when 𝑓(𝑥) is already close to 0, our next guess should not be far away.
• When the path switches direction a lot, check more – when 𝑓′(𝑥) is large in absolute value, which means 𝑓(𝑥) is changing fast, our next guess should be conservatively close.

𝑥ₙ₊₁ = 𝑥ₙ − 𝛿 ⋅ 𝑓(𝑥ₙ) / 𝑓′(𝑥ₙ)
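The update rule can be implemented in a few lines; the tolerance and iteration cap anticipate the stopping-point discussion that follows:

```python
def newton_raphson(f, f_prime, x0, delta=1.0, tol=1e-8, max_iter=100):
    """Iterate x_{n+1} = x_n - delta * f(x_n) / f'(x_n) until f(x) is near 0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:       # stopping point: close enough to a root
            return x
        x = x - delta * fx / f_prime(x)
    raise RuntimeError("did not converge within max_iter iterations")

# Stationary point of f(x) = -x^2: find the root of f'(x) = -2x
stationary = newton_raphson(lambda x: -2 * x, lambda x: -2.0, x0=3.0)
```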
Stopping Point

For a root-finding algorithm, the ideal stopping point is, of course, finding the root.
However, numerical methods come with error, so it is common to miss the root by a small margin.
• In practice, we need to set an error tolerance, so that we can stop once we are within the tolerance of a root.

Another consideration is that sometimes our algorithm cannot approach the root, which is called divergence (the converse is called convergence).
• In practice, we also need to set a maximal iteration count.
Convergence

Newton's method saves a lot of time compared with the brute-force method, but it does not guarantee finding a root.
• Such a trade-off between computational efficiency and convergence is common in numerical algorithms. For example, a smaller 𝜹 (step length) makes convergence more likely, but slows the algorithm.

𝑥ₙ₊₁ = 𝑥ₙ − 𝜹 ⋅ 𝑓(𝑥ₙ) / 𝑓′(𝑥ₙ)
Root Bracketing

Another rationale we can use:

• We can cross-reference to narrow down the area of the destination.

In root finding, this corresponds to the rationale:

• When 𝑓(𝑥₁) > 0 and 𝑓(𝑥₂) < 0, or vice versa, there must be at least one root between 𝑥₁ and 𝑥₂ (for a continuous function).
• We can use sectioning to generate our new search points:

Find 𝑥₃ ∈ (𝑥₁, 𝑥₂); if 𝑓(𝑥₃) > 0, then there must be at least one root between 𝑥₂ and 𝑥₃.
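Taking the midpoint as 𝑥₃ at every step gives the bisection method; a minimal sketch:

```python
def bisect(f, x1, x2, tol=1e-10, max_iter=200):
    """Root bracketing by halving: assumes f(x1) and f(x2) differ in sign."""
    f1 = f(x1)
    for _ in range(max_iter):
        x3 = (x1 + x2) / 2
        f3 = f(x3)
        if abs(f3) < tol or abs(x2 - x1) / 2 < tol:
            return x3
        if f1 * f3 < 0:        # root lies between x1 and x3
            x2 = x3
        else:                  # root lies between x3 and x2
            x1, f1 = x3, f3
    raise RuntimeError("no convergence")

root = bisect(lambda x: x ** 2 - 2, 0.0, 2.0)   # bracket contains sqrt(2)
```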
Software Practices

The problem with root bracketing is its low speed; it is just an upgrade of the brute-force method that improves the search efficiency.

We can combine root bracketing and the Newton-Raphson method.

• Software such as Python and R uses a combination of the two methods.
• The basic intuition: first, find a bracket by randomized searching; narrow it down by sectioning; when the interval is small enough, use Newton's method.
Seeding

Both methods help us find the stationary points (points where the first derivative equals 0) of our function. However, stationary points are not exactly what we want (the maximum point).
• A stationary point can be either a local minimum or a local maximum.
• It can also be a "saddle" point.

To avoid this problem, we can randomize the choice of initial points, so as to explore most local areas of the function (until, say, no new root is found within several iterations).
1. Maximum Likelihood Estimation

A prominent problem we face with MLE is that it is often multivariate.

𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 = Πᵢ 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦(𝑑ᵢ | 𝐿⁻¹(𝑢ᵢ); 𝛽₁, 𝛽₂, …)

(𝛽₁, 𝛽₂, …) = 𝑎𝑟𝑔𝑚𝑎𝑥 𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑(⋅)
Steepest Gradient

An intuitive way to solve this is to imitate the Newton-Raphson method and move in the direction of the maximum (or minimum).

(𝑥₁, 𝑥₂, …)ₙ₊₁ = (𝑥₁, 𝑥₂, …)ₙ + 𝛿 ⋅ ∇𝑓(𝒙)
             = (𝑥₁, 𝑥₂, …)ₙ + 𝛿 ⋅ (∂𝑓/∂𝑥₁, ∂𝑓/∂𝑥₂, …)
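A sketch of steepest (gradient) ascent on a made-up two-variable objective whose gradient is known analytically:

```python
def gradient_ascent(grad, x, delta=0.1, steps=500):
    """Move every coordinate along its partial derivative (steepest ascent)."""
    for _ in range(steps):
        g = grad(x)
        x = [xi + delta * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical objective f(x1, x2) = -(x1 - 1)^2 - (x2 + 2)^2, with gradient:
grad_f = lambda x: [-2 * (x[0] - 1), -2 * (x[1] + 2)]
x_star = gradient_ascent(grad_f, [0.0, 0.0])   # approaches the maximum (1, -2)
```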
Logit Link Function

In a previous session, we used one way to specify the likelihood function – by calculating the cumulative normal distribution (the probit model).
• In practice, calculating the cumulative probability distribution can be resource-demanding, so an approximate form was developed.

𝑢ᵢ = ln(𝑝ᵢ / (1 − 𝑝ᵢ))   (where 𝑝ᵢ = 𝑝𝑟𝑜𝑏(𝑑ᵢ = 1))

This is called the logit model, and the related link function is the logit function. The related error distribution is the Gumbel distribution.
• In practice, the probit and logit models tend to yield similar results.
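The logit link and its inverse can be written in a few lines (a generic sketch, not tied to any particular dataset):

```python
import math

def logit(p):
    """Link function: maps a probability in (0, 1) to a utility on the real line."""
    return math.log(p / (1 - p))

def inv_logit(u):
    """Inverse link: maps a utility back to p_i = prob(d_i = 1)."""
    return 1 / (1 + math.exp(-u))

p = inv_logit(logit(0.7))   # the round trip recovers the probability
```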
2. Integration

Sometimes we need to calculate an integral numerically.

• The expected maximized profit when demand is an estimated random variable
• The expected utility when the probability is subjective and estimated as a random variable
Approximating Integration

The trapezoidal rule sets two initial points and draws a secant. We then calculate the trapezoids to approximate the result of the integration.
• Beyond this intuition, we can also use the information in the derivative – a steeper function should use smaller sections to approximate more precisely.
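A minimal implementation of the trapezoidal rule with a fixed number of equal sections (the derivative-based refinement mentioned above is omitted):

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

area = trapezoid(lambda x: -x ** 2 + 1, -1.0, 1.0)   # exact value is 4/3
```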
3. Constraints

Sometimes our optimization problem comes with constraints.


• We may have a price guarantee to fulfill.
• We may need to consider our inventory.
Linear Programming

A simple form of optimization problem, where the objective function is linear with linear
constraints, is called linear programming.
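As a minimal illustration of why linear programs are tractable: a linear objective over linear constraints attains its maximum at a corner of the feasible region, so a tiny made-up problem can be solved by checking its corner points (all coefficients below are invented):

```python
# Hypothetical problem: maximize profit = 3x + 2y subject to
# x + y <= 10, x <= 6, x >= 0, y >= 0.
# The optimum of a linear program lies at a corner of the feasible region,
# so evaluating the corner points is sufficient.
corners = [(0, 0), (6, 0), (0, 10), (6, 4)]
profit = lambda v: 3 * v[0] + 2 * v[1]
best_corner = max(corners, key=profit)
best_profit = profit(best_corner)
```

Real software uses the simplex method or interior-point methods instead of enumerating corners, which does not scale past small problems.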
General Considerations

The direct implication of constraints is that we now need to consider corner solutions (the function value at the edge of a constraint).
• Besides local maxima, we also need to consider the values of corner solutions, which are imposed by the constraints.

Other tactics include:

• Pre-specify the feasible region as our root bracket.
• Shorten the step length when approaching a constraint.
• Transform constraints into the objective function.
4. Inconsistent Goals

Sometimes we may have multiple objectives, and some of them may not be compatible.
• Extracting the most profit vs. expanding market size
• Fostering customer loyalty vs. controlling cost
Combining Goals

The most intuitive and commonly used way to resolve goal inconsistency is to merge the goals into one objective.
• We can aggregate different goals together
  – by applying weights or specifying a higher-dimensional objective function.
• We can provide a satisficing criterion for them
  – by adding such criteria as constraints.
• We can also use a punishment function for deviations from certain points
  – by setting a reference point for some goals, then conducting single-objective optimization with punishments on deviations from the references of the other goals.
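A sketch of the weighted-aggregation approach, with two made-up quadratic goals that peak at different points; the combined optimum lands between them:

```python
# Two hypothetical, conflicting goals of a single decision variable x
profit = lambda x: -(x - 4) ** 2 + 16   # peaks at x = 4
share = lambda x: -(x - 2) ** 2 + 10    # peaks at x = 2

def combined(x, w=0.5):
    """Weighted aggregation of the two objectives into one."""
    return w * profit(x) + (1 - w) * share(x)

grid = [i / 100 for i in range(0, 601)]
x_star = max(grid, key=combined)   # a compromise between the two peaks
```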
Numerical Concerns

In this class, we cover some basics of solving our optimization problems (in both parameter estimation and decision analysis) numerically.
• A numerical solution is an approximation when the analytical form is hard to obtain.
• When dimensions are high and the functional form is complicated, we cannot rely on the brute-force method.
• Improved algorithms must face the tradeoff between efficiency and convergence.
• In applications, we also need to deal with some special problem properties: multivariate problems and integration add to the computational difficulty and prompt us to find more efficient methods; constraints and inconsistent goals require special treatment in the optimization problem.
The Overall SMART Decision Framework

[Diagram: the same framework illustration as before, annotated with the lecture parts – 1 marks the Objective, 5 marks the pre-specified parameters, and 2&3&4 mark the Model.]
Final Essay

The topic of our final essay is to develop the decision-analysis parametric model for your selected case.
• Detailed guidelines will be provided in subsequent email(s).
The due date is Oct 23, with Oct 9 as the due date for early submission.
• Early submissions will receive feedback within one week of submission.
The End
