Professional Documents
Culture Documents
ORF 544 Week 7 Uncertainty and Designing Policies
ORF 544 Week 7 Uncertainty and Designing Policies
Warren Powell
Princeton University
http://www.castlelab.princeton.edu
Uncertainty modeling
PFAs and policy search
Types of uncertainty
» This is for problems where the transition function is known (e.g. from
physics) but where there is exogenous inputs.
© 2019 Warren B. Powell
Types of uncertainty
Control uncertainty
» You ask for but you get
» You order 10 items, but only get 8 due to a stockout.
» You try to drive at 70 mph, but there is variability due to your
inability to hold a speed perfectly, along with the effect of traffic.
» Uber sends an invitation to a driver to take a rider, but the driver
may decline.
» Uber would like 50 drivers on duty, but can only influence their
behavior through surge pricing.
» Wiley sets a wholesale price of $80, but Amazon sells at some
random price above that (limits Wiley’s ability to set prices).
» You order plastic balls with a diameter of 10mm, but get balls with
a diameter of 12mm (or some variance around 10mm).
Types of distributions
• Mean: a
E[ X ] =
a+b
ab
Var[ X ] = 2
• Variance (a + b ) (a + b + 1)
» where
» where
et : N (0, s e2 )
ìïï 1 w. p. h
Jt = í
ïïî 0 w. p. 1- h
etJ : l e- l x
Normal to anything
Jan Feb March April May June July Aug Sept Oct Nov Dec
• and
0.0 0.0
Observed forecast error Standard normal deviate Z
Simulated
Error
Observed
distribution
Jan Feb March April May June July Aug Sept Oct Nov Dec
f tt '
Actual
Observed
Simulated
Observed
Simulated
Simulated
Observed
A/B
P ( StCmeans
| S C above or below
1 t ) is estimated from historical data.
S/M/L means a short, medium or long crossing distribution
Wt
Wt Wind speed at time t.
• Thegwind speed given the crossing state:
Wt Wind speed aggregated into 5 ranges.
P (Wt 1 | Wt g , StC ) Density of Wt 1 given Wt g and StC
0.0 0.0
Observed forecast error Standard normal deviate Z
Decreasing
Highest value of Wt g
wind speeds
Simulated
Error Observed
distribution
Observed Observed
Simulated
Simulated
July October
Forecasted wind
• Locally/semi/non parametric
– Requires optimizing over local regions
T
X ( St ) arg max xt C ( St , xt ) E max E C ( St ' , X t ' ( St ' )) | St 1 | S t , xt
*
t
t 't 1
t 't 1
t 't 1
tH
X ( St ) arg max xt C ( St , xt ) E max E C ( Stt ' , X tt ' ( S tt ' )) | S t ,t 1 | S tt , xt
*
t
t 't 1
• Bayes greedy/naïve
X Bayes Bayes
( S n ) = arg max (harder)
x E F ( x, W )
X TS
( S n
) = arg
• Thompson sampling x x max ˆ
m where ˆ
m x : N ( mn
x , b n
x)
» with
» Chance
P[ Atconstrained
xt f (W )]programming
1
T
» X LA S
Stochastic
t ( S ) arg max
t lookahead C ( Stt , xttprog/Monte
/stochastic ) p(Carlo C ( Ssearch
) tree
tt ' (), xtt ' ( ))
t 't 1
t
T
X t
LA RO
( St ) arg max min C ( Stt , xtt ) C (S tt ' ( w), xtt ' ( w))
xtt ,..., xt ,t H wWt ( )
» “Robust optimization” © 2019 Warren B. Powell t 't 1
Four (meta)classes of policies
1) Policy function approximations (PFAs)
Function approx.
» Chance
P[ Atconstrained
xt f (W )]programming
1
T
» X LA S
Stochastic
t ( S ) arg max
t lookahead C ( Stt , xttprog/Monte
/stochastic ) p(Carlo C ( Ssearch
) tree
tt ' (), xtt ' ( ))
t 't 1
t
T
X t
LA RO
( St ) arg max min C ( Stt , xtt ) C (S tt ' ( w), xtt ' ( w))
xtt ,..., xt ,t H wWt ( )
» “Robust optimization” © 2019 Warren B. Powell t 't 1
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximation (CFAs)
» X CFA ( S | ) arg max C
( St , xt | )
t
xt Xt ( )
3) Policies based on value function approximations (VFAs)
Imbedded optimization
» Chance
P[ Atconstrained
xt f (W )]programming
1
T
» X LA S
Stochastic
t ( S ) arg max
t lookahead C ( Stt , xttprog/Monte
/stochastic ) p(Carlo C ( Ssearch
) tree
tt ' (), xtt ' ( ))
t 't 1
t
T
X t
LA RO
( St ) arg max min C ( Stt , xtt ) C (S tt ' ( w), xtt ' ( w))
xtt ,..., xt ,t H wWt ( )
» “Robust optimization” © 2019 Warren B. Powell t 't 1
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric functions
2) Cost function approximation (CFAs)
» X CFA ( S | ) arg max C
( St , xt | )
t
xt Xt ( )
3) Policies based on value function approximations (VFAs)
» X VFA ( S ) arg max C ( S , x ) V x S x ( S , x )
t t xt t
4) Direct lookahead policies (DLAs)
t t t t t
» Deterministic lookahead/rolling horizon proc./model predictive control
T
X t
LA D
( St ) arg max C ( Stt , xtt ) C (S tt ' , xtt ' )
xtt ,..., xt ,t H
t 't 1
» Chance
P[ Atconstrained
xt f (W )]programming
1
T
» X LA S
Stochastic
t ( S ) arg max
t lookahead C ( Stt , xttprog/Monte
/stochastic ) p(Carlo C ( Ssearch
) tree
tt ' (), xtt ' ( ))
t 't 1
t
T
X t
LA RO
( St ) arg max min C ( Stt , xtt ) C (S tt ' ( w), xtt ' ( w))
xtt ,..., xt ,t H wWt ( )
» “Robust optimization” © 2019 Warren B. Powell t 't 1
Designing policies
Finding the best policy
» We have to first articulate our classes of policies
f F PFAs, CFAs,VFAs, DLAs
f Parameters that characterize each family.
T N 1
or E C St , X ( St | ) or E F X ( S n | ), W n 1
max
t 0 n0
1 N n
F F ( )
N n 1
If we simulate policies 1 and 2 , we would like to conclude
»
that 1 is better than 2 if
F 1 F 2
L
E B
» Notes:
• This is a very simple “cost function correction term” with a
scalar parameter