Professional Documents
Culture Documents
ORF 544 Week 12 Lookahead Policies
ORF 544 Week 12 Lookahead Policies
Warren Powell
Princeton University
http://www.castlelab.princeton.edu
Direct lookahead
t 't 1
t 't 1
tH
X DLA
( St ) arg max xt C ( St , xt ) E max E C ( Stt ' , X tt ' ( Stt ' )) | St ,t 1 | St , xt
t
t 't 1
t 't 1
tH
X ( St ) arg max C ( St , xt ) E max E C ( Stt ' , X tt' ( Stt ' )) | St ,t 1 | S t , xt
*
t
t 't 1
» You put your finger on the piece while you think about
moves into the future. This is a lookahead policy,
© 2019 W.B. Powell
Direct lookahead
Decision trees:
Deterministic approximations
“Model predictive control”
Rolling horizon procedure
Receding horizon procedure
tH
X t
DLA
( St ) arg max xt C ( St , xt ) max xt ,t 1 ,..., xt ,t H C ( Stt ' , xtt ' ) | St ,t 1 | Stt , xt
t 't 1
tH
, x )
arg max xt , xt ,t 1 ,..., xt ,t H
C ( S t , xt )
t 't 1
C ( S tt ' tt '
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Deterministic lookahead
Lookahead policies peek into the future
» Optimize over deterministic lookahead model
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Deterministic lookahead
Lookahead policies peek into the future
» Optimize over deterministic lookahead model
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Deterministic lookahead
Lookahead policies peek into the future
» Optimize over deterministic lookahead model
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Deterministic lookahead
Notes
» It is common in a deterministic lookahead to introduced
a tunable parameter, as we did with our energy storage
problem.
» In effect, we are compensating for the simplicity of our
lookahead with a tunable parameter.
min E C S t , X t (S t | )
t 0
2 5 8
12.6
8.1 12.7
8.4 15.9 8.9
1 3 6 9 11
9.2
3.6 13.5
4 7 10
» subject to
åj
x%in , j = 1 Flow out of current node where we are located
t
åi
x%i ,r = 1 Flow into destination node r
åi
x%i , j - å
k
x%j , k = 0 for all other nodes.
17.4 15.9
2 5 8
12.6 5.7
8.1 12.7 9.6
8.4 15.9 8.9 7.3
1 3 6 9 11
9.2 2.3
3.6 13.5 4.5
16.5 20.2
4 7 10
t+3
t+2
t+1
t't 4
The lookahead model
t't 3
t't2
. . . .
t ' t 1
t't
t t 1 t2 t 3
The base model
t't 4
The lookahead model
t't 3
t't2
. . . .
t ' t 1
t't
t t 1 t2 t 3
The base model
t't 4
The lookahead model
t't 3
t't2
. . . .
t ' t 1
t't
t t 1 t2 t 3
The base model
» subject to
å j
%
xt ,in , j = 1 Flow out of current node where we are located
t
å i
%
xtir = 1 Flow into destination node r
å i
%
xtij - å
k
x%tjk = 0 for all other nodes.
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S.
Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in
Games, vol. 4, no. 1, pp. 1–49, March 2012. © 2019 W.B. Powell
Monte Carlo tree search
Monte Carlo tree search
» Explores some nodes more than others.
log N (st )
ˆ
x t arg max x C (st , x ) V t 1(s t 1 )
N (st , x )
Downstream
value
st
No. times tested xt
» Downstream valueVˆ (s ) actionx from st st 1
t 1 t 1
is obtained by simulating
some policy (not too
complicated) to “peek” into Vˆt 1(st 1 )
the future.
» … but MCTS does not work well with large sets of actions.
Outage calls
(known)
Storm
Network outages
(unknown)
X
call
call
X call
X X
call
call
call© 2019 W.B. Powell
Case study: MCTS for storm response
0.5
call
0.503
Probability that outage 0.503 X
0.503
has occurred here: 0.503 0.503 0.503
X
0.54 0.54 0.76 call
0.064 0.064 0.064
0.064
call 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 call
0.0604
call 0.62 0.62
0.08 X
0 0.05 0.51 0.51
0.032
0 0 6 0.056
0.048 0.08 0.51 0.51 0
0.99 X 0
0.08 call
0.06
call 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 call
State variables:
» Physical state
• Location/status of the truck
• Known state of the grid (from visiting segments)
» Other information
• Might be phone calls, weather forecast
» Belief state
• Probabilities of outages on each unobserved link.
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.26 0.38 0.78 0.6 0.6
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.064 0.064 0.064
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.26 0.38 0.78 0.6 0.6
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.064 0.064 0.064
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.0 0.0 0.0 0.715 0.715
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.064 0.064 0.064
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.0 0.0 0.0 0.715 0.715
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.064 0.064 0.064
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.064 0.064 0.064
0.064
cal
l 0.603 0.603
0.05 X
0 0.02 0.04 0.502 0.502
0 0 0.03 0.04 0.05 0.502 0.502 0.502
0.67 X0.67
0.05 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0604 0.0604 0.0604
0.0604
cal
l 0.62 0.62
0.08 X
0 0.05 0.51 0.51
0.032
0 0 6 0.056
0.048 0.08 0.51 0.51 0
0.99 X 0
0.08 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0604 0.0604 0.0604
0.0604
cal
l 0.62 0.62
0.08 X
0 0.05 0.51 0.51
0.032
0 0 6 0.056 0.51
0.048 0.08 0.51 0
0.99 X 0
0.08 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0603 0.0603 0.0603
0.0603
cal
l 0.63 0.63
0.0 X
0 0.0 0.52 0.52
0.0
0 0 0.0 0.0 0.0 0.52 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0603 0.0603 0.0603
0.0603
cal
l 0.63 0.63
0.0 X
0 0.0 0.52 0.52
0.0
0 0 0.0 0.0 0.0 0.52 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0603 0.0603 0.0603
0.0603
cal
l 0 0.7
0.0 X
0 0.0 0.59 0
0.0
0 0 0.0 0.0 0.0 0.59 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.0603 0.0603 0.0603
0.0603
cal
l 0 0.7
0.0 X
0 0.0 0.59 0
0.0
0 0 0.0 0.0 0.0 0.59 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.503
0.503 X
0.503
0.503 0.503 0.503
X
0.54 0.54 0.76 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.57
0.57 X
0.57
0 0 0
X
0 0 0 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0.5
l 0.57
0.57 X
0.57
0 0 0
X
0 0 0 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0
l
0.67 0 X
0.67
0 0 0
X
0 0 0 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0
l
0.67 0 X
0.67
0 0 0
X
0 0 0 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
cal 0
l
0 0 X
0
0 0 0
X
0 0 0 cal
l
0.06 0.06 0.06
0.06
cal
l 0 0
0.0 X
0 0.0 0 0
0.0
0 0 0.0 0.0 0.0 0 0 0
0 X 0
0.0 cal
l
0.0 0.0 0.0 0.0 0.0
X X
cal
l
cal cal
l l © 2019 W.B. Powell
Case study: MCTS for storm response
Performance metrics
» Total outage minutes
Industry heuristic
55110
Lookahead policy
36150
Posterior optimal
æ
{ ( { ( ) }) 0 }ø÷
|
min xÎ X 0 ççc0 x0 + E W1 min x1Î X1 c1 x1 + E W2 min x2 Î X 2 c2 x2 + E W 3 {min x3 Î X 3 (c3 x3 + ...) | S 2 }... | S1
è
S
ö÷
t 1
t' 5
htt '
3 H (h )
2 t tt '
4
H t (htt ' ) t
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
Notes:
» In most applications of stochastic programming,
decision is a vector (and may even have to be integer).
» Solving even one instance of a multistage stochastic
lookahead model is rarely tractable.
w1 üï
1) Schedule
ïï
ïï
steam
w2 ïï
xt ïï
ïï
w3 ï W
ý t
ïï
w4 ïï
ïï
2) See wind:
ïï
ïï
w5 ïï
3) Schedule turbines þ
© 2019 W.B. Powell
Direct lookahead
Classical approximation strategy: Two-stage
stochastic programming
» Take a look at the two-stage stochastic program
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic programming
Notes:
» In practice, it is quite rare to simulate a lookahead
policy based on scenario trees.
» Solving even one instance of a stochastic lookahead
problem can be extremely difficult:
DP lookahead
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The real process
© 2019 W.B. Powell
Direct lookahead
» Distinguish between:
• A two-stage stochastic programming problem
• A two-stage stochastic program as an approximate lookahead
for a fully sequential problem
© 2019 W.B. Powell
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”
min x0 X 0 c0 x0 E Q( x0 , 1 )
where
Q( x0 , 1 ( )) min x1 ( )X1 ( ) c1 ( ) x1 ( )
» This is the canonical form of stochastic programming,
which might also be written over multiple periods:
T
min c0 x0 p ( ) ct ( ) xt ( )
t 1
min xt X t ct xt E Q ( xt , t 1 )
where
Q( xt , t 1 ( )) min xt 1 ( )X t 1 ( ) ct 1 ( ) xt 1 ( )
» This is the canonical form of stochastic programming,
which might also be written over multiple periods:
tH
min ct xt
t t
p(t ) ctt ' ( ) xtt ' ( )
t 't 1
t 1
t' 5
htt '
3 H (h )
2 t tt '
4
H t (htt ' ) t
State/policy formulation
Itt '
I
» LP at node
tt ' indexes all variables (costs, decisions, constraints)
with
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Multistage lookahead models
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Multistage lookahead models
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Multistage lookahead models
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Stochastic lookahead policies
Some common misconceptions about stochastic
programming (for sequential problems):
Generation
Generation
Time
Generation Generation
Generation
Time
Time
Time
Generation
Generation
Time
Time
Generation
Time Generation
Time
Time
Scenario trees
2) See wind:
3) Schedule turbines
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Lookahead policies
We can then simulate this lookahead policy over
time:
The lookahead model
. . . .
t t 1 t2 t 3
The base model
© 2019 W.B. Powell
Lookahead policies
Notes:
» Two-stage approximations of the unit commitment problem have
been pursued by a number of national labs in the U.S.
• Sandia, Argonne, Livermore, National Renewable Energy Laboratory
» Deterministic unit commitment problems are hard; two-stage
stochastic models, even with a small number of scenarios, are
much harder.
» The two-stage approximation introduces serious errors. Our
biggest problem is not the uncertainty in wind forecasting
tomorrow, it is forecasting wind an hour from now. This is
ignored in a two-stage model.
» The stochastic programming community does not understand the
concept that the two-stage lookahead is a policy, to solve a base
model (which at best they think of as a simulator). Relatively few
teams build simulators to test two-stage policies.
Alternative:
» Use two-stage approximation.
Hour
ahead
Actual wind forecast
2) See wind:
3) Schedule turbines
© 2019 W.B. Powell
Stochastic unit commitment
Notes:
» We can construct a set of scenarios that reflects that
sudden drops can happen at any time of day.
» We will need a lot of scenarios to communicate that
these drops where we need extra reserve can happen at
any time….
» … or we could simply require that we schedule
reserves for all times of the day!
t 48
min
xtt ' t '1,...,24
C(x
t 't
tt ' , ytt ' )
( ytt ' )t '1,...,24
t 48
X (St )=arg min
xtt ' t '1,...,24
C(x
t ' t
tt ' , ytt ' )
( ytt ' )t '1,...,24
t 48
X (St | )=arg min
xtt ' t '1,...,24
C(x
t ' t
tt ' , ytt ' )
( ytt ' )t '1,...,24
xtmax
,t ' xt ,t ' Ltt ' +
up
Up-ramping reserve
xt ,t ' xtmax
,t ' Ltt ' + down
Down-ramping reserve
» This is a (parametric) cost function approximation,
parameterized by the ramping parameters .
© 2019 W.B. Powell
Stochastic unit commitment
We then have to tune the parameters of this policy
in our stochastic base model.
T
Stochastic
min E C St , X ( St | )
t
base
t 0 model
Less steam
Uniform increase in gas
x 2
1 2
f ( x) e
2
© 2019 W.B. Powell
Stochastic unit commitment
Parametric vs. nonparametric
Observations
Nonparametric fit
Total revenue
Parametric fit
True function
Lookahead policy
Policy function
approximation
t 0
» Policy » CFA? Can you tune a
deterministic approximation
X :S X to make it work better?
» Constraints at time t
Value function
approx. (VFAs) Multi-armed bandits
/optimal learning
Lookahead policies
Reinforcement (DLAs)
learning/ADP
Computation
Modeling
Applications
© 2019 W.B. Powell