
Clearing the Jungle of Stochastic Optimization

INFORMS Annual Meeting


San Francisco
November 10, 2014

Warren B. Powell
Princeton University
Department of Operations Research
and Financial Engineering

© 2014 Warren B. Powell, Princeton University


Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Optimizing energy storage
Take advantage of price variations
Applications
All of these problems are examples of sequential decision
problems.
We are going to argue that we do a very good job of modeling deterministic optimization problems, and static/two-stage stochastic optimization problems…

… but sequential problems are a different matter.

In this tutorial, we are going to present a canonical framework that spans all the communities that work on this problem class…

… and we are going to see that these four problems illustrate four fundamental classes of policies for sequential stochastic optimization.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Modeling stochastic, dynamic problems
Before we can solve complex problems, we have
to know how to think about them.

Min E {cx}
Ax = b
x>0
Organize class
libraries, and set up
communications and
databases

Mathematician

Software
The biggest challenge when making decisions
under uncertainty is modeling.
Deterministic modeling
For deterministic problems, we speak the language
of mathematical programming
» For static problems:

    min cx
    s.t. Ax = b, x ≥ 0

» For time-staged problems:

    min Σ_{t=0}^T c_t x_t
    s.t. A_t x_t − B_{t−1} x_{t−1} = b_t
         D_t x_t ≤ u_t
         x_t ≥ 0

Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning

Markov decision processes



Modeling
A recent comment by a (helpful) referee:

» …One of the main contributions of the paper is the demonstration of a policy-based modeling framework for transportation problems with uncertainty. However, it could be argued that a richer modeling framework already exists (multi-stage stochastic programming) that does not require approximating the decision space with policies….

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, "Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework," European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.
Modeling as a Markov decision process
For stochastic problems, many people model the
problem using Bellman’s equation
 
    V(s) = min_a ( C(s, a) + γ Σ_{s'} p(s'|s, a) V(s') )

where

    s = "state variable"
    a = discrete action
    p(s'|s, a) = "model" (transition matrix, transition kernel)
    V(s) = value of being in state s
    γ = discount factor
» This is the canonical form of a dynamic program
building on Bellman’s seminal research. Simple,
elegant, widely used but difficult to scale to realistic
problems.
Modeling as a Markov decision process
“Canonical model” from Puterman (Ch. 3)
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”

    min_{x_0 ∈ X_0} c_0 x_0 + E[ Q(x_0, ω_1) ]

where

    Q(x_0, ω_1(ω)) = min_{x_1(ω) ∈ X_1(ω)} c_1(ω) x_1(ω)

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

    min c_0 x_0 + Σ_{ω∈Ω} p(ω) Σ_{t=1}^T c_t(ω) x_t(ω)
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”

    min_{x_t ∈ X_t} c_t x_t + E[ Q(x_t, ω_{t+1}) ]

where

    Q(x_t, ω_{t+1}(ω)) = min_{x_{t+1}(ω) ∈ X_{t+1}(ω)} c_{t+1}(ω) x_{t+1}(ω)

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

    min c_t x_t + Σ_{ω∈Ω̃_t} p(ω) Σ_{t'=t+1}^{t+H} c_{tt'}(ω) x_{tt'}(ω)
Modeling using control theory
From "Optimal Control" by Lewis, Vrabie and Syrmos:

» The standard model in optimal control is deterministic.

» It bundles the objective function with optimality criteria comparable to Bellman equations.
Modeling
We lack a standard language for modeling
sequential, stochastic decision problems.
» In the slides that follow, we propose to model problems along five fundamental dimensions:

• State variables
• Decision variables
• Exogenous information
• Transition function
• Objective function

» This framework is widely followed in the control theory community, and almost completely ignored in operations research and computer science.
Modeling dynamic problems
The system state:
» Controls community:
    x_t = "information state"
» Operations research/MDP/computer science:
    S_t = (R_t, I_t, K_t) = system state, where:
    R_t = resource state (physical state)
        Location/status of truck/train/plane
        Energy in storage
    I_t = information state
        Prices
        Weather
    K_t = knowledge state ("belief state")
        Belief about traffic delays
        Belief about the status of equipment
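
To make the (R_t, I_t, K_t) decomposition concrete, here is a minimal Python sketch; the fields are illustrative (drawn from the trucking and energy examples above), not part of any canonical model.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class State:
    # R_t: resource (physical) state, e.g. energy in storage
    energy_in_storage: float
    # I_t: information state, e.g. the current price and weather
    price: float
    temperature: float
    # K_t: knowledge (belief) state, e.g. estimated travel delay per link
    belief_delay: Dict[str, float] = field(default_factory=dict)

S0 = State(energy_in_storage=5.0, price=32.0, temperature=18.0,
           belief_delay={"link_3_6": 0.4})
```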
Modeling dynamic problems
Decisions:
» Computer science:
    a_t = discrete action
» Control theory:
    u_t = low-dimensional continuous vector
» Operations research:
    x_t = usually a discrete or continuous but high-dimensional vector of decisions.

At this point, we do not specify how to make a decision. Instead, we define the function X^π(s) (or A^π(s) or U^π(s)), where π specifies the type of policy. "π" carries information about the type of function f ∈ F, and any tunable parameters θ ∈ Θ^f.
Modeling dynamic problems
Exogenous information:

    W_t = new information that first became known at time t
        = (R̂_t, D̂_t, p̂_t, Ê_t)

    R̂_t = equipment failures, delays, new arrivals; new drivers being hired to the network
    D̂_t = new customer demands
    p̂_t = changes in prices
    Ê_t = information about the environment (temperature, ...)

Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

Below, we will let ω represent a sequence of actual observations W_1, W_2, …. W_t(ω) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function:

    S_{t+1} = S^M(S_t, x_t, W_{t+1})

    R_{t+1} = R_t + x_t + R̂_{t+1}    Inventories
    p_{t+1} = p_t + p̂_{t+1}          Spot prices
    D_{t+1} = D_t + D̂_{t+1}          Market demands

Also known as the: "system model," "transfer function," "state transition model," "plant model," "transformation function," "law of motion," "plant equation," "model," "transition law."

For many applications, these equations are unknown. This is known as "model-free" dynamic programming.
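
The three example equations above translate directly into code. A minimal sketch, assuming dictionary-valued states and exogenous information with illustrative field names:

```python
def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the three example equations."""
    return {
        "R": S["R"] + x + W["Rhat"],   # R_{t+1} = R_t + x_t + Rhat_{t+1}  (inventory)
        "p": S["p"] + W["phat"],       # p_{t+1} = p_t + phat_{t+1}        (spot price)
        "D": S["D"] + W["Dhat"],       # D_{t+1} = D_t + Dhat_{t+1}        (demand)
    }

S = {"R": 10.0, "p": 30.0, "D": 5.0}
S = transition(S, x=2.0, W={"Rhat": -1.0, "phat": 0.5, "Dhat": 0.3})
```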
Stochastic optimization models
The objective function:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

where C is the cost function, S_t is the state variable, X^π_t is the decision function (policy), and the expectation is over all random outcomes.
Finding the best policy
Given a system model (transition function)

    S_{t+1} = S^M(S_t, x_t, W_{t+1}(ω))

We call this the base model.
Objective functions
There are different objectives that we can use:
» Expectations

    min_x E F(x, W)

» Risk measures

    min_x E F(x, W) + λ E[ (F(x, W) − f̄)² ]

    min_x ρ(F(x, W))    ← convex/coherent risk measures

» Worst case ("robust optimization")

    min_x max_w F(x, w)

The choice of objective is up to the modeler.
Modeling
Deterministic:
» Objective function:
    min_{x_0,…,x_T} Σ_{t=0}^T c_t x_t
» Decision variables:
    (x_0, …, x_T)
» Constraints at time t:
    A_t x_t = R_t
    x_t ≥ 0
» Transition function:
    R_{t+1} = b_{t+1} + B_t x_t

Stochastic:
» Objective function:
    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }
» Policy:
    X^π : S → X
» Constraints at time t:
    x_t ∈ X_t(S_t) = X_t
» Transition function:
    S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
    (W_1, W_2, …, W_T)
Stochastic optimization models
With deterministic problems, we want to find the best decision:

    min_{x_0,…,x_T} Σ_{t=0}^T c_t x_t

With stochastic problems, we want to find the best function (policy) for making a decision:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

» … which is sometimes written

    min_{x_0,…,x_T} E { Σ_{t=0}^T γ^t C(S_t, x_t) }

where x_t is F_t-measurable.
A model without an algorithm is like cloud-to-cloud lightning…

… pretty to look at, but no impact.


Modeling
There are two practical issues when working with this objective function:

» How do we compute the expectation?

» How do we search over policies?


Computing the objective function
In practice, we cannot compute the expectation in:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π(S_t)) }

Instead, we might do one long simulation…

    min_π F̂^π(ω) = Σ_{t=0}^T γ^t C(S_t(ω), X^π_t(S_t(ω)))

…or we might average across several simulations:

    min_π F̄^π = (1/N) Σ_{n=1}^N Σ_{t=0}^T γ^t C(S_t(ω^n), X^π_t(S_t(ω^n)))
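
A hedged sketch of this sample-average estimate; `policy`, `cost`, `transition`, and `sample_W` are placeholders for a problem-specific model:

```python
def estimate_objective(policy, cost, transition, sample_W, S0, T=24, N=100, gamma=1.0):
    """F-bar = (1/N) sum_n sum_t gamma^t C(S_t(w^n), X^pi(S_t(w^n)))."""
    total = 0.0
    for n in range(N):                        # N sample paths omega^1..omega^N
        S = dict(S0)
        for t in range(T):
            x = policy(S)                     # X^pi(S_t)
            total += (gamma ** t) * cost(S, x)
            S = transition(S, x, sample_W())  # draw W_{t+1}(omega^n)
    return total / N
```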
Modeling
There are two ways to compute the objective:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π(S_t)) }

» Offline learning
• We use computer simulation to compute
    F̂^π = Σ_{t=0}^T γ^t C(S_t(ω), X^π_t(S_t(ω)))
• Provides a controlled testing environment.
• Requires living with model assumptions.
• Can quickly compare different policies.

» Online learning
• We observe the performance of a policy in the field.
• No control over the test environment.
• Avoids dependence on model assumptions.
• Policy evaluations are quite slow.
Searching for a policy
We have to start by describing what we mean by a policy.
» Definition:

    A policy is a mapping from a state to an action.

    … any mapping.

How do we search over an arbitrary space of policies?
» Scanning the literature, it appears that every algorithmic strategy can be boiled down to four fundamental classes:
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric functions

2) Cost function approximations (CFAs)

    X^CFA_t(S_t | θ) = argmin_{x_t ∈ X_t(θ)} C̄^π(S_t, x_t | θ)

3) Policies based on value function approximations (VFAs)

    X^VFA_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + γ V̄^x_t(S^x_t(S_t, x_t)) )

4) Lookahead policies
» Deterministic lookahead:

    X^{LA-D}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, x̃_{tt'})

» Stochastic lookahead (e.g. scenario trees):

    X^{LA-S}_t(S_t) = argmin C(S̃_{tt}, x̃_{tt}) + Σ_{ω̃∈Ω̃_t} p(ω̃) Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(ω̃), x̃_{tt'}(ω̃))

» "Robust optimization":

    X^{LA-RO}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} max_{w∈W_t(θ)} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(w), x̃_{tt'}(w))
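
As a rough sketch of how differently these four classes compute a decision (every helper function here is a placeholder, and each class is reduced to its simplest form):

```python
def pfa(S, theta):
    """PFA: an analytic rule, e.g. a buy-low/sell-high threshold policy."""
    return +1 if S["p"] <= theta[0] else (-1 if S["p"] >= theta[1] else 0)

def cfa(S, theta, feasible, cost_bar):
    """CFA: minimize a parametrically modified cost C-bar(S, x | theta)."""
    return min(feasible(S), key=lambda x: cost_bar(S, x, theta))

def vfa(S, feasible, cost, post_state, V):
    """VFA: minimize C(S, x) + V-bar(S^x), where S^x is the post-decision state."""
    return min(feasible(S), key=lambda x: cost(S, x) + V(post_state(S, x)))

def lookahead(S, plans, cost, model):
    """Deterministic lookahead: roll each candidate plan (x_t, ..., x_{t+H})
    through a deterministic model, keep the best, implement its first decision."""
    def plan_cost(plan):
        s, total = dict(S), 0.0
        for x in plan:
            total += cost(s, x)
            s = model(s, x)          # deterministic forecast of the transition
        return total
    return min(plans(S), key=plan_cost)[0]
```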
Function approximations
There are three classes of
approximation strategies
» Lookup table
• Given a discrete state, return a discrete action or value

» Parametric models
• Linear models ("basis functions")
• Nonlinear models

» Nonparametric models
• Kernel regression
• Nearest neighbor clustering
• Local polynomial regression
Policies
(Slightly arrogant) claim:

» These are the fundamental building blocks of all policies.

Many variations can be built using hybrids:
» Lookahead plus value function for terminal reward
» Myopic or lookahead policies with policy function approximation
» Modified lookahead policy (lookahead with CFA)
Searching for policies
The objective function

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) } ≈ min_π F̄^π = (1/N) Σ_{n=1}^N Σ_{t=0}^T γ^t C(S_t(ω^n), X^π_t(S_t(ω^n)))

    S_{t+1} = S^M(S_t, X^π_t(S_t), W_{t+1})

Finding the best policy means:

» Search over different classes of policies:
• PFAs, CFAs, VFAs and lookaheads.
• Hybrids (VFA+lookahead, CFA+PFA, …)

» Search over tunable parameters within each class.
Searching for policies
There are tunable parameters for every class of policy:
» PFAs – These are parametric functions characterized by parameters such as:
• θ = (s, S) parameters for an inventory policy
• X^PFA_t(S_t) = θ_0 + θ_1 S_t + θ_2 S_t²
» CFAs – A parameterized cost function:
• Bonus and penalties to encourage certain behaviors
• Constraints to ensure buffer stocks or schedule slack
» VFAs – These may be parameterized approximations of the value of being in a state:
• V̄_t(S_t) = Σ_{f∈F} θ_f φ_f(S_t)
» Lookaheads – Choices of planning horizon, number of stages, number of scenarios, ….
Searching for policies
Parameter tuning can be done offline or online:
» Offline (in the lab)
• Stochastic search
• Simulation optimization
• Global optimization
• Black box optimization
• Ranking and selection
• Knowledge gradient (offline)
» Online (in the field)
• Bandit problems
– Gittins indices
– Upper confidence bounding
• Online knowledge gradient
Stochastic optimization models
So now that complicated looking objective function:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

    S_{t+1} = S^M(S_t, x_t, W_{t+1}(ω))

» … becomes something real and practical (and possibly what you are already doing).
With the right language,
we can learn how to
bridge communities,
creating practical
algorithms for a wide
range of stochastic
optimization problems.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
The state variables
What is a state variable?
» Bellman’s classic text on dynamic programming (1957)
describes the state variable with:
• “… we have a physical system characterized at any stage by a
small set of parameters, the state variables.”
» The most popular book on dynamic programming
(Puterman, 2005, p.18) “defines” a state variable with
the following sentence:
• “At each decision epoch, the system occupies a state.”
» Wikipedia:
• “State commonly refers to either the present condition of a
system or entity” or….
• A state variable is one of the set of variables that are used to
describe the mathematical ‘state’ of a dynamical system
The state variables
What is a state variable?
» Kirk (2004), an introduction to control theory, offers the definition:
• A state variable is a set of quantities x_1(t), x_2(t), … which if known at time t = t_0 are determined for t ≥ t_0 by specifying the inputs to the system for t ≥ t_0.
• … or "all the information you need to model the system from time t onward." True, but vague.
» Cassandras and Lafortune (2008):
• The state of a system at time t_0 is the information required at t_0 such that the output [cost] y(t) for all t ≥ t_0 is uniquely determined from this information and from u(t), t ≥ t_0.
• Again, consistent with the statement "all the information you need to model the system from time t_0 onward," but then why do they later talk about "Markovian" and "non-Markovian" queueing systems?
The state variables
There appear to be two ways of approaching a
state variable:
» The mathematician's view – The state variable is a given, at which point the mathematician will characterize its properties ("Markovian," "history-dependent," …)

» The modeler's view – The state variable needs to be constructed from a raw description of the problem.
The state variable
Illustrating state variables
» A deterministic graph

[Figure: a deterministic network on nodes 1–11 with known arc costs; the traveler is at node 6.]

    S_t = (N_t) = (6)
The state variable
Illustrating state variables
» A stochastic graph

[Figure: the same network, but the costs on the arcs out of node 6 are now random.]

    S_t = ?
The state variable
Illustrating state variables
» A stochastic graph

[Figure: the traveler at node 6 now observes sampled costs on the arcs out of node 6.]

    S_t = (N_t, (c̄_{t,N_t,j})_j) = (6, (12.7, 8.9, 13.5))
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties

[Figure: the same network, where the cost of an arc out of node 6 depends on the node we came from.]

    S_t = (N_t, (c̄_{t,N_t,j})_j, N_{t−1}) = (6, (12.7, 8.9, 13.5), 3)

where the first component is the resource state R_t, and the remaining components form the information state I_t.
The state variable
Illustrating state variables
» A stochastic graph with generalized learning

[Figure: the same network, where observed arc costs update beliefs about costs elsewhere in the network.]

    S_t = ?

    S_t = (N_t, (c̄_{t,N_t,j})_j, …)

which now contains a resource state R_t, an information state I_t, and a knowledge (belief) state K_t.
The state variable
A proposed definition of a state variable:
» The state variable is the minimally dimensioned function of history that is necessary and sufficient to calculate the decision function, the cost/reward function, and the transition function, from time t onward.

» This can be described as a "constructionist" definition, because it specifies how to construct the state variable from a raw description of the problem.

» Using this definition, all properly modeled problems are Markovian!
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
Policy function approximations
Grid operators require that batteries bid charge and discharge prices, an hour in advance.

[Figure: hourly prices over roughly 70 hours, with a discharge threshold θ^Discharge above a charge threshold θ^Charge.]

We have to search for the best values for the policy parameters θ^Charge and θ^Discharge.
Policy function approximations
Our policy function might be the parametric model (this is nonlinear in the parameters):

    X^π(S_t | θ) =  +1 (charge)     if p_t ≤ θ^charge
                     0 (hold)       if θ^charge < p_t < θ^discharge
                    −1 (discharge)  if p_t ≥ θ^discharge
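
A direct Python rendering of this threshold policy; the +1/0/−1 coding for charge/hold/discharge mirrors the formula above:

```python
def battery_pfa(price, theta_charge, theta_discharge):
    """PFA for battery arbitrage: +1 = charge, 0 = hold, -1 = discharge."""
    if price <= theta_charge:
        return +1
    if price >= theta_discharge:
        return -1
    return 0
```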
Policy function approximations
Finding the best policy
» We need to maximize

    max_θ F(θ) = E Σ_{t=0}^T γ^t C(S_t, X^π(S_t | θ))

» We cannot compute the expectation, so we run simulations:

[Figure: simulated response surface of F(θ) over (θ^Charge, θ^Discharge).]
Policy function approximations
A number of fields work on this problem under
different names:
» Stochastic search
» Stochastic programming (“two stage”)
» Simulation optimization
» Black box optimization
» Global optimization
» Control theory (“open loop control”)
» Sequential design of experiments
» Bandit problems (for on-line learning)
» Ranking and selection (for off-line learning)
» Optimal learning
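
A minimal offline tuning loop for the battery policy, reusing `battery_pfa` from the sketch above. The mean-reverting price model and all constants are illustrative; the search itself is just a brute-force grid over (θ^Charge, θ^Discharge):

```python
import random

def simulate_profit(theta_c, theta_d, T=1000, seed=0, R_max=10.0):
    """One sample path of prices; returns the arbitrage profit of the PFA."""
    rng = random.Random(seed)
    p, R, profit = 30.0, 0.0, 0.0
    for _ in range(T):
        p = max(0.0, p + 0.5 * (30.0 - p) + rng.gauss(0.0, 5.0))  # toy price model
        x = battery_pfa(p, theta_c, theta_d)
        if x == +1 and R < R_max:        # charge: buy one unit of energy
            R, profit = R + 1.0, profit - p
        elif x == -1 and R > 0.0:        # discharge: sell one unit
            R, profit = R - 1.0, profit + p
    return profit

# Average over a few sample paths before comparing parameter vectors.
best = max(((tc, td) for tc in range(10, 30, 2) for td in range(32, 60, 2)),
           key=lambda th: sum(simulate_profit(*th, seed=s) for s in range(10)))
```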
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Cost function approximations
[Figure: drivers on the left, loads (demands) on the right, with assignment arcs evolving over times t, t+1, t+2.]

The assignment of drivers to loads evolves over time, with new loads being called in, along with updates to the status of a driver.
Cost function approximations
A purely myopic policy would solve this problem using

    min_x Σ_d Σ_l c_{tdl} x_{tdl}

where

    x_{tdl} = 1 if we assign driver d to load l, 0 otherwise
    c_{tdl} = cost of assigning driver d to load l at time t

What if a load is not assigned to any driver, and has been delayed for a while? This model ignores the fact that we eventually have to assign someone to the load.
Cost function approximations
We can minimize delayed loads by solving a modified objective function:

    min_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}

where

    τ_{tl} = how long load l has been delayed by time t
    θ = bonus for moving a delayed load

We refer to our modified objective function as a cost function approximation.
Cost function approximations
We now have to tune our policy, which we define as:

    X^π(S_t | θ) = argmin_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}

We can now optimize θ, another form of policy search, by solving

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(S_t | θ))
Robust cost function approximation
Inventory management

» How much product should I order to anticipate future demands?

» Need to accommodate different sources of uncertainty:
• Market behavior
• Transit times
• Supplier uncertainty
• Product quality
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve

    X_t(S_t) = argmin_{x_t} Σ_{p∈P} c_p x_{tp}

subject to (the feasible set X_t):

    Σ_{p∈P} x_{tp} = D_t
    x_{tp} ≤ u_p
    x_{tp} ≥ 0

» This assumes our demand forecast D_t is accurate.
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve

    X_t(S_t | θ) = argmin_{x_t} Σ_{p∈P} c_p x_{tp}

subject to (the feasible set X_t(θ)):

    Σ_{p∈P} x_{tp} = D_t + θ    ← buffer inventory
    x_{tp} ≤ u_p
    x_{tp} ≥ 0

» This is a "parametric cost function approximation"
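
A sketch of the parametric CFA as a small linear program via scipy.optimize.linprog. Whether the coverage constraint is an equality or a "≥" is a modeling choice (we use "≥" here), and the numbers are made up:

```python
import numpy as np
from scipy.optimize import linprog

def purchase_policy(c, u, D, theta):
    """min sum_p c_p x_p   s.t.   sum_p x_p >= D + theta,  0 <= x_p <= u_p."""
    P = len(c)
    res = linprog(c,
                  A_ub=-np.ones((1, P)), b_ub=[-(D + theta)],  # coverage, sign-flipped
                  bounds=[(0.0, up) for up in u])
    return res.x

x = purchase_policy(c=[3.0, 4.0, 6.0], u=[40.0, 40.0, 40.0], D=60.0, theta=10.0)
```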
Cost function approximations
A general way of creating CFAs:
» Define our policy:

    X^π_t(θ) = argmin_x ( C(S_t, x_t) + Σ_{f∈F} θ_f φ_f(S_t, x_t) )

The added term is sometimes mistaken as a value function approximation; it is really a cost correction term.

» We again tune θ by optimizing:

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(θ))
Cost function approximations
An even more general CFA model:
» Define our policy:

    X^π_t(θ) = argmin_x C̄^π(S_t, x_t | θ)    ← parametrically modified costs

subject to

    Ax = b(θ)    ← parametrically modified constraints

» We again tune θ by optimizing:

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(θ))
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
The locomotive assignment problem

[Figure: locomotives of varying horsepower (4400, 4400, 6000, 4400, 5700, 4600, 6200) available to be assigned to trains at Atlanta, Baltimore, and Charlotte.]
The locomotive assignment problem

[Figure: the assignment network linking locomotives to trains, with train reward arcs and locomotive value buckets representing the value of locomotives in the future.]
The locomotive assignment problem

[Figure: the time-t assignment subproblem.]

The locomotive subproblem can be solved quickly using Cplex.
Approximate dynamic programming
Step 1: Start with a pre-decision state S^n_t.
Step 2: Solve the deterministic optimization using an approximate value function:    [Deterministic optimization]

    v̂^n_t = min_x ( C(S^n_t, x_t) + V̄^{n−1}_t(S^{M,x}(S^n_t, x_t)) )

to obtain x^n_t.
Step 3: Update the value function approximation:    [Recursive statistics]

    V̄^n_{t−1}(S^{x,n}_{t−1}) = (1 − α_{n−1}) V̄^{n−1}_{t−1}(S^{x,n}_{t−1}) + α_{n−1} v̂^n_t

Step 4: Obtain a Monte Carlo sample of W_t(ω^n) and compute the next pre-decision state:    [Simulation]

    S^n_{t+1} = S^M(S^n_t, x^n_t, W_{t+1}(ω^n))

Step 5: Return to step 1.    [Iterative learning]
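
A stripped-down sketch of steps 1–5 for a scalar resource, with a lookup-table VFA indexed by (t, R). The toy cost model, and updating around the pre-decision state rather than the post-decision state, are simplifications:

```python
import random

def adp(T=24, N=200, alpha=0.1, actions=(-1, 0, +1), R_max=10):
    """Forward-pass approximate dynamic programming (steps 1-5)."""
    V = [[0.0] * (R_max + 1) for _ in range(T + 1)]    # V[t][R]

    def cost(R, x, p):
        return p * x                                   # pay to charge, earn to discharge

    def post(R, x):
        return min(R_max, max(0, R + x))               # post-decision resource state

    for n in range(N):                                 # iterative learning
        R, p = R_max // 2, 30.0                        # Step 1: pre-decision state
        for t in range(T):
            # Step 2: solve the deterministic problem with the current VFA
            x = min(actions, key=lambda a: cost(R, a, p) + V[t + 1][post(R, a)])
            v_hat = cost(R, x, p) + V[t + 1][post(R, x)]
            # Step 3: smooth v_hat into the VFA (recursive statistics)
            V[t][R] = (1 - alpha) * V[t][R] + alpha * v_hat
            # Step 4: sample W_{t+1} and step to the next pre-decision state
            R, p = post(R, x), max(0.0, p + random.gauss(0.0, 3.0))
    return V

V = adp()
```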
Approximate dynamic programming
[Figures: animation frames of the forward pass stepping through time, learning iteratively.]
Exploiting concavity
Derivatives are used to estimate a piecewise linear approximation.

[Figure: piecewise linear, concave approximation V̄_t(R_t) plotted against R_t.]
Value function approximations
[Figure: objective function value versus training iterations (0–1,000).]
Model calibration
Deterministic vs. stochastic training

[Figure: value of locomotives at a yard as a function of the number of locomotives, under stochastic training vs. deterministic training.]
Laboratory testing
Train delay with uncertain transit times
» Stochastic training produces lower delays

[Figure: train delay over time; deterministically trained VFAs show higher delay than stochastically trained VFAs.]

For more information see http://www.castlelab.princeton.edu/plasma.htm
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Lookahead policies
The ultimate lookahead policy is optimal:

    X^*_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π_{t'}(S_{t'})) | S_t, x_t } )

This contains a minimization that we cannot compute, and an expectation that we cannot compute.
Lookahead policies
The ultimate lookahead policy is optimal:

    X^*_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π_{t'}(S_{t'})) | S_t, x_t } )

Instead, we have to solve an approximation called the lookahead model:

    X^{LA}_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_{π̃} Ẽ { Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, X̃^{π̃}(S̃_{tt'})) | S_t, x_t } )

» A lookahead policy works by approximating the lookahead model.
Stochastic lookahead policies
We use a series of approximations:
» Horizon truncation – Replacing a longer horizon
problem with a shorter horizon
» Stage aggregation – Replacing multistage problems with a two-stage approximation.
» Outcome aggregation/sampling – Simplifying the
exogenous information process
» Discretization – Of time, states and decisions
» Dimensionality reduction – We may ignore some
variables (such as forecasts) in the lookahead model
that we capture in the base model (these become latent
variables in the lookahead model).
Lookahead policies
Lookahead policies are the trickiest to model:
» We create "tilde variables" for the lookahead model:

    S̃_{tt'} = approximated state variable (e.g. coarse discretization)
    x̃_{tt'} = decision we plan on implementing at time t' when we are planning at time t, for t' = t, t+1, …, t+H
    x̃_t = (x̃_{tt}, x̃_{t,t+1}, …, x̃_{t,t+H})
    W̃_{tt'} = approximation of the information process
    c̃_{tt'} = forecast of costs at time t' made at time t
    b̃_{tt'} = forecast of right hand sides for time t' made at time t
Lookahead policies
Deterministic lookahead:

    X^{LA-D}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, x̃_{tt'})

Stochastic lookahead (with two-stage approximation):

    X^{LA-S}_t(S_t) = argmin C(S̃_{tt}, x̃_{tt}) + Σ_{ω̃∈Ω̃_t} p(ω̃) Σ_{t'=t+1}^{t+H} γ^{t'−t} C(S̃_{tt'}(ω̃), x̃_{tt'}(ω̃))

where the sampled outcomes ω̃ come from scenario trees.
Lookahead policies
» Assume the base model has T time periods, but we solve a smaller lookahead model over the horizon (t, …, t+H).
» Following a lookahead policy, the lookahead horizon rolls forward in time: (0, 0+H), (1, 1+H), (2, 2+H), (3, 3+H), …, (t, t+H). A sketch of this rolling loop follows.
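
A sketch of the rolling-horizon loop; `plan` and `step` are placeholders for a problem-specific lookahead solver and the base-model transition:

```python
def rolling_horizon(S0, T, H, plan, step):
    """Simulate a lookahead policy: at each time t, solve the lookahead model
    over (t, ..., t+H), implement only the first decision, then roll forward."""
    S, decisions = S0, []
    for t in range(T):
        x_plan = plan(S, min(H, T - t))   # solve the (possibly truncated) lookahead
        x = x_plan[0]                     # implement x_t only
        decisions.append(x)
        S = step(S, x)                    # base-model transition with new information
    return decisions
```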
Lookahead policies
Lookahead policies peek into the future:
» Optimize over the deterministic lookahead model, then step forward in the real process.

[Figures: animation frames showing the lookahead model solved above the real process at times t, t+1, t+2, t+3.]
Modeling stochastic wind
Actual vs. forecasted energy from wind

[Figure: the forecast f_{tt'} of wind power at time t', made at time t, plotted against the actual energy from wind, showing the deviations from forecast. Here t is the current time and t' is some point in the future.]
Modeling stochastic wind
Creating wind scenarios

[Figures: five sampled wind scenarios generated around the forecast.]
Stochastic lookahead policies
Stochastic lookahead
» Here, we approximate the information model by using a Monte Carlo sample to create a scenario tree:

[Figure: a scenario tree over hours 1am–5am, branching on changes in wind speed.]
Lookahead policies
We can then simulate this lookahead policy over time:

[Figures: animation frames showing the lookahead model re-solved at t, t+1, t+2, t+3, rolling forward against the base model.]
Stochastic lookahead policies
The two-stage approximation:
1) Schedule steam: x_0
2) See wind: ω
3) Schedule turbines: x_1(ω)
Stochastic lookahead policies
Some common misconceptions about stochastic programming (for sequential problems):

» Solving a "stochastic program" is hard, but obtaining an optimal solution does not produce an optimal policy.

» Bounds on the quality of the solution to a stochastic program are not bounds on the quality of the policy.

» We only care about the quality of the policy, which can only be evaluated using a stochastic base model.
Approximating distributions
Stochastic lookahead model
» Uses an approximation of the information process in the lookahead model.

Parametric distribution (pdf):

    f(x) = (1 / (σ√(2π))) e^{ −(x−μ)² / (2σ²) }

Nonparametric distribution (cdf): F(x)

[Figure: a parametric pdf f(x) next to an empirical, nonparametric cdf F(x).]
Approximating a function
Parametric vs. nonparametric

[Figure: total revenue vs. price of product, showing observations, the true function, a parametric fit, and a nonparametric fit.]

» Robust CFAs are parametric
» Scenario trees are nonparametric
Lookahead policies
Robust optimization
» Static robust optimization is a problem:

    min_x max_{w∈W} F(x, w)

» Robust optimization for sequential problems (as it is practiced in the literature) is a lookahead policy:

    X^{RO}_t(S_t | θ) = argmin_{x_0,…,x_T} max_{w_1,…,w_T ∈ W(θ)} Σ_t C(S_t, x_t)

    where S_{t+1} = S^M(S_t, x_t, w_{t+1}) and W(θ) is the uncertainty set.

The RO community tunes θ using

    min_θ F(θ) = E Σ_{t=0}^T C(S_t, X^{RO}_t(S_t | θ))

Thus, they approximate both the information process (using the uncertainty set W(θ)) and the objective (max instead of E). This is the same as:

    min_π E { Σ_{t=0}^T C(S_t, X^π(S_t)) }
Stochastic lookahead policies
Notes:
» It is nice to talk about simulating a stochastic lookahead model using a multistage model, but multistage models are almost impossible to solve (we are not aware of any testing of multistage stochastic unit commitment).
» Even two-stage approximations of lookahead models are quite difficult for many applications, so simulating these policies remains quite difficult, and researchers typically do not even develop a simulator to test their policies.
» In our experience, simulations of stochastic lookahead models tend to consist of sampling scenarios from the lookahead model. They should be tested on a full base model.
» In a real application such as stochastic unit commitment, a number of approximations are required in the lookahead model that should not be present in the base model.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
An energy storage problem
Consider a basic energy storage problem:

» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.
An energy storage problem
A model of our problem
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function
An energy storage problem
State variables

[Figure: an energy system connecting an energy source E, a battery B, and a load L.]

» We will present the full model, accumulating the information we need in the state variable.
» We will highlight information we need as we proceed. This information will make up our state variable.
An energy storage problem
Decision variables

    x_t = (x^{EL}_t, x^{EB}_t, x^{GL}_t, x^{GB}_t, x^{BL}_t)

» Constraints:
» Policy: might be a lookahead using forecasts f^L_t = (f^L_{tt'})_{t'≥t}
An energy storage problem
Exogenous information

    W_t = (Ê_t, ε^p_t, D̂_t, f^L_t), where:

    Ê_t = change in energy from wind between t−1 and t
    ε^p_t = noise in the price process between t−1 and t
    D̂_t = change in load between t−1 and t
    f^L_{tt'} = forecast of load D^load_{t'} provided by a vendor at time t
    f^L_t = (f^L_{tt'})_{t'≥t}
An energy storage problem
Transition function

    E_{t+1} = E_t + Ê_{t+1}
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
    D_{t+1} = D_t + D̂_{t+1}
    f^L_t = provided exogenously
    R^{battery}_{t+1} = R^{battery}_t + x_t
An energy storage problem
Objective function

    C(S_t, x_t) = p_t (x^{GB}_t + x^{GL}_t)

    min_π E Σ_{t=0}^T C(S_t, X^π_t(S_t))
An energy storage problem
State variables

    S_t = (R_t, E_t, L_t, (p_t, p_{t−1}, p_{t−2}), f^L_t)

» Cost function: needs the price of electricity p_t.
» Decision function: needs the constraints, and f^L_t if we use a lookahead policy.
» Transition function: needs the price history, since
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1 – Best for PFAs
• Highly stochastic (heavy tailed) electricity prices
• Stationary data
» Problem class 2 – Best for CFAs
• Stochastic prices and wind (but not heavy tailed)
• Stationary data
» Problem class 3 - Best for VFAs
• Stochastic wind and prices (but not too random)
• Time varying loads, but inaccurate wind forecasts
» Problem class 4 – Best for deterministic lookaheads
• Relatively low noise problem with accurate forecasts
» Problem class 5 – A hybrid policy worked best here
• Stochastic prices and wind, nonstationary data, noisy forecasts.
An energy storage problem
The policies
» The PFA:
• Charge battery when price is below p1
• Discharge when price is above p2
» The CFA
• Minimize a cost with an error correction term.
» The VFA
• Piecewise linear, concave value function in terms of energy,
indexed by time.
» The lookahead (deterministic)
• Optimize over a horizon H (only tunable parameter) using
forecasts of demand, prices and wind energy
» A hybrid lookahead CFA
• Deterministic lookahead with bounds on inventories in the
future.
An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution

[Table: performance of the five policies across problem classes 1–5.]

» … any policy might be best depending on the data.

Joint research with Prof. Stephan Meisel, University of Muenster, Germany.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Choosing a policy

 Robust cost function approximation

 Lookahead policy

 Policy function approximation

 Policy based on value function approximation
Choosing a policy
Which policy to use?
» PFAs are best for low-dimensional problems where the structure of the policy is apparent from the problem.

» CFAs work for high-dimensional problems, where we can get desired behavior by manipulating the cost function:

    X^π_t(θ) = argmin_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}
Choosing a policy
Which policy to use?
» VFAs work best when the lookahead model is easy to approximate.

» Lookahead models should be used only when all else fails (which is often).
Modeling sequential decision problems
First build your model:
» Objective function:
    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }
» Policy:
    X^π : S → X
» Constraints at time t:
    x_t ∈ X_t(S_t) = X_t
» Transition function:
    S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
    (W_1, W_2, …, W_T)

Then design your policies:
» PFA? Exploit obvious problem structure.
» CFA? Can you tune a deterministic approximation to make it work better?
» VFA? Can you approximate the value of being in a downstream state?
» Lookahead? Do you have a forecast? What is the nature of the uncertainty?
» Hybrid?
Computational Stochastic Optimization

Each community is, at its core, built around one class of policy:
» Stochastic programming → stochastic lookahead
» Robust optimization → worst-case lookahead
» Approximate dynamic programming → VFA-based policies
» Model predictive control → deterministic lookahead
» Optimal control → parametric VFA
» Online learning → myopic policy
» Reinforcement learning → VFA-based policy with discrete actions
» Markov decision processes → exact lookup table
Thank you!
For a related tutorial, go to:

http://www.castlelab.princeton.edu

and click on the link "Clearing the Jungle of Stochastic Optimization"

http://www.castlelab.princeton.edu/jungle.htm
