
Clearing the Jungle of Stochastic Optimization

INFORMS Annual Meeting


San Francisco
November 10, 2014

Warren B. Powell
Princeton University
Department of Operations Research
and Financial Engineering

© 2014 Warren B. Powell, Princeton University


Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Optimizing energy storage
Take advantage of price variations
Applications
All of these problems are examples of sequential decision
problems.
We are going to argue that we do a very good job of modeling deterministic optimization problems, and static/two-stage stochastic optimization problems…

… but sequential problems are a different matter.

In this tutorial, we are going to present a canonical framework that spans all the communities that work on this problem class…

… and we are going to see that these four problems illustrate four fundamental classes of policies for sequential stochastic optimization.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Modeling stochastic, dynamic problems
Before we can solve complex problems, we have
to know how to think about them.

Min E {cx}
Ax = b
x>0
Organize class
libraries, and set up
communications and
databases

Mathematician

Software
The biggest challenge when making decisions
under uncertainty is modeling.
Deterministic modeling
For deterministic problems, we speak the language
of mathematical programming
» For static problems:

    min cx
    s.t. Ax = b, x ≥ 0

» For time-staged problems:

    min Σ_{t=0}^T c_t x_t
    s.t. A_t x_t − B_{t−1} x_{t−1} = b_t
         D_t x_t ≤ u_t
         x_t ≥ 0

Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning

Markov decision processes



Modeling
A recent comment by a (helpful) referee:

» …One of the main contributions of the paper is the demonstration of a policy-based modeling framework for transportation problems with uncertainty. However, it could be argued that a richer modeling framework already exists (multi-stage stochastic programming) that does not require approximating the decision space with policies….

W. B. Powell, H. Simao, B. Bouzaiene-Ayari, "Approximate Dynamic Programming in Transportation and Logistics: A Unified Framework," European J. on Transportation and Logistics, Vol. 1, No. 3, pp. 237-284 (2012). DOI 10.1007/s13676-012-0015-8.
Modeling as a Markov decision process
For stochastic problems, many people model the
problem using Bellman’s equation
 
    V(s) = min_a ( C(s, a) + γ Σ_{s'} p(s'|s, a) V(s') )

where

    s = "state variable"
    a = discrete action
    p(s'|s, a) = "model" (transition matrix, transition kernel)
    V(s) = value of being in state s
    γ = discount factor
» This is the canonical form of a dynamic program
building on Bellman’s seminal research. Simple,
elegant, widely used but difficult to scale to realistic
problems.
Modeling as a Markov decision process
“Canonical model” from Puterman (Ch. 3)
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”

    min_{x_0 ∈ X_0} c_0 x_0 + E[ Q(x_0, ω_1) ]

where

    Q(x_0, ω_1(ω)) = min_{x_1(ω) ∈ X_1(ω)} c_1(ω) x_1(ω)

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

    min c_0 x_0 + Σ_{ω∈Ω} p(ω) Σ_{t=1}^T c_t(ω) x_t(ω)
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”

    min_{x_t ∈ X_t} c_t x_t + E[ Q(x_t, ω_{t+1}) ]

where

    Q(x_t, ω_{t+1}(ω)) = min_{x_{t+1}(ω) ∈ X_{t+1}(ω)} c_{t+1}(ω) x_{t+1}(ω)

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

    min c_t x_t + Σ_{ω∈Ω̃_t} p(ω) Σ_{t'=t+1}^{t+H} c_{tt'}(ω) x_{tt'}(ω)
Modeling using control theory
From "Optimal Control" by Lewis, Vrabie and Syrmos:

» The standard model in optimal control is deterministic.

» It bundles the objective function with optimality criteria comparable to Bellman equations.
Modeling
We lack a standard language for modeling
sequential, stochastic decision problems.
» In the slides that follow, we propose to model problems along five fundamental dimensions:

• State variables
• Decision variables
• Exogenous information
• Transition function
• Objective function

» This framework is widely followed in the control theory community, and almost completely ignored in operations research and computer science.
Modeling dynamic problems
The system state:
» Controls community:
    x_t = "information state"
» Operations research/MDP/computer science:
    S_t = (R_t, I_t, K_t) = system state, where:
    R_t = resource state (physical state)
        Location/status of truck/train/plane
        Energy in storage
    I_t = information state
        Prices
        Weather
    K_t = knowledge state ("belief state")
        Belief about traffic delays
        Belief about the status of equipment
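
To make the (R_t, I_t, K_t) decomposition concrete, here is a minimal Python sketch; the fields are illustrative (drawn from the trucking and energy examples above), not part of any canonical model.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class State:
    # R_t: resource (physical) state, e.g. energy in storage
    energy_in_storage: float
    # I_t: information state, e.g. the current price and weather
    price: float
    temperature: float
    # K_t: knowledge (belief) state, e.g. estimated travel delay per link
    belief_delay: Dict[str, float] = field(default_factory=dict)

S0 = State(energy_in_storage=5.0, price=32.0, temperature=18.0,
           belief_delay={"link_3_6": 0.4})
```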
Modeling dynamic problems
Decisions:
» Computer science:
    a_t = discrete action
» Control theory:
    u_t = low-dimensional continuous vector
» Operations research:
    x_t = usually a discrete or continuous but high-dimensional vector of decisions.

At this point, we do not specify how to make a decision. Instead, we define the function X^π(s) (or A^π(s) or U^π(s)), where π specifies the type of policy. "π" carries information about the type of function f ∈ F, and any tunable parameters θ ∈ Θ^f.
Modeling dynamic problems
Exogenous information:

    W_t = new information that first became known at time t
        = (R̂_t, D̂_t, p̂_t, Ê_t)

    R̂_t = equipment failures, delays, new arrivals; new drivers being hired to the network
    D̂_t = new customer demands
    p̂_t = changes in prices
    Ê_t = information about the environment (temperature, ...)

Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

Below, we will let ω represent a sequence of actual observations W_1, W_2, …. W_t(ω) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function:

    S_{t+1} = S^M(S_t, x_t, W_{t+1})

    R_{t+1} = R_t + x_t + R̂_{t+1}    Inventories
    p_{t+1} = p_t + p̂_{t+1}          Spot prices
    D_{t+1} = D_t + D̂_{t+1}          Market demands

Also known as the: "system model," "transfer function," "state transition model," "plant model," "transformation function," "law of motion," "plant equation," "model," "transition law."

For many applications, these equations are unknown. This is known as "model-free" dynamic programming.
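
The three example equations above translate directly into code. A minimal sketch, assuming dictionary-valued states and exogenous information with illustrative field names:

```python
def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the three example equations."""
    return {
        "R": S["R"] + x + W["Rhat"],   # R_{t+1} = R_t + x_t + Rhat_{t+1}  (inventory)
        "p": S["p"] + W["phat"],       # p_{t+1} = p_t + phat_{t+1}        (spot price)
        "D": S["D"] + W["Dhat"],       # D_{t+1} = D_t + Dhat_{t+1}        (demand)
    }

S = {"R": 10.0, "p": 30.0, "D": 5.0}
S = transition(S, x=2.0, W={"Rhat": -1.0, "phat": 0.5, "Dhat": 0.3})
```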
Stochastic optimization models
The objective function:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

where C is the cost function, S_t is the state variable, X^π_t is the decision function (policy), and the expectation is over all random outcomes.
Finding the best policy
Given a system model (transition function)

    S_{t+1} = S^M(S_t, x_t, W_{t+1}(ω))

We call this the base model.
Objective functions
There are different objectives that we can use:
» Expectations

    min_x E F(x, W)

» Risk measures

    min_x E F(x, W) + λ E[ (F(x, W) − f̄)² ]

    min_x ρ(F(x, W))    ← convex/coherent risk measures

» Worst case ("robust optimization")

    min_x max_w F(x, w)

The choice of objective is up to the modeler.
Modeling
Deterministic:
» Objective function:
    min_{x_0,…,x_T} Σ_{t=0}^T c_t x_t
» Decision variables:
    (x_0, …, x_T)
» Constraints at time t:
    A_t x_t = R_t
    x_t ≥ 0
» Transition function:
    R_{t+1} = b_{t+1} + B_t x_t

Stochastic:
» Objective function:
    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }
» Policy:
    X^π : S → X
» Constraints at time t:
    x_t ∈ X_t(S_t) = X_t
» Transition function:
    S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
    (W_1, W_2, …, W_T)
Stochastic optimization models
With deterministic problems, we want to find the best decision:

    min_{x_0,…,x_T} Σ_{t=0}^T c_t x_t

With stochastic problems, we want to find the best function (policy) for making a decision:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

» … which is sometimes written

    min_{x_0,…,x_T} E { Σ_{t=0}^T γ^t C(S_t, x_t) }

where x_t is F_t-measurable.
A model without an algorithm is like cloud-to-cloud lightning…

… pretty to look at, but no impact.


Modeling
There are two practical issues when working with this objective function:

» How do we compute the expectation?

» How do we search over policies?


Computing the objective function
In practice, we cannot compute the expectation in:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π(S_t)) }

Instead, we might do one long simulation…

    min_π F̂^π(ω) = Σ_{t=0}^T γ^t C(S_t(ω), X^π_t(S_t(ω)))

…or we might average across several simulations:

    min_π F̄^π = (1/N) Σ_{n=1}^N Σ_{t=0}^T γ^t C(S_t(ω^n), X^π_t(S_t(ω^n)))
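
A hedged sketch of this sample-average estimate; `policy`, `cost`, `transition`, and `sample_W` are placeholders for a problem-specific model:

```python
def estimate_objective(policy, cost, transition, sample_W, S0, T=24, N=100, gamma=1.0):
    """F-bar = (1/N) sum_n sum_t gamma^t C(S_t(w^n), X^pi(S_t(w^n)))."""
    total = 0.0
    for n in range(N):                        # N sample paths omega^1..omega^N
        S = dict(S0)
        for t in range(T):
            x = policy(S)                     # X^pi(S_t)
            total += (gamma ** t) * cost(S, x)
            S = transition(S, x, sample_W())  # draw W_{t+1}(omega^n)
    return total / N
```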
Modeling
There are two ways to compute the objective:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π(S_t)) }

» Offline learning
• We use computer simulation to compute
    F̂^π = Σ_{t=0}^T γ^t C(S_t(ω), X^π_t(S_t(ω)))
• Provides a controlled testing environment.
• Requires living with model assumptions.
• Can quickly compare different policies.

» Online learning
• We observe the performance of a policy in the field.
• No control over the test environment.
• Avoids dependence on model assumptions.
• Policy evaluations are quite slow.
Searching for a policy
We have to start by describing what we mean by a policy.
» Definition:

    A policy is a mapping from a state to an action.

    … any mapping.

How do we search over an arbitrary space of policies?
» Scanning the literature, it appears that every algorithmic strategy can be boiled down to four fundamental classes:
Four (meta)classes of policies
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric functions

2) Cost function approximations (CFAs)

    X^CFA_t(S_t | θ) = argmin_{x_t ∈ X_t(θ)} C̄^π(S_t, x_t | θ)

3) Policies based on value function approximations (VFAs)

    X^VFA_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + γ V̄^x_t(S^x_t(S_t, x_t)) )

4) Lookahead policies
» Deterministic lookahead:

    X^{LA-D}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, x̃_{tt'})

» Stochastic lookahead (e.g. scenario trees):

    X^{LA-S}_t(S_t) = argmin C(S̃_{tt}, x̃_{tt}) + Σ_{ω̃∈Ω̃_t} p(ω̃) Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(ω̃), x̃_{tt'}(ω̃))

» "Robust optimization":

    X^{LA-RO}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} max_{w∈W_t(θ)} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}(w), x̃_{tt'}(w))
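
As a rough sketch of how differently these four classes compute a decision (every helper function here is a placeholder, and each class is reduced to its simplest form):

```python
def pfa(S, theta):
    """PFA: an analytic rule, e.g. a buy-low/sell-high threshold policy."""
    return +1 if S["p"] <= theta[0] else (-1 if S["p"] >= theta[1] else 0)

def cfa(S, theta, feasible, cost_bar):
    """CFA: minimize a parametrically modified cost C-bar(S, x | theta)."""
    return min(feasible(S), key=lambda x: cost_bar(S, x, theta))

def vfa(S, feasible, cost, post_state, V):
    """VFA: minimize C(S, x) + V-bar(S^x), where S^x is the post-decision state."""
    return min(feasible(S), key=lambda x: cost(S, x) + V(post_state(S, x)))

def lookahead(S, plans, cost, model):
    """Deterministic lookahead: roll each candidate plan (x_t, ..., x_{t+H})
    through a deterministic model, keep the best, implement its first decision."""
    def plan_cost(plan):
        s, total = dict(S), 0.0
        for x in plan:
            total += cost(s, x)
            s = model(s, x)          # deterministic forecast of the transition
        return total
    return min(plans(S), key=plan_cost)[0]
```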
Function approximations
There are three classes of
approximation strategies
» Lookup table
• Given a discrete state, return a discrete action or value

» Parametric models
• Linear models ("basis functions")
• Nonlinear models

» Nonparametric models
• Kernel regression
• Nearest neighbor clustering
• Local polynomial regression
Policies
(Slightly arrogant) claim:

» These are the fundamental building blocks of all policies.

Many variations can be built using hybrids:
» Lookahead plus value function for terminal reward
» Myopic or lookahead policies with policy function approximation
» Modified lookahead policy (lookahead with CFA)
Searching for policies
The objective function

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) } ≈ min_π F̄^π = (1/N) Σ_{n=1}^N Σ_{t=0}^T γ^t C(S_t(ω^n), X^π_t(S_t(ω^n)))

    S_{t+1} = S^M(S_t, X^π_t(S_t), W_{t+1})

Finding the best policy means:

» Search over different classes of policies:
• PFAs, CFAs, VFAs and lookaheads.
• Hybrids (VFA+lookahead, CFA+PFA, …)

» Search over tunable parameters within each class.
Searching for policies
There are tunable parameters for every class of policy:
» PFAs – These are parametric functions characterized by parameters such as:
• θ = (s, S) parameters for an inventory policy
• X^PFA_t(S_t) = θ_0 + θ_1 S_t + θ_2 S_t²
» CFAs – A parameterized cost function:
• Bonus and penalties to encourage certain behaviors
• Constraints to ensure buffer stocks or schedule slack
» VFAs – These may be parameterized approximations of the value of being in a state:
• V̄_t(S_t) = Σ_{f∈F} θ_f φ_f(S_t)
» Lookaheads – Choices of planning horizon, number of stages, number of scenarios, ….
Searching for policies
Parameter tuning can be done offline or online:
» Offline (in the lab)
• Stochastic search
• Simulation optimization
• Global optimization
• Black box optimization
• Ranking and selection
• Knowledge gradient (offline)
» Online (in the field)
• Bandit problems
– Gittins indices
– Upper confidence bounding
• Online knowledge gradient
Stochastic optimization models
So now that complicated looking objective function:

    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }

    S_{t+1} = S^M(S_t, x_t, W_{t+1}(ω))

» … becomes something real and practical (and possibly what you are already doing).
With the right language,
we can learn how to
bridge communities,
creating practical
algorithms for a wide
range of stochastic
optimization problems.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
The state variables
What is a state variable?
» Bellman’s classic text on dynamic programming (1957)
describes the state variable with:
• “… we have a physical system characterized at any stage by a
small set of parameters, the state variables.”
» The most popular book on dynamic programming
(Puterman, 2005, p.18) “defines” a state variable with
the following sentence:
• “At each decision epoch, the system occupies a state.”
» Wikipedia:
• “State commonly refers to either the present condition of a
system or entity” or….
• A state variable is one of the set of variables that are used to
describe the mathematical ‘state’ of a dynamical system
The state variables
What is a state variable?
» Kirk (2004), an introduction to control theory, offers the definition:
• A state variable is a set of quantities x_1(t), x_2(t), … which if known at time t = t_0 are determined for t ≥ t_0 by specifying the inputs to the system for t ≥ t_0.
• … or "all the information you need to model the system from time t onward." True, but vague.
» Cassandras and Lafortune (2008):
• The state of a system at time t_0 is the information required at t_0 such that the output [cost] y(t) for all t ≥ t_0 is uniquely determined from this information and from u(t), t ≥ t_0.
• Again, consistent with the statement "all the information you need to model the system from time t_0 onward," but then why do they later talk about "Markovian" and "non-Markovian" queueing systems?
The state variables
There appear to be two ways of approaching a
state variable:
» The mathematician's view – The state variable is a given, at which point the mathematician will characterize its properties ("Markovian," "history-dependent," …)

» The modeler's view – The state variable needs to be constructed from a raw description of the problem.
The state variable
Illustrating state variables
» A deterministic graph

[Figure: a deterministic network on nodes 1–11 with known arc costs; the traveler is at node 6.]

    S_t = (N_t) = (6)
The state variable
Illustrating state variables
» A stochastic graph

[Figure: the same network, but the costs on the arcs out of node 6 are now random.]

    S_t = ?
The state variable
Illustrating state variables
» A stochastic graph

[Figure: the traveler at node 6 now observes sampled costs on the arcs out of node 6.]

    S_t = (N_t, (c̄_{t,N_t,j})_j) = (6, (12.7, 8.9, 13.5))
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties

[Figure: the same network, where the cost of an arc out of node 6 depends on the node we came from.]

    S_t = (N_t, (c̄_{t,N_t,j})_j, N_{t−1}) = (6, (12.7, 8.9, 13.5), 3)

where the first component is the resource state R_t, and the remaining components form the information state I_t.
The state variable
Illustrating state variables
» A stochastic graph with generalized learning

[Figure: the same network, where observed arc costs update beliefs about costs elsewhere in the network.]

    S_t = ?

    S_t = (N_t, (c̄_{t,N_t,j})_j, …)

which now contains a resource state R_t, an information state I_t, and a knowledge (belief) state K_t.
The state variable
A proposed definition of a state variable:
» The state variable is the minimally dimensioned function of history that is necessary and sufficient to calculate the decision function, the cost/reward function, and the transition function, from time t onward.

» This can be described as a "constructionist" definition, because it specifies how to construct the state variable from a raw description of the problem.

» Using this definition, all properly modeled problems are Markovian!
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
Policy function approximations
Grid operators require that batteries bid charge and discharge prices, an hour in advance.

[Figure: hourly prices over roughly 70 hours, with a discharge threshold θ^Discharge above a charge threshold θ^Charge.]

We have to search for the best values for the policy parameters θ^Charge and θ^Discharge.
Policy function approximations
Our policy function might be the parametric model (this is nonlinear in the parameters):

    X^π(S_t | θ) =  +1 (charge)     if p_t ≤ θ^charge
                     0 (hold)       if θ^charge < p_t < θ^discharge
                    −1 (discharge)  if p_t ≥ θ^discharge
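
A direct Python rendering of this threshold policy; the +1/0/−1 coding for charge/hold/discharge mirrors the formula above:

```python
def battery_pfa(price, theta_charge, theta_discharge):
    """PFA for battery arbitrage: +1 = charge, 0 = hold, -1 = discharge."""
    if price <= theta_charge:
        return +1
    if price >= theta_discharge:
        return -1
    return 0
```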
Policy function approximations
Finding the best policy
» We need to maximize

    max_θ F(θ) = E Σ_{t=0}^T γ^t C(S_t, X^π(S_t | θ))

» We cannot compute the expectation, so we run simulations:

[Figure: simulated response surface of F(θ) over (θ^Charge, θ^Discharge).]
Policy function approximations
A number of fields work on this problem under
different names:
» Stochastic search
» Stochastic programming (“two stage”)
» Simulation optimization
» Black box optimization
» Global optimization
» Control theory (“open loop control”)
» Sequential design of experiments
» Bandit problems (for on-line learning)
» Ranking and selection (for off-line learning)
» Optimal learning
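
A minimal offline tuning loop for the battery policy, reusing `battery_pfa` from the sketch above. The mean-reverting price model and all constants are illustrative; the search itself is just a brute-force grid over (θ^Charge, θ^Discharge):

```python
import random

def simulate_profit(theta_c, theta_d, T=1000, seed=0, R_max=10.0):
    """One sample path of prices; returns the arbitrage profit of the PFA."""
    rng = random.Random(seed)
    p, R, profit = 30.0, 0.0, 0.0
    for _ in range(T):
        p = max(0.0, p + 0.5 * (30.0 - p) + rng.gauss(0.0, 5.0))  # toy price model
        x = battery_pfa(p, theta_c, theta_d)
        if x == +1 and R < R_max:        # charge: buy one unit of energy
            R, profit = R + 1.0, profit - p
        elif x == -1 and R > 0.0:        # discharge: sell one unit
            R, profit = R - 1.0, profit + p
    return profit

# Average over a few sample paths before comparing parameter vectors.
best = max(((tc, td) for tc in range(10, 30, 2) for td in range(32, 60, 2)),
           key=lambda th: sum(simulate_profit(*th, seed=s) for s in range(10)))
```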
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Cost function approximations
[Figure: drivers on the left, loads (demands) on the right, with assignment arcs evolving over times t, t+1, t+2.]

The assignment of drivers to loads evolves over time, with new loads being called in, along with updates to the status of a driver.
Cost function approximations
A purely myopic policy would solve this problem using

    min_x Σ_d Σ_l c_{tdl} x_{tdl}

where

    x_{tdl} = 1 if we assign driver d to load l, 0 otherwise
    c_{tdl} = cost of assigning driver d to load l at time t

What if a load is not assigned to any driver, and has been delayed for a while? This model ignores the fact that we eventually have to assign someone to the load.
Cost function approximations
We can minimize delayed loads by solving a modified objective function:

    min_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}

where

    τ_{tl} = how long load l has been delayed by time t
    θ = bonus for moving a delayed load

We refer to our modified objective function as a cost function approximation.
Cost function approximations
We now have to tune our policy, which we define as:

    X^π(S_t | θ) = argmin_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}

We can now optimize θ, another form of policy search, by solving

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(S_t | θ))
Robust cost function approximation
Inventory management

» How much product should I order to anticipate future demands?

» Need to accommodate different sources of uncertainty:
• Market behavior
• Transit times
• Supplier uncertainty
• Product quality
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve

    X_t(S_t) = argmin_{x_t} Σ_{p∈P} c_p x_{tp}

subject to (the feasible set X_t):

    Σ_{p∈P} x_{tp} = D_t
    x_{tp} ≤ u_p
    x_{tp} ≥ 0

» This assumes our demand forecast D_t is accurate.
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve

    X_t(S_t | θ) = argmin_{x_t} Σ_{p∈P} c_p x_{tp}

subject to (the feasible set X_t(θ)):

    Σ_{p∈P} x_{tp} = D_t + θ    ← buffer inventory
    x_{tp} ≤ u_p
    x_{tp} ≥ 0

» This is a "parametric cost function approximation"
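
A sketch of the parametric CFA as a small linear program via scipy.optimize.linprog. Whether the coverage constraint is an equality or a "≥" is a modeling choice (we use "≥" here), and the numbers are made up:

```python
import numpy as np
from scipy.optimize import linprog

def purchase_policy(c, u, D, theta):
    """min sum_p c_p x_p   s.t.   sum_p x_p >= D + theta,  0 <= x_p <= u_p."""
    P = len(c)
    res = linprog(c,
                  A_ub=-np.ones((1, P)), b_ub=[-(D + theta)],  # coverage, sign-flipped
                  bounds=[(0.0, up) for up in u])
    return res.x

x = purchase_policy(c=[3.0, 4.0, 6.0], u=[40.0, 40.0, 40.0], D=60.0, theta=10.0)
```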
Cost function approximations
A general way of creating CFAs:
» Define our policy:

    X^π_t(θ) = argmin_x ( C(S_t, x_t) + Σ_{f∈F} θ_f φ_f(S_t, x_t) )

The added term is sometimes mistaken as a value function approximation; it is really a cost correction term.

» We again tune θ by optimizing:

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(θ))
Cost function approximations
An even more general CFA model:
» Define our policy:

    X^π_t(θ) = argmin_x C̄^π(S_t, x_t | θ)    ← parametrically modified costs

subject to

    Ax = b(θ)    ← parametrically modified constraints

» We again tune θ by optimizing:

    min_θ F^π(θ) = E Σ_{t=0}^T C(S_t, X^π_t(θ))
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
The locomotive assignment problem

[Figure: locomotives of varying horsepower (4400, 4400, 6000, 4400, 5700, 4600, 6200) available to be assigned to trains at Atlanta, Baltimore, and Charlotte.]
The locomotive assignment problem

[Figure: the assignment network linking locomotives to trains, with train reward arcs and locomotive value buckets representing the value of locomotives in the future.]
The locomotive assignment problem

[Figure: the time-t assignment subproblem.]

The locomotive subproblem can be solved quickly using Cplex.
Approximate dynamic programming
Step 1: Start with a pre-decision state S^n_t.
Step 2: Solve the deterministic optimization using an approximate value function:    [Deterministic optimization]

    v̂^n_t = min_x ( C(S^n_t, x_t) + V̄^{n−1}_t(S^{M,x}(S^n_t, x_t)) )

to obtain x^n_t.
Step 3: Update the value function approximation:    [Recursive statistics]

    V̄^n_{t−1}(S^{x,n}_{t−1}) = (1 − α_{n−1}) V̄^{n−1}_{t−1}(S^{x,n}_{t−1}) + α_{n−1} v̂^n_t

Step 4: Obtain a Monte Carlo sample of W_t(ω^n) and compute the next pre-decision state:    [Simulation]

    S^n_{t+1} = S^M(S^n_t, x^n_t, W_{t+1}(ω^n))

Step 5: Return to step 1.    [Iterative learning]
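
A stripped-down sketch of steps 1–5 for a scalar resource, with a lookup-table VFA indexed by (t, R). The toy cost model, and updating around the pre-decision state rather than the post-decision state, are simplifications:

```python
import random

def adp(T=24, N=200, alpha=0.1, actions=(-1, 0, +1), R_max=10):
    """Forward-pass approximate dynamic programming (steps 1-5)."""
    V = [[0.0] * (R_max + 1) for _ in range(T + 1)]    # V[t][R]

    def cost(R, x, p):
        return p * x                                   # pay to charge, earn to discharge

    def post(R, x):
        return min(R_max, max(0, R + x))               # post-decision resource state

    for n in range(N):                                 # iterative learning
        R, p = R_max // 2, 30.0                        # Step 1: pre-decision state
        for t in range(T):
            # Step 2: solve the deterministic problem with the current VFA
            x = min(actions, key=lambda a: cost(R, a, p) + V[t + 1][post(R, a)])
            v_hat = cost(R, x, p) + V[t + 1][post(R, x)]
            # Step 3: smooth v_hat into the VFA (recursive statistics)
            V[t][R] = (1 - alpha) * V[t][R] + alpha * v_hat
            # Step 4: sample W_{t+1} and step to the next pre-decision state
            R, p = post(R, x), max(0.0, p + random.gauss(0.0, 3.0))
    return V

V = adp()
```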
Approximate dynamic programming
[Figures: animation frames of the forward pass stepping through time, learning iteratively.]
Exploiting concavity
Derivatives are used to estimate a piecewise linear approximation.

[Figure: piecewise linear, concave approximation V̄_t(R_t) plotted against R_t.]
Value function approximations
[Figure: objective function value versus training iterations (0–1,000).]
Model calibration
Deterministic vs. stochastic training

[Figure: value of locomotives at a yard as a function of the number of locomotives, under stochastic training vs. deterministic training.]
Laboratory testing
Train delay with uncertain transit times
» Stochastic training produces lower delays

[Figure: train delay over time; deterministically trained VFAs show higher delay than stochastically trained VFAs.]

For more information see http://www.castlelab.princeton.edu/plasma.htm
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Lookahead policies
The ultimate lookahead policy is optimal:

    X^*_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π_{t'}(S_{t'})) | S_t, x_t } )

This contains a minimization that we cannot compute, and an expectation that we cannot compute.
Lookahead policies
The ultimate lookahead policy is optimal:

    X^*_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_π E { Σ_{t'=t+1}^T C(S_{t'}, X^π_{t'}(S_{t'})) | S_t, x_t } )

Instead, we have to solve an approximation called the lookahead model:

    X^{LA}_t(S_t) = argmin_{x_t} ( C(S_t, x_t) + min_{π̃} Ẽ { Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, X̃^{π̃}(S̃_{tt'})) | S_t, x_t } )

» A lookahead policy works by approximating the lookahead model.
Stochastic lookahead policies
We use a series of approximations:
» Horizon truncation – Replacing a longer horizon
problem with a shorter horizon
» Stage aggregation – Replacing multistage problems with a two-stage approximation.
» Outcome aggregation/sampling – Simplifying the
exogenous information process
» Discretization – Of time, states and decisions
» Dimensionality reduction – We may ignore some
variables (such as forecasts) in the lookahead model
that we capture in the base model (these become latent
variables in the lookahead model).
Lookahead policies
Lookahead policies are the trickiest to model:
» We create "tilde variables" for the lookahead model:

    S̃_{tt'} = approximated state variable (e.g. coarse discretization)
    x̃_{tt'} = decision we plan on implementing at time t' when we are planning at time t, for t' = t, t+1, …, t+H
    x̃_t = (x̃_{tt}, x̃_{t,t+1}, …, x̃_{t,t+H})
    W̃_{tt'} = approximation of the information process
    c̃_{tt'} = forecast of costs at time t' made at time t
    b̃_{tt'} = forecast of right hand sides for time t' made at time t
Lookahead policies
Deterministic lookahead:

    X^{LA-D}_t(S_t) = argmin_{x̃_{tt},…,x̃_{t,t+H}} C(S̃_{tt}, x̃_{tt}) + Σ_{t'=t+1}^{t+H} C(S̃_{tt'}, x̃_{tt'})

Stochastic lookahead (with two-stage approximation):

    X^{LA-S}_t(S_t) = argmin C(S̃_{tt}, x̃_{tt}) + Σ_{ω̃∈Ω̃_t} p(ω̃) Σ_{t'=t+1}^{t+H} γ^{t'−t} C(S̃_{tt'}(ω̃), x̃_{tt'}(ω̃))

where the sampled outcomes ω̃ come from scenario trees.
Lookahead policies
» Assume the base model has T time periods, but we solve a smaller lookahead model over the horizon (t, …, t+H).
» Following a lookahead policy, the lookahead horizon rolls forward in time: (0, 0+H), (1, 1+H), (2, 2+H), (3, 3+H), …, (t, t+H). A sketch of this rolling loop follows.
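
A sketch of the rolling-horizon loop; `plan` and `step` are placeholders for a problem-specific lookahead solver and the base-model transition:

```python
def rolling_horizon(S0, T, H, plan, step):
    """Simulate a lookahead policy: at each time t, solve the lookahead model
    over (t, ..., t+H), implement only the first decision, then roll forward."""
    S, decisions = S0, []
    for t in range(T):
        x_plan = plan(S, min(H, T - t))   # solve the (possibly truncated) lookahead
        x = x_plan[0]                     # implement x_t only
        decisions.append(x)
        S = step(S, x)                    # base-model transition with new information
    return decisions
```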
Lookahead policies
Lookahead policies peek into the future:
» Optimize over the deterministic lookahead model, then step forward in the real process.

[Figures: animation frames showing the lookahead model solved above the real process at times t, t+1, t+2, t+3.]
Modeling stochastic wind
Actual vs. forecasted energy from wind

[Figure: the forecast f_{tt'} of wind power at time t', made at time t, plotted against the actual energy from wind, showing the deviations from forecast. Here t is the current time and t' is some point in the future.]
Modeling stochastic wind
Creating wind scenarios

[Figures: five sampled wind scenarios generated around the forecast.]
Stochastic lookahead policies
Stochastic lookahead
» Here, we approximate the information model by using a Monte Carlo sample to create a scenario tree:

[Figure: a scenario tree over hours 1am–5am, branching on changes in wind speed.]
Lookahead policies
We can then simulate this lookahead policy over time:

[Figures: animation frames showing the lookahead model re-solved at t, t+1, t+2, t+3, rolling forward against the base model.]
Stochastic lookahead policies
The two-stage approximation:
1) Schedule steam: x_0
2) See wind: ω
3) Schedule turbines: x_1(ω)
Stochastic lookahead policies
Some common misconceptions about stochastic programming (for sequential problems):

» Solving a "stochastic program" is hard, but obtaining an optimal solution does not produce an optimal policy.

» Bounds on the quality of the solution to a stochastic program are not bounds on the quality of the policy.

» We only care about the quality of the policy, which can only be evaluated using a stochastic base model.
Approximating distributions
Stochastic lookahead model
» Uses an approximation of the information process in the lookahead model.

Parametric distribution (pdf):

    f(x) = (1 / (σ√(2π))) e^{ −(x−μ)² / (2σ²) }

Nonparametric distribution (cdf): F(x)

[Figure: a parametric pdf f(x) next to an empirical, nonparametric cdf F(x).]
Approximating a function
Parametric vs. nonparametric

[Figure: total revenue vs. price of product, showing observations, the true function, a parametric fit, and a nonparametric fit.]

» Robust CFAs are parametric
» Scenario trees are nonparametric
Lookahead policies
Robust optimization
» Static robust optimization is a problem:

    min_x max_{w∈W} F(x, w)

» Robust optimization for sequential problems (as it is practiced in the literature) is a lookahead policy:

    X^{RO}_t(S_t | θ) = argmin_{x_0,…,x_T} max_{w_1,…,w_T ∈ W(θ)} Σ_t C(S_t, x_t)

    where S_{t+1} = S^M(S_t, x_t, w_{t+1}) and W(θ) is the uncertainty set.

The RO community tunes θ using

    min_θ F(θ) = E Σ_{t=0}^T C(S_t, X^{RO}_t(S_t | θ))

Thus, they approximate both the information process (using the uncertainty set W(θ)) and the objective (max instead of E). This is the same as:

    min_π E { Σ_{t=0}^T C(S_t, X^π(S_t)) }
Stochastic lookahead policies
Notes:
» It is nice to talk about simulating a stochastic lookahead model using a multistage model, but multistage models are almost impossible to solve (we are not aware of any testing of multistage stochastic unit commitment).
» Even two-stage approximations of lookahead models are quite difficult for many applications, so simulating these policies remains quite difficult, and researchers typically do not even develop a simulator to test their policies.
» In our experience, simulations of stochastic lookahead models tend to consist of sampling scenarios from the lookahead model. They should be tested on a full base model.
» In a real application such as stochastic unit commitment, a number of approximations are required in the lookahead model that should not be present in the base model.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
An energy storage problem
Consider a basic energy storage problem:

» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.
An energy storage problem
A model of our problem
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function
An energy storage problem
State variables

[Figure: an energy system connecting an energy source E, a battery B, and a load L.]

» We will present the full model, accumulating the information we need in the state variable.
» We will highlight information we need as we proceed. This information will make up our state variable.
An energy storage problem
Decision variables

    x_t = (x^{EL}_t, x^{EB}_t, x^{GL}_t, x^{GB}_t, x^{BL}_t)

» Constraints:
» Policy: might be a lookahead using forecasts f^L_t = (f^L_{tt'})_{t'≥t}
An energy storage problem
Exogenous information

    W_t = (Ê_t, ε^p_t, D̂_t, f^L_t), where:

    Ê_t = change in energy from wind between t−1 and t
    ε^p_t = noise in the price process between t−1 and t
    D̂_t = change in load between t−1 and t
    f^L_{tt'} = forecast of load D^load_{t'} provided by a vendor at time t
    f^L_t = (f^L_{tt'})_{t'≥t}
An energy storage problem
Transition function

    E_{t+1} = E_t + Ê_{t+1}
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
    D_{t+1} = D_t + D̂_{t+1}
    f^L_t = provided exogenously
    R^{battery}_{t+1} = R^{battery}_t + x_t
An energy storage problem
Objective function

    C(S_t, x_t) = p_t (x^{GB}_t + x^{GL}_t)

    min_π E Σ_{t=0}^T C(S_t, X^π_t(S_t))
An energy storage problem
State variables

    S_t = (R_t, E_t, L_t, (p_t, p_{t−1}, p_{t−2}), f^L_t)

» Cost function: needs the price of electricity p_t.
» Decision function: needs the constraints, and f^L_t if we use a lookahead policy.
» Transition function: needs the price history, since
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1 – Best for PFAs
• Highly stochastic (heavy tailed) electricity prices
• Stationary data
» Problem class 2 – Best for CFAs
• Stochastic prices and wind (but not heavy tailed)
• Stationary data
» Problem class 3 - Best for VFAs
• Stochastic wind and prices (but not too random)
• Time varying loads, but inaccurate wind forecasts
» Problem class 4 – Best for deterministic lookaheads
• Relatively low noise problem with accurate forecasts
» Problem class 5 – A hybrid policy worked best here
• Stochastic prices and wind, nonstationary data, noisy forecasts.
An energy storage problem
The policies
» The PFA:
• Charge battery when price is below p1
• Discharge when price is above p2
» The CFA
• Minimize a cost with an error correction term.
» The VFA
• Piecewise linear, concave value function in terms of energy,
indexed by time.
» The lookahead (deterministic)
• Optimize over a horizon H (only tunable parameter) using
forecasts of demand, prices and wind energy
» A hybrid lookahead CFA
• Deterministic lookahead with bounds on inventories in the
future.
An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution

[Table: performance of the five policies across problem classes 1–5.]

» … any policy might be best depending on the data.

Joint research with Prof. Stephan Meisel, University of Muenster, Germany.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Choosing a policy

 Robust cost function approximation

 Lookahead policy

 Policy function approximation

 Policy based on value function approximation
Choosing a policy
Which policy to use?
» PFAs are best for low-dimensional problems where the structure of the policy is apparent from the problem.

» CFAs work for high-dimensional problems, where we can get desired behavior by manipulating the cost function:

    X^π_t(θ) = argmin_x Σ_d Σ_l (c_{tdl} − θ τ_{tl}) x_{tdl}
Choosing a policy
Which policy to use?
» VFAs work best when the lookahead model is easy to approximate.

» Lookahead models should be used only when all else fails (which is often).
Modeling sequential decision problems
First build your model:
» Objective function:
    min_π E { Σ_{t=0}^T γ^t C(S_t, X^π_t(S_t)) }
» Policy:
    X^π : S → X
» Constraints at time t:
    x_t ∈ X_t(S_t) = X_t
» Transition function:
    S_{t+1} = S^M(S_t, x_t, W_{t+1})
» Exogenous information:
    (W_1, W_2, …, W_T)

Then design your policies:
» PFA? Exploit obvious problem structure.
» CFA? Can you tune a deterministic approximation to make it work better?
» VFA? Can you approximate the value of being in a downstream state?
» Lookahead? Do you have a forecast? What is the nature of the uncertainty?
» Hybrid?
Computational Stochastic Optimization

Each community is, at its core, built around one class of policy:
» Stochastic programming → stochastic lookahead
» Robust optimization → worst-case lookahead
» Approximate dynamic programming → VFA-based policies
» Model predictive control → deterministic lookahead
» Optimal control → parametric VFA
» Online learning → myopic policy
» Reinforcement learning → VFA-based policy with discrete actions
» Markov decision processes → exact lookup table
Thank you!
For a related tutorial, go to:

http://www.castlelab.princeton.edu

and click on the link "Clearing the Jungle of Stochastic Optimization"

http://www.castlelab.princeton.edu/jungle.htm
