ORF 544, Week 6: Modeling the Unified Framework


ORF 544

Stochastic Optimization and Learning


Spring 2019

Warren Powell
Princeton University
http://www.castlelab.princeton.edu

© 2018 W.B. Powell


Week 6

General sequential decision problems


Universal modeling framework



Applications



Fleet management

Fleet management problem


» Optimize the assignment of drivers to loads over time.
» Tremendous uncertainty in the loads being called in.


Nomadic trucker illustration
Pre-decision state: we see the demands

[Figure: the trucker in Texas sees load offers priced $350, $150, $450, and $300.]

S_t = ( (TX, t), D̂_t )
Nomadic trucker illustration
We use initial value function approximations…

V 0 ( MN )  0

V 0 ( NY )  0
0
V (CO)  0 $350

0
$150
V (CA)  0
$450
$300

 TX  ˆ
St  (  ,Dt )
 t 
© 2019 Warren Powell
Nomadic trucker illustration
… and make our first choice, x_t.

V̄^0(MN) = 0, V̄^0(NY) = 0, V̄^0(CO) = 0, V̄^0(CA) = 0

[Figure: the trucker takes the $450 load to New York.]

S_t^x = ( (NY, t+1) )
Nomadic trucker illustration
Update the value of being in Texas.

V 0 ( MN )  0

V 0 ( NY )  0
0
V (CO)  0 $350

0
$150
V (CA)  0
$450
$300
V 1 (TX )  450

 NY 
x
S  (
t )
 t  1
© 2019 Warren Powell
Nomadic trucker illustration
Now move to the next state, sample new demands and make a new
decision

V 0 ( MN )  0

$180
0 V 0 ( NY )  0
V (CO)  0
$400

V 0 (CA)  0 $600
$125

V 1 (TX )  450

 NY  ˆ
St 1  (  ,Dt 1 )
 t  1
© 2019 Warren Powell
Nomadic trucker illustration
Update value of being in NY

V 0 ( MN )  0

$180
0 V 0 ( NY )  600
V (CO)  0
$400

V 0 (CA)  0 $600
$125

V 1 (TX )  450

x  CA 
S t 1  ( )
t  2
© 2019 Warren Powell
Nomadic trucker illustration
Move to California.

V 0 ( MN )  0

$400
0 V 0 ( NY )  600
V (CO)  0

$200
V 0 (CA)  0 $150

$350
V 1 (TX )  450

 CA  ˆ
St  2  (  ,Dt  2 )
t  2
© 2019 Warren Powell
Nomadic trucker illustration
Make decision to return to TX and update value of being in CA

V 0 ( MN )  0

$400
0 V 0 ( NY )  500
V (CO)  0

$200
V 0 (CA)  800 $150

$350
V 1 (TX )  450

 CA  ˆ
St  2  (  ,Dt  2 )
t  2
© 2019 Warren Powell
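The sequence of updates above can be sketched in code. This is a toy sketch, not the course's implementation: which offer goes to which state is only partly legible from the figures, so the offer lists below are illustrative, and the update uses the pure "replace" rule V̄(location) ← max over loads of (reward + V̄(destination)) implied by the slides.

```python
# Sketch of the nomadic-trucker value updates shown in the slides above.
# Hypothetical data: each offers list holds (destination, reward) pairs.
vbar = {"TX": 0.0, "MN": 0.0, "NY": 0.0, "CO": 0.0, "CA": 0.0}

def step(location, offers):
    """Choose the load maximizing reward + vbar[dest]; update vbar[location]."""
    dest, value = max(((d, r + vbar[d]) for d, r in offers),
                      key=lambda dv: dv[1])
    vbar[location] = value  # pure "replace" update, as in the slides
    return dest

loc = "TX"
loc = step(loc, [("MN", 350), ("NY", 450), ("CO", 150), ("CA", 300)])
print(loc, vbar["TX"])   # NY 450.0
loc = step(loc, [("CA", 600), ("MN", 180), ("CO", 400), ("TX", 125)])
print(loc, vbar["NY"])   # CA 600.0
loc = step(loc, [("TX", 350), ("MN", 400), ("CO", 200), ("NY", 150)])
print(loc, vbar["CA"])   # TX 800.0
```

Note the last step: returning to Texas is worth 350 + V̄(TX) = 350 + 450 = 800, which is why the trucker heads back even though the raw reward is not the largest.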
Fleet management
Uber
» Provides real-time, on-demand
transportation.
» Drivers are encouraged to enter or leave
the system using pricing signals and
informational guidance.
Decisions:
» How to price to get the right balance of
drivers relative to customers.
» Assigning and routing drivers to
manage Uber-created congestion.
» Real-time management of drivers.
» Pricing (trips, new services, …)
» Policies (rules for managing drivers,
customers, …)







Emergency storm response

[Figure: a storm creates network outages, which are unknown to the utility; customers experiencing an outage become known only through outage calls.]


Problem Description - Emergency Storm Response

[Figure: street network with outage locations (X) and customer calls.]
Emergency storm response
[Figure: for each network segment, the probability that an outage has occurred there, inferred from the calls received. Segments near calls carry high probabilities (roughly 0.5 to 0.78); segments with no nearby calls carry low probabilities (roughly 0.06).]
Emergency storm response
[Figure: after more calls arrive, the probabilities are updated; some segments become nearly certain (0.99) while others drop toward 0.]
Emergency storm response
[Figure: as the network state is confirmed, most segment probabilities collapse to 0, leaving the remaining uncertainty concentrated on the unconfirmed segments.]
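A minimal sketch of how such outage probabilities might be updated from a customer call using Bayes' rule. The prior values and the call likelihoods (`p_call_if_out`, `p_call_if_ok`) are invented for illustration; the actual storm-response model is more elaborate.

```python
def update(prior, p_call_if_out=0.9, p_call_if_ok=0.05):
    """Posterior probability a segment is out, after observing a customer call."""
    num = p_call_if_out * prior
    return num / (num + p_call_if_ok * (1.0 - prior))

# Hypothetical priors for two segments: one likely out, one probably fine.
beliefs = {"seg1": 0.5, "seg2": 0.06}
beliefs = {s: update(p) for s, p in beliefs.items()}
```

A call pushes a 0.5 prior up toward certainty, while a low-prior segment only climbs to roughly even odds, mirroring how the figures concentrate probability near calls.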
Monte Carlo tree search
AlphaGo

» State variables:
  • State of the board
  • Beliefs about opponent strategies

» Uses a hybrid of policies:
  • MCTS
  • PFA
  • VFA


Planning cash reserves
How much money should we hold in cash given variable
market returns and interest rates to meet the needs of a
business?
[Figure: stock prices and bonds.]


Energy storage
How much energy should we store in a battery to handle the volatility of wind and spot prices while meeting demands?


A note on notation



Notational style
The challenge of communicating:
» "The elevator lift is being fixed for the next day. During that time we regret that you will be unbearable." - Sign in a Bucharest business.
» "Please do not feed the animals. If you have any suitable food, give it to the guard on duty." - Sign at a Budapest zoo.
» "You are invited to take advantage of the chambermaid." - A Japanese hotel.
» "The manager has personally passed all the water served here." - An Acapulco hotel.
» "Place orders early because in a big rush we will execute customers in strict rotation." - A tailor shop.
Notational style
Principles of good mathematical notation:

» Notation is a language - if the language is hard to learn, others will have difficulty speaking it.

» Minimize the number of variables you introduce (i.e., keep your vocabulary small).

» Make variables as mnemonic as possible.

» Follow consistent, standard conventions.

» Organize your variables into natural groupings.
Notational style
Variables
» A basic variable - lower case and script: x
» Use subscripts to identify elements of a vector: x_tij, x_ti, x_t
» Use superscripts to create different flavors of a variable:

    Decisions:          Costs:
    purchase: x^p       c^p
    sell:     x^s       c^s
    hold:     x^h       c^h

    x = (x^p, x^s, x^h)
    c = (c^p, c^s, c^h)
    Total costs = c^p x^p + c^s x^s + c^h x^h = cx
Notational style
Variables
» Never use variables with more than one letter:

    PC = Purchasing cost (is it P times C?)
    BC_t = Battery charge at time t

» Using multiple-letter variables is popular in the business community. If you are presenting an equation in a memo to a management audience, it is perfectly reasonable to write:

    Fleet_t = Const0 + Const1 × Tons_t + Const2 × speed_t

  This is acceptable in this setting, because the variables define themselves, and you are never going to manipulate this equation.
Notational style
Sequencing subscripts:
» When you have multiple subscripts:
  • Roughly sequence them in the order that you would place them in a summation:

    x_ti = Dollars invested in asset i in time period t.

    C(x) = Σ_t Σ_i c_ti x_ti

    x_t = (x_ti)_{i∈I}

  • Always model discrete time periods as a subscript, because a time-t variable is an element of a vector over different time periods.
Notational style
Arguments, superscripts and hats:
» Use arguments only when there is a functional dependence:

    F(x) or a(x)

» If we are estimating a variable iteratively, again use a superscript to identify different versions of the variable:

    x^{n+1} = F(x^n)

  Use n for iterations. If you need outer and inner iterations, use n and m.
Notational style
Choice of letters:
» Use hats and bars (etc.) to indicate different estimates/observations of the same variable: x̄, x̂
» Constants:
  • General rule: a, b, c, d and e
» Physical parameters (ratios, speeds, …):
  • Use Greek letters: α, β, γ, …
  • Avoid unusual Greek letters (hard to pronounce/remember).
Notational style
Time:
» Always use t or its variants.
  • A flight going from city i to city j might start at time t and arrive at time t'.
  • You might purchase an option at time t which can be exercised at time t'.

» Use τ to indicate an interval of time, such as the time required to complete an action.

» Let t = 0 represent "here and now."

Notational style
Modeling new information:
» It helps to identify variables that represent new information arriving in a time period.
» Suggest using "hats" to indicate information that first becomes known at time t:

    D̂_t = Customer demand for a product in time t.
    p̂_t = Market price of a stock at time t.
    b̂_t = A baseball player's batting average in year t.

» Use "bars" for statistics calculated from exogenous information:

    D̄_t = (1/t) Σ_{t'=1}^{t} D̂_{t'}
    p̄_t = (1-α) p̄_{t-1} + α p̂_t
    b̄^N = (1/N) Σ_{n=1}^{N} b̂^n
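The smoothing recursion for p̄_t can be sketched directly; the stepsize α and the observed prices p̂_t below are made-up values.

```python
# Exponential smoothing: p_bar_t = (1 - alpha) * p_bar_{t-1} + alpha * p_hat_t
alpha = 0.1
pbar = 100.0                          # initial estimate p_bar_0
for p_hat in [102.0, 98.0, 105.0]:    # exogenous observations p_hat_t
    pbar = (1 - alpha) * pbar + alpha * p_hat
```

With a small α the statistic moves slowly toward each new observation, which is the point of smoothing noisy exogenous information.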
Notational style
Sets:
» Sets - Calligraphic (or script) capital letters:
    x ∈ X,  a ∈ A    ("Monotype Corsiva" in Equation Editor; calligraphic in LaTeX)
  • Subsets - suggest using: X_i, A_b

» Use sets for summations:
  • Σ_{i∈I} x_i is better than Σ_{i=1}^{n} x_i
  • Sets are especially useful when there is no natural indexing.
    Let a be a multiattribute vector, where a ∈ A. It is then easy to write:

    Σ_{a∈A} x_a

    x = (x_t)_{t∈T},   x_t = (x_ti)_{i∈I}
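The point about sets with no natural indexing can be sketched with a mapping keyed by attribute tuples; the attributes and quantities here are made up.

```python
# Resources keyed by multiattribute vectors a in A (no natural integer index).
x = {("TX", "van"): 3.0, ("TX", "reefer"): 1.0, ("CA", "van"): 2.0}

# Sum over a in A, written without inventing an artificial ordering of A.
total = sum(x[a] for a in x)
```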


Notational style
» Functions:
  Argument notation (preferred in engineering):
    F(x)
    F(x, y)
  Mapping notation (preferred in mathematics):
    F : X → ℝ
    F : X × Y → ℝ

» Use lower case letters for individual functions; use upper case for sums:
    C(x) = Σ_{t∈T} c_t(x_t)
Notational style
» Matrices:
  • Square, capital letters: A, B
» Multiplication of matrices and vectors:
  Use cx when it is understood that we want an inner product. The "transpose" is implicit; there is no need to write c^T x.
Modeling “time”



Modeling “time”
There are two fundamental ways of modeling the timing of information and decisions:

» Counters - Here we count experiments, iterations, arrivals of customers, …
  • We index counters in superscripts.

» Time - This is always in discrete units:
  • Seconds, minutes, hours, days, weeks, months, years
  • We index time in subscripts.

» Combinations
  • We may want to index time t within iteration n (superscript n, subscript t).

Modeling “time”
We need a system for indexing time. In particular, it is
important to know the mapping between discrete and
continuous time.

    Continuous time:   W_1      W_2      W_3      W_4
                   |--------|--------|--------|--------|
    Discrete time: t=0      t=1      t=2      t=3      t=4
                            S_t, x_t

It is useful to think of information as arriving continuously over time.
Functions (states, decisions) are measured at a point in time.
At time t, anything indexed by t' ≤ t is known; anything indexed by t' > t is unknown.


Time scales in energy
» Frequency regulation, power quality management: seconds
» Battery arbitrage, energy shifting: minutes to hours
» Demand peak management (many utilities impose charges based on peak usage over a month, quarter or even a year): days to weeks
» Peak management for avoiding capacity expansion: weeks to months
» Backup power for outages: months to years
Modeling “time”
Modeling multiple time scales:

» Strategy 1: If we wish to model hour h within day d, we can let time be represented by (d, h).

» Strategy 2 (better): Let t index the finest increment (e.g. hour), and then define:
  • Let d(t) be the day in which hour t falls.
  • Define sets that capture appropriate points in time (e.g. if we want to make certain decisions at noon each day):
  • T^noon = Set of time periods that correspond to noon each day. So, we can say "if t ∈ T^noon then do…"
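Strategy 2 can be sketched directly; this assumes hourly increments over one week, which is an illustrative choice.

```python
T = range(7 * 24)                        # t indexes hours over one week
day = {t: t // 24 for t in T}            # d(t): the day in which hour t falls
T_noon = {t for t in T if t % 24 == 12}  # time periods corresponding to noon

# "if t in T_noon then do ..." is now a simple membership test.
```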


Modeling sequential decision problems



Modeling
We propose to model problems along five fundamental dimensions:

» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function

This framework draws heavily from the Markov decision process and control theory communities, but it is not the standard form used anywhere.
Modeling dynamic problems
The system state:
» Controls community:
    x_t = "Information state"
» Operations research/MDP/computer science:
    S_t = (R_t, I_t, B_t) = System state, where:
    R_t = Resource state (physical state)
      • Location/status of truck/train/plane
      • Energy in storage
    I_t = Information state
      • Prices
      • Weather
    B_t = Belief state
      • Belief about traffic delays
      • Belief about the status of equipment

Bizarrely, only the controls community has a tradition of actually defining state variables. We return to state variables later.
Modeling dynamic problems
Decisions:
» Markov decision processes/computer science:
    a_t = Discrete action
» Control theory:
    u_t = Low-dimensional continuous vector
» Operations research:
    x_t = Usually a discrete or continuous but high-dimensional vector of decisions.

At this point, we do not specify how to make a decision. Instead, we define the function X^π(s) (or A^π(s) or U^π(s)), where π specifies the type of policy. "π" carries information about the type of function f, and any tunable parameters θ ∈ Θ^f.


Problem classes
Types of decisions
» Binary: x ∈ X = {0, 1}
» Finite: x ∈ X = {1, 2, …, M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, …, x_K), x_k ∈ ℝ
» Discrete vector: x = (x_1, …, x_K), x_k ∈ ℤ
» Categorical: x = (a_1, …, a_I), where a_i is a category (e.g. red/green/blue)
Modeling dynamic problems
Exogenous information:
    W_t = New information that first became known at time t
        = (R̂_t, D̂_t, p̂_t, Ê_t)
    R̂_t = Equipment failures, delays, new arrivals; new drivers being hired to the network
    D̂_t = New customer demands
    p̂_t = Changes in prices
    Ê_t = Information about the environment (temperature, …)

Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

Below, we let ω represent a sequence of actual observations W_1, W_2, …. W_t(ω) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function
    S_{t+1} = S^M(S_t, x_t, W_{t+1})

    R_{t+1} = R_t + x_t + R̂_{t+1}    Inventories
    p_{t+1} = p_t + p̂_{t+1}          Spot prices
    D_{t+1} = D_t + D̂_{t+1}          Market demands

Also known as the:
    "System model"            "Transfer function"
    "State transition model"  "Plant model"
    "Transformation function" "Law of motion"
    "Plant equation"          "Model"
    "Transition law"

For many applications, these equations are unknown. This is known as "model-free" dynamic programming.
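The three example transitions can be sketched as a single transition function S^M; the numerical values below are made up.

```python
def transition(S, x, W):
    """S^M: compute S_{t+1} from (S_t, x_t, W_{t+1}) for the three examples."""
    return {
        "R": S["R"] + x + W["R_hat"],  # inventories
        "p": S["p"] + W["p_hat"],      # spot prices
        "D": S["D"] + W["D_hat"],      # market demands
    }

S0 = {"R": 10.0, "p": 30.0, "D": 5.0}
W1 = {"R_hat": -1.0, "p_hat": 0.5, "D_hat": 1.0}
S1 = transition(S0, 2.0, W1)
```

Note the timing convention: the decision x_t is made knowing S_t, while W_{t+1} arrives only after the decision.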
Modeling dynamic problems
The objective function
Dimensions of objective functions:
» Type of performance metric
» Final cost vs. cumulative cost
» Expectation or risk measures
» Mathematical properties (convexity, monotonicity, continuity, unimodularity, …)
» Time to compute (fractions of seconds to minutes, to hours, to days or months)
Elements of a dynamic model
Objective functions
» Cumulative reward ("online learning")

    max_π E { Σ_{t=0}^{T} C_t(S_t, X_t^π(S_t), W_{t+1}) | S_0 }

  • Policies have to work well over time.

» Final reward ("offline learning")

    max_π E { F(x^{π,N}, Ŵ) | S_0 }

  • We only care about how well the final decision works.

» Risk

    max_π ρ( C(S_0, X_0^π(S_0)), C(S_1, X_1^π(S_1)), …, C(S_T, X_T^π(S_T)) | S_0 )
Elements of a dynamic model
The complete model:
» Objective function
  • Cumulative reward ("online learning"):

      max_π E { Σ_{t=0}^{T} C_t(S_t, X_t^π(S_t), W_{t+1}) | S_0 }

  • Final reward ("offline learning"):

      max_π E { F(x^{π,N}, Ŵ) | S_0 }

  • Risk:

      max_π ρ( C(S_0, X_0^π(S_0)), C(S_1, X_1^π(S_1)), …, C(S_T, X_T^π(S_T)) | S_0 )

» Transition function

      S_{t+1} = S^M(S_t, x_t, W_{t+1})

» Exogenous information

      (S_0, W_1, W_2, …, W_T)
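The complete model suggests a generic simulator: given a policy, step the transition forward and accumulate contributions along one sample path. Everything concrete here is invented for illustration: the quadratic cost, the scalar linear dynamics, and the linear policy are placeholders, not part of the framework itself.

```python
import random

def simulate(policy, S0, T=20, seed=0):
    """Estimate the cumulative-reward objective along one sample path."""
    rng = random.Random(seed)
    S, total = S0, 0.0
    for t in range(T):
        x = policy(S)                  # x_t = X^pi(S_t)
        W = rng.gauss(0.0, 1.0)        # exogenous information W_{t+1}
        total += -((S + x) ** 2)       # C_t(S_t, x_t): a made-up cost
        S = S + x + W                  # S^M(S_t, x_t, W_{t+1})
    return total

score = simulate(lambda S: -0.5 * S, 0.0)
```

The framework's separation pays off here: swapping in a different policy, cost, or transition changes one line each, while the simulation loop is untouched.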
Elements of a dynamic model
The modeling process
» I conduct a conversation with a domain expert to fill in the elements of a problem:

    State variables - What we need to know (and only what we need)
    Decision variables - What we control
    New information - What we didn't know when we made our decision
    Transition function - How the state variables evolve
    Objective function - Performance metrics
State variables



The state variables
What is a state variable?
» Bellman’s classic text on dynamic programming (1957)
describes the state variable with:
• “… we have a physical system characterized at any stage by a
small set of parameters, the state variables.”
» The most popular book on dynamic programming
(Puterman, 2005, p.18) “defines” a state variable with
the following sentence:
• “At each decision epoch, the system occupies a state.”
» Wikipedia:
  • "State commonly refers to either the present condition of a system or entity," or …
  • "A state variable is one of the set of variables that are used to describe the mathematical 'state' of a dynamical system."
The state variables
What is a state variable?
» Kirk (2004), an introduction to control theory, offers the definition:
  • A state variable is a set of quantities x_1(t), x_2(t), … which, if known at time t = t_0, are determined for t > t_0 by specifying the inputs to the system for t > t_0.
  • … or "all the information you need to model the system from time t onward." True, but vague (and only for deterministic problems).
» Cassandras and Lafortune (2008):
  • The state of a system at time t_0 is the information required at t_0 such that the output [cost] y(t) for all t ≥ t_0 is uniquely determined from this information and from u(t), t ≥ t_0.
  • Again, consistent with the statement "all the information you need to model the system from time t_0 onward," but then why do they later talk about "Markovian" and "non-Markovian" queueing systems?
The state variables
From Probability and Stochastics by Erhan Cinlar
(2011):

» Question: Why would you ever model a stochastic process where you intentionally left needed information out of the state variable?
The state variables
There appear to be two ways of approaching a state variable:
» The mathematician's view - The state variable is a given, at which point the mathematician will characterize its properties ("Markovian," "history-dependent," …).

» The modeler's view - The state variable needs to be constructed from a raw description of the problem. Information should not be excluded due to computational tractability until after a solution strategy has been designed.
The state variable
Illustrating state variables
» A deterministic graph

[Figure: an 11-node shortest-path network with deterministic arc costs (17.4, 15.9, 12.6, …).]

S_t = ( N_t ) = ( 6 )
The state variable
Illustrating state variables
» A stochastic graph

[Figure: the same network, but the traveler only sees the costs of arcs out of the current node.]

S_t = ?
The state variable
Illustrating state variables
» A stochastic graph

[Figure: at node 6, the traveler observes the sampled costs 12.7, 8.9 and 13.5 on the arcs out of node 6.]

S_t = ( N_t, (ĉ_{t,N_t,j})_j ) = ( 6, (12.7, 8.9, 13.5) )
      where R_t = N_t and I_t = the observed arc costs.
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties

[Figure: the same network; arc (6,9) costs 12.7 plus a penalty of 0.7 if entered via a left turn, so the state must also carry the previous node.]

S_t = ( N_t, (ĉ_{t,N_t,j})_j, N_{t-1} ) = ( 6, (12.7, 8.9, 13.5), 3 )
      where R_t = N_t and I_t = the observed arc costs plus N_{t-1}.
The state variable
Variant of problem in Puterman (2005):
» Find best path from 1 to 11 that minimizes the second
highest arc cost along the path:
[Figure: an 11-node network with deterministic arc costs.]

S_t = ( N_t, highest, second highest ) = ( 9, 15, 12 )


» If the traveler is at node 9, what is her state?
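The state update for this problem can be sketched in code: the state must carry the current node plus the two largest arc costs seen so far. The arc costs in the example calls below are hypothetical, chosen so the traveler arrives at node 9 with the slide's state (9, 15, 12).

```python
def move(state, next_node, arc_cost):
    """Transition for the 'minimize second-highest arc cost' objective."""
    node, highest, second = state
    # Keep the two largest of {highest, second, new arc cost}.
    a, b = sorted([highest, second, arc_cost], reverse=True)[:2]
    return (next_node, a, b)

state = (3, 8, 0)           # reached node 3 via an arc of cost 8
state = move(state, 6, 15)  # (6, 15, 8)
state = move(state, 9, 12)  # (9, 15, 12)
```

Knowing the node alone is not enough: the objective forces the running top-two arc costs into the state, which is the point of the exercise.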
The state variable
Bellman's equation for our learning problem
» A belief state: estimates μ̄^n_x for each alternative x, with precision β^n_x = 1/σ̄^{2,n}_x

» State variable (belief state)

» Bellman equation
The state variable
Belief states
» Lookup table - Frequentist or Bayesian
  • Independent beliefs
  • Correlated beliefs

» Parametric with sampled beliefs


The state variable
Illustrating state variables
» A stochastic graph with generalized learning

[Figure: the same 11-node network, where arc costs are learned as the traveler moves.]

S_t = ?
The state variable
Illustrating state variables
» A stochastic graph with generalized learning

S_t = ( N_t, (ĉ_{t,N_t,j})_j, B_t )
      where R_t = N_t, I_t = the observed arc costs, and B_t = the beliefs about the remaining costs.
The state variable
Classes of state variables

Bt
It
Rt

Resource/physical state Belief state


Information state
The state variable
Notes:
» The distinction between R_t, I_t, and B_t is imprecise. After all, they are three types of information.
» There are communities (e.g. operations research) that have a tendency to interpret "physical state" as "the state" (the graph example is a good illustration).
» So, we can think of R_t as just a special case of I_t.
» The belief state B_t holds the parameters of a probability distribution about parameters we do not know perfectly. But we can think of I_t (and therefore R_t) as simply point estimates of parameters we know perfectly.


The state variable
My definition of a state variable:

» The first depends on a policy. The second depends only on the problem (and includes the constraints).
» Using either definition, all properly modeled problems are Markovian.
The state variable
Email received March 15, 2019:



The state variable
Pre- and post-decision states
» The "pre-decision" state variable:
  • S_t = The information required to make a decision x_t.
  • Same as a "decision node" in a decision tree.

» The "post-decision" state variable:
  • S_t^x = The state of what we know immediately after we make a decision.
  • Same as an "outcome node" in a decision tree. Also known as the "end of period state" or "after state".

» The information and decision sequence:

    (S_0, x_0, S_0^x, W_1, S_1, x_1, S_1^x, W_2, …)
The state variable
Representations of the post-decision state:
» Decision trees:

    S_t^x = S^{M,x}(S_t, x_t)
    S_{t+1} = S^{M,W}(S_t^x, W_{t+1})

» Q-learning:

    S_t^x = (S_t, x_t)    State-action pair

» Transition function with expectation:

    S_t^x = S^M(S_t, x_t, W̄_{t,t+1})
    where W̄_{t,t+1} = Forecast of W_{t+1} at time t.
The state variable
Resource states (single layer)
» R_t = Amount of resource on hand at time t.
» R_ti = Amount of resource of type i on hand at time t.
» R_ta = Resources with attribute vector a available at time t.
Two-layer problems:
» Supplies and demands (where demands can be held if they are not served)
» Uber drivers and riders
» Pilots and aircraft
» (We won't be touching multi-layer problems in this course.)


The post-decision state
A scalar resource variable:
» Our basic inventory equation (water, money, …):

    R_{t+1} = max(0, R_t + x_t - D̂_{t+1})

» where
    R_t = Inventory on hand at time t
    x_t = Amount ordered
    D̂_{t+1} = Demand in the next time period

» Using pre- and post-decision states:

    R_t^x = R_t + x_t                     Pre- to post-
    R_{t+1} = max(0, R_t^x - D̂_{t+1})     Post- to pre-
Modeling resources
Single resource type, dynamic attribute

[Figure: a single truck moves between regions i_1, i_2, i_3, i_4 of the United States.]

State space = set of states (or regions) in the United States

Modeling resources
Single commodity flow problems (100-1,000 rows)

[Figure: a time-space network with nodes indexed by location and time.]

Modeling resources
Multicommodity flow models

[Figure: the time-space network replicated by commodity type, with nodes indexed by location, type, and time.]
Modeling resources
A single commodity vs. multiple commodities, and a multiattribute resource:

    a = (a_1, a_2, a_3, a_4, …, a_N)
Modeling resources

[Figure: resources described by attribute vectors a = (a_1, …, a_n) of increasing dimension.]
Modeling resources
The evolution of attributes:

    (Time, Location)                                            |A| = 4,000
    (Time, Location, Equip type)                                |A| = 40,000
    (Time, Location, Equip type, Time to dest.)                 |A| = 1,680,000
    (Time, Location, Equip type, Time to dest., Repair status)  |A| = 5,040,000
    (…, Repair status, Hrs of service)                          |A| = 50,400,000
Modeling resources
Multiattribute resource allocation problem

    R_t = (R_{ta_1}, R_{ta_2}, R_{ta_3}, …, R_{ta_N})

Number of rows equals the size of the attribute space: 10^6 to 10^20
The state variable
Other information states
» The "information state" variable captures the state of any other parameter that we can observe perfectly.
» Examples:
  • Weather (e.g. temperature, humidity)
  • Prices
  • Economy
  • Traffic conditions
  • Forecasts
» The "information state" is simply any information about perfectly observable parameters that we do not model as a resource state.


The state variable
Belief states:
» Lookup tables, Bayesian beliefs, independent beliefs
» Correlated beliefs
» Sampled beliefs, nonlinear models


The state variable
Multi-layer problems
» Two layers:
  • Supplies and demands - Note that this is only a two-layer problem if both supplies and demands are held when they are not used or served.
  • Uber drivers and customers - If we step forward in 15-minute increments, and assume that customers not served in the first time period are lost, then this is a one-layer problem.




Fleet management

[Figure: two candidate car-rider assignments. Assigning a car in a less dense area lets the closer car handle potential demands in more dense areas; the closest-car assignment moves a car away from the busy downtown area and strands the other car in a low-density area.]


Fleet management
[Figure: a bipartite assignment of cars to riders.]


Fleet management

[Figure: the assignment network rolls forward over times t, t+1, t+2.]

The assignment of cars to riders evolves over time, with new riders arriving, along with updates of cars available.
The state variable
Multi-layer problems
» Three layers - Application at NetJets, assigning:
  • Pilots
  • Aircraft
  • Customers

» All three layers involve complex (multiattribute) resources.


[Figure: NetJets pilots, aircraft and customers. © NetJets]
State variables
Notes:
» There is tremendous confusion in the stochastic optimization literature about state variables.
» One of the worst misconceptions is that if the state variable is multidimensional, then you have the "curse of dimensionality."
» The curse of dimensionality only arises when you use lookup table representations of functions that depend on the state variable.
» Most of the examples above have extremely large (or infinite) state spaces, but we will still be able to find practical solutions to these problems.


State variables

Lagged information



The state variable
Lagged problems
» Resources
  • R_{tt'} = Resources that will be available at time t' given what you know at time t.
» Decisions
  • x_{tt'} = Decision made at time t to do something at time t':
    - Planning energy generation
    - Purchasing a futures contract in month t for natural gas to exercise in month t'.
» Forecasts
  • f_{tt'} = Forecast of activity at time t' given what we know at time t.


Future contracts for natural gas
Air Liquide
» Largest industrial gases company, with 64,000 employees.
» Consumes 0.1 percent of global electricity.

Challenges
» Faces a variety of challenges to manage risk:
» Spikes in natural gas prices, electricity prices.
» Pipeline outages due to storms.

Future contracts for natural gas


Basics
» Natural gas is a relatively clean, economical source of energy with approximately half the CO2 footprint of coal.
» Widely used in the generation of electricity:
  • In steam generators
  • In combustion turbine generators, which can be turned on and off relatively quickly in response to unexpected changes in demands.
» Over the last 10 years, a new extraction process has made it possible to economically tap gas locked in shale.

Future contracts for natural gas


Historical natural gas prices

[Figure: historical natural gas prices, marking the beginning of the major economic downturn, the beginning of the shale gas era, and projected gas prices.]

Future contracts for natural gas


A myopic policy only looks at decisions we make at time t. It ignores decisions we might make at times t+1, t+2, …

    t          t+1        t+2         t+3            …   t'         t'+1
    June 2011  July 2011  Aug. 2011   Sept. 2011         Oct. 2014  Nov. 2014

Future contracts for natural gas


We can also buy hedges for different points in time in the future.

    t          t+1        t+2         t+3            …   t'         t'+1
    June 2011  July 2011  Aug. 2011   Sept. 2011         Oct. 2014  Nov. 2014

Future contracts for natural gas


A myopic policy only looks at decisions we make at time t. A lookahead policy considers decisions we may make at times t+1, t+2, …, t+H:

    x_{t,t}, x_{t,t+1}, …
    x_{t+1,t+1}, x_{t+1,t+2}, …
    x_{t+2,t+2}, x_{t+2,t+3}, …

    t          t+1        t+2         t+3            …   t'         t'+1
    June 2011  July 2011  Aug. 2011   Sept. 2011         Oct. 2014  Nov. 2014

The state variable


State variable for lagged problems:

    x_{t,t}, x_{t,t+1}, …
    x_{t+1,t+1}, x_{t+1,t+2}, …
    x_{t+2,t+2}, x_{t+2,t+3}, …

Contracts signed prior to time t' become due at time t'.
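One way to hold lagged decisions in the state is a mapping keyed by (t, t'); the contract quantities below are made up for illustration.

```python
# x[(t, t_prime)] = quantity contracted at time t for delivery at t_prime >= t.
x = {(0, 0): 5.0, (0, 2): 3.0, (1, 2): 2.0}

def due_at(x, t_prime):
    """Total contracts, signed at or before t', that become due at t'."""
    return sum(q for (t, tp), q in x.items() if tp == t_prime and t <= tp)

delivered = due_at(x, 2)  # contracts from t=0 and t=1 both mature at t'=2
```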
An energy generation portfolio



Stochastic unit commitment
Day-ahead planning (slow – predominantly steam)

Intermediate-term planning (fast – gas turbines)

Real-time planning (economic dispatch)


Stochastic unit commitment
The day-ahead unit commitment problem
[Timeline: successive days, marked at noon and midnight, over the planning horizon.]
Stochastic unit commitment
Intermediate-term unit commitment problem
[Timeline: 1:00 pm to 3:00 pm, with decision points at 1:15, 1:30, 1:45, 2:15 and 2:30 pm.]
Stochastic unit commitment
Intermediate-term unit commitment problem

[Figure: commitment decisions for turbines 2 and 3 at future points in time, each with a notification time; turbine 1 is already on, with ramping but no on/off decisions.]
Stochastic unit commitment
Real-time economic dispatch problem

1pm 2pm
1:05 1:10 1:15 1:20 1:25 1:30
Stochastic unit commitment
Real-time economic dispatch problem

1pm 2pm
1:05 1:10 1:15 1:20 1:25 1:30

Run economic dispatch to perform


5 minute ramping

Run AC power flow model


Forecasting wind power
Rolling 24-hour forecast of PJM wind farms

[Figure: rolling forecasts f^wind_{tt'} plotted against the actual wind power.]
Decision variables



Decision variables
When modeling decision variables:

» Model the decision variables

» Model the constraints

» Introduce the policy X^π(S_t), and say that it will be designed later.

Decision variables
There are three common notational systems for decisions:

» Computer science: a_t - Discrete action

» Control theory: u_t - Low-dimensional continuous vector

» Operations research: x_t - Discrete or continuous, scalar or (typically) vector-valued decisions
Decision variables
Styles of decisions
» Binary: x ∈ X = {0, 1}
» Finite: x ∈ X = {1, 2, …, M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, …, x_K), x_k ∈ ℝ
» Discrete vector: x = (x_1, …, x_K), x_k ∈ ℤ
» Categorical: x = (a_1, …, a_I), where a_i is a category (e.g. red/green/blue)
Decision variables
Decisions for multiattribute resources
» Let
    d = a type of decision (buy, sell, move to a location, treat with a drug, …)
    x_tad = number of times we act on a resource with attribute a using decision d

» Constraints:

Decision variables
How do we make decisions?
» We use policies, which are rules for making decisions.
We will use notation such as:

• X^π(St): the policy for determining decision xt.
• A^π(St): the policy for determining action at.
• U^π(St): the policy for determining control ut.

Here, π is a label that determines the type of function.


Notes
» Model first, then solve. Do not attempt to design the
policy while you are modeling the problem.
» It is extremely important that making a decision at time t
can only use information in the state St at time t.
Exogenous information



Exogenous information
Notation:
» Wt is information that first becomes available from outside the
system by time t (or between t−1 and t).
» We are going to need to model different sources of exogenous
information such as prices, temperature, equipment failures, and
markets. We need to indicate the variables whose values come
from outside our system. My notation is to put hats on exogenous
information.
» We can represent exogenous information as the most recent value
of a random variable, such as the energy from wind:
    Êt = energy from wind as of time t.
» … or we can write the exogenous information as the change in the
state variable:
    Êt = Et − Et−1.


Exogenous information
We need a system for indexing time. In particular, it is
important to know the mapping between discrete and
continuous time.

Continuous    D̂1        D̂2        D̂3        D̂4
time       |---------|---------|---------|---------|
           t=0       t=1       t=2       t=3       t=4

Discrete   R0,y0     R1,y1     R2,y2     R3,y3     R4,y4
time

It is useful to think of information as arriving continuously over time.
Functions (states, decisions) are measured at a point in time.
At time t, anything indexed t' ≤ t is known; anything indexed t' > t is unknown.

Exogenous information
Often, we need to model what will happen in the future to make a decision now.
Let:
    D̂t = the customer demand in time period (t−1, t),
    p̂t = the market price at time t.
The future looks like:
    (D̂1, p̂1), (D̂2, p̂2), ..., (D̂t, p̂t)

We say that (D̂t, p̂t) is the information arriving at time t. It is sometimes
useful to have a single variable to represent the new information arriving to
the system at time t. There is no standard notation for modeling information.
We let:

    Wt = the information arriving in time (t−1, t).

Wt represents a realization of the information that arrives in time period t.
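As an illustrative sketch (not from the slides), the information process Wt = (D̂t, p̂t) can be generated in code; the integer demands and lognormal prices are assumptions chosen to resemble the 0-20 demand tables that follow:

```python
import random

def sample_W(rng: random.Random) -> tuple[int, float]:
    """Sample one realization of W_t = (demand D_hat_t, price p_hat_t).

    The uniform-integer demand and lognormal price are illustrative
    assumptions, not part of the course model.
    """
    demand = rng.randint(0, 20)            # D_hat_t, matching the 0-20 tables
    price = rng.lognormvariate(3.0, 0.25)  # p_hat_t
    return demand, price

def sample_path(T: int, seed: int) -> list[tuple[int, float]]:
    """Generate one sample path omega = (W_1, ..., W_T)."""
    rng = random.Random(seed)
    return [sample_W(rng) for _ in range(T)]

omega = sample_path(T=9, seed=0)
print(len(omega))  # 9 information events
```

Fixing the seed makes the whole sample path reproducible, which is what lets us compare policies on the same realizations later.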


Exogenous information
Modeling sample paths:
» We are often simulating our process, which means we
need to be able to represent a particular realization of
our random information.
» We let
    ω = a sample realization of W1, ..., WT,
  where Ω is the set of all realizations.
» In theory, Ω can be some infinite set of all possible
  outcomes (esp. if Wt is continuous), but in our work, Ω will
  always be a sample: Ω̂ = {ω1, ..., ωN}.

Exogenous information
For our example, we would write:
    Wt = (D̂t, p̂t)
In the future, we do not know what might happen. Assume that the only type of
new information arriving is customer demands. That is, Wt = D̂t.

Assume that there are only 10 possible sets of demands that might happen in the future:

w D1 D2 D3 D4 D5 D6 D7 D8 D9
1 18 16 13 10 17 6 4 15 16
2 12 7 17 15 5 3 4 14 8
3 6 18 7 9 1 13 4 4 7
4 2 11 16 16 1 2 13 0 13
5 18 5 0 6 10 17 8 3 2
6 3 18 5 20 13 16 18 11 10
7 12 14 4 11 19 3 20 19 18
8 6 15 15 14 2 7 14 1 11
9 19 10 5 19 13 14 16 11 17
10 18 15 14 4 6 17 16 10 9

It is absolutely standard notation to index these outcomes by ω (don't ask why).
The set of outcomes (the sample space) is referred to as Ω. So an element ω ∈ Ω
refers to a particular set of potential outcomes.

Exogenous information
We would say that D̂t is a random variable because we do not know the
demand D̂t right now. We might, for example, assume that D̂t follows
some probability distribution so that we can describe the range of
possible outcomes.

Sometimes we need to refer to a particular realization. For this, we let:

    D̂t(ω) = A sample realization of the demand at time t.

D̂t(ω) is not a random variable. Let's say ω = 6. Then:

 w   D1  D2  D3  D4  D5  D6  D7  D8  D9
 1    0  16   9  12   1  15   6   5   6
 2    9  16   2   9  10  15  19   2  17
 3   19   6  10   8  13  11  15  11   9
 4    4   6  13  18  13  10  12   9  14
 5   13   8  16  20   2  13   7  13   5
 6    3  18   5  20  13  16  18  11  10
 7   19   3   1   9   2  16   2  10  12
 8    4  17  13   0   4   9   9  14  19
 9    4  12  18  11   8   5  14   4  11
10   12  18  15  17   6  12  18   5  19

    D̂t(6) = (3, 18, 5, 20, 13, 16, 18, 11, 10)
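A hypothetical sketch of the "index by ω" idea: each sample path is stored whole, so D̂t(ω) becomes a deterministic table lookup (only rows ω = 6 and ω = 7 of the table are copied here):

```python
# Sample paths indexed by omega; each entry is (D_1, ..., D_9).
# Rows 6 and 7 are taken from the table above. Treating omega as a
# dictionary key makes D_t(omega) a plain lookup, not a random variable.
Omega = {
    6: (3, 18, 5, 20, 13, 16, 18, 11, 10),
    7: (19, 3, 1, 9, 2, 16, 2, 10, 12),
}

def D(t: int, omega: int) -> int:
    """D_t(omega): the realized demand at time t on sample path omega."""
    return Omega[omega][t - 1]

print(D(1, 6))  # 3
print(D(9, 6))  # 10
```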
Exogenous information
Notes:
» You need to be careful when using the “omega” notation.
» The index ω refers to an entire sequence of information events
  W1(ω), ..., WT(ω).
» If you make a decision xt(ω), this means the decision you make at time t
  while following sample path ω, but this is like telling the decision both
  the history and the future (since this is all contained in the sample path ω).
» We have to take additional steps to make sure that a decision made while
  following sample path ω is not allowed to peek into the future.



Exogenous information
From sample paths to histories:
» A node in the scenario tree is equivalent to a history
1 
t t' 5 

htt ' 
            
3  H (h )
2  tt ' tt '


4 
Exogenous information
The complete exogenous information process consists of
    (S0, W1, W2, ..., WT)
Notes:
» The initial state is where we input:
• Deterministic parameters
– The loss from converting energy from AC to DC and back
– The maximum speed of a vehicle
• Initial values of dynamic parameters
– Initial inventory or price
• Initial beliefs about unknown parameters
– Prior belief about demand as a function of price
– Prior belief about how a patient responds to a drug
» The dynamic information process
  • Wt is any new information that becomes available after making decision xt−1.
  • Wt may depend on St−1 and/or xt−1.
Transition function


The transition function


The transition function captures the evolution over
time:
St 1  S M  St , xt ,Wt 1 

» The transition function goes by many names:


• System model
• Plant model
• Plant equation
• Law of motion
• State equation
• Transition law
• Transfer function
• “Model”


The transition function


The transition function captures the evolution over time:
    S_{t+1} = S^M(S_t, x_t, W_{t+1})

» At time t:
  • St is known (deterministic)
  • xt is a deterministic function of St
  • W_{t+1} is random

The controls community (where this was invented) writes this as:
    x_{t+1} = f(x_t, u_t, w_t)
The transition function
Physical resources:
» An inventory problem:
    R_{t+1} = max{0, R_t + x_t − D̂_{t+1}}
» General resource allocation problems


• Let



The transition function
Information processes

» Exogenous information: price process:
    p_{t+1} = p_t + p̂_{t+1}
» Energy from wind:
    E_{t+1} = E_t + Ê_{t+1}



The transition function
Updating belief models
» Lookup table frequentist:

» Lookup table Bayesian


• Independent beliefs

• Correlated beliefs:


The transition function


Two frameworks:
» Model-based
• This is where we have a set of equations that describe the
transition.
• In computer science, it refers to problems where the one-step
transition matrix is known.

» Model-free
• Here, we do not know the transition function.
• Typical for complex problems (describing human behavior, the
economy, climate, a complex physical problem)
• In this case, we simply observe the next state without knowing
how we got there.



Objective functions



Objective functions
Performance metrics:
» Rewards, profits, revenues, costs (business)
» Gains, losses (engineering)
» Strength, conductivity, diffusivity (materials science)
» Tolerance, toxicity, effectiveness (health)
» Speed, stability, reliability (engineering)
» Risk, volatility (finance)
» Utility (economics)
» Errors (machine learning)
» Time (to complete a task)



Objective functions
Styles of writing the performance metric:
» State-independent problems:
    max_x E F(x, W)
» State-dependent problems:
    max_π E Σ_t C(S_t, X^π(S_t))


Objective functions
Offline (final reward)
» Asymptotic formulation (“ranking and selection” or “stochastic search”):
    max_x E F(x, W)
» Finite horizon formulation:
    max_π E F(x^{π,N}, W)

Online (cumulative reward)
» We have to learn as we go:
    max_π E Σ_{n=0}^{N−1} F(X^π(S^n), W^{n+1})
Objective functions
Evaluating a policy:
» State independent, final reward:
    max_π E F(x^{π,N}, W)
  • The outer expectation is only used when S^0 contains a prior
    distribution of belief over some unknown parameter.
» State independent, cumulative reward (bandit problems):
    max_π E Σ_{n=0}^{N−1} F(X^π(S^n), W^{n+1})
» State dependent, cumulative reward:
    max_π E [ Σ_{t=0}^T C(S_t, X^π(S_t)) | S_0 ]


Problem classes



Problem classes
Sequencing of decisions and information
» Decision, information
  • Classical stochastic search: max_x E F(x, W)
  • Finite time stoch. opt – final reward: max_π E F(x^{π,N}, W)
» Information, decision, information
  • State dependent: max_x E C(S_0, x, W).
  • Contextual bandit: S_0 is the “context” that may change each time I
    have to solve the problem.
» Decision, information, decision
  • Two-stage stochastic programming:
      max_{x_0} ( c_0 x_0 + E Q(x_0, W_1) )
    where Q(x_0, W_1) is the value of the optimal second-stage decision.
» Decision, information, decision, information, …
  • Sequential dynamic programming, optimal control, … cumulative reward:
      max_π E Σ_t C(S_t, X^π_t(S_t))



Problem classes
The circle of stochastic optimization

    max_x E F(x, W)

    max_{x_0} c_0 x_0 + E Q(x_0, W_1)

    max_{π^learn} E Σ_{n=0}^{N−1} C(S_t, X^{π^imp}_t(S_t))

    max_π E F(X^{π,N}, W)

    max_π E [ Σ_{n=0}^{N−1} C(S_t, X^π_t(S_t)) | S_0 ]

    max_π E Σ_{n=0}^{N−1} F(X^π(S^n), W^{n+1})

    max_π E [ Σ_{n=0}^{N−1} F(X^π(S^n), W^{n+1}) | S_0 ]
Problem classes
State independent problems
» The problem does not depend on the state of the system.
    max_x E F(x, W) = E[ p min(x, W) − cx ]
» The only state variable is what we know (or believe) about the unknown
  function E F(x, W), called the belief state Bt, so St = Bt.
State dependent problems
» Now the problem may depend on what we know at time t:
    max_{0 ≤ x ≤ Rt} E C(St, x, W) = E[ pt min(x, W) − cx ]
» Now the state is St = (Rt, pt, Bt).
Problem classes
State-dependent problems
» Why does it matter if a problem depends on dynamic information
or not?
» For a state-independent problem, we are looking for a best
  alternative that we call x^{π,N} to represent the solution we found using
  policy (algorithm) π after N experiments.
» For a state-dependent problem, assume we can write our state as
  St = (Rt, Bt), where Rt is information that the problem depends on, while
  Bt is our belief about the problem.
» Now, instead of finding x^{π,N}, we want to find X^π(St), which means
  instead of finding a number (or vector of numbers), we are looking to
  find a function.
» This is … harder.



Problem classes
Objective functions for:
» State independent and state dependent problems.
» Final reward and cumulative reward
Problem classes
Problem class (1):
» This is our basic stochastic search problem with a finite
learning budget.
» The “policy” π may be a stochastic gradient algorithm,
or a learning policy for derivative-free problems, where
the policy would include the choice of stepsize rule.
» Either way, we can view this as looking for the best
algorithm to solve this problem.



Problem classes
Problem class (2):
» This is the same as (1), but now we are optimizing the
cumulative reward where
» The derivative-free version is the original multiarmed
bandit problem, but the policy could also be a
stochastic gradient algorithm with a particular stepsize.
» Both (1) and (2) are familiar to us.



Problem classes
Problem class (3):
» This is the same as (2), but now we are optimizing the state-
dependent version.
» This is the form typically used to write down a classical dynamic
program (which we will see later).
» In the state independent version, we want policies that will strike a
balance between learning and doing. In state dependent problems,
the state variable typically does not include a belief state (so no
explicit learning), but clearly there has to be a learning component.
» If there was a belief state, this would be a form of sequential,
contextual bandit problem. As of this writing, there is virtually no
literature addressing this problem, even though this is how
dynamic programs are traditionally written.



Problem classes
Problem class (4):
» The way to think of this is to compare to (1), and recognize that
the policy in (1) is a policy to learn which is the decision that we
implement.
» So, we view this as a problem to find the best learning policy to
find the best policy that will be implemented.
» The hard part is: how do we take the expectation over the state St?
  The distribution of St will depend on the implementation policy.
» We can approximate the distribution of by simulating the
implementation policy over time periods:

» We will see later that the “learning policy” describes any adaptive
learning algorithm for solving dynamic programs. As with (1), we
want a “good” algorithm, but finding an optimal learning
algorithm is aspirational.
Problem classes
Simulating objective functions
» It is important to know how to simulate objective
functions.
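A minimal sketch of simulating E Σt C(St, X^π(St)) by averaging over sample paths; the random-walk price, the contribution function, and the threshold policy are all assumptions for illustration:

```python
import random

def simulate_policy(policy, T: int, n_paths: int, seed: int = 0) -> float:
    """Estimate F(pi) = E[ sum_t C(S_t, X^pi(S_t)) ] by Monte Carlo."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        R, p = 0.0, 30.0                  # state S_0 = (resource, price)
        reward = 0.0
        for _ in range(T):
            x = policy(R, p)              # decision from the policy
            reward += p * (-x)            # C(S_t, x_t): revenue from selling
            R += x
            p += rng.gauss(0.0, 1.0)      # exogenous price change W_{t+1}
        total += reward
    return total / n_paths

# A simple threshold policy: buy one unit when cheap, sell one when expensive.
def policy(R: float, p: float) -> float:
    if p < 28.0:
        return 1.0
    if p > 32.0 and R >= 1.0:
        return -1.0
    return 0.0

estimate = simulate_policy(policy, T=50, n_paths=200)
print(round(estimate, 2))
```

Averaging over many paths with a fixed seed gives a reproducible estimate that different policies can be compared on.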
Problem classes
Objective functions

Learning policies: approximate dynamic programming, Q-learning, SDDP.
We will focus on these in the second half of the course.
Problem classes
Objective functions

“Online” (cumulative reward) dynamic programming is recognized as the


“dynamic programming problem,” but the entire literature on solving
dynamic programs describes class (4) problems. Class (3) appears to be
an open problem class.
From machine learning to stochastic
optimization



Stochastic opt. and machine learning
Optimization in machine learning

» Fitting the model to data


• Offline-batch
• Online

» The decision of what information to collect


• Active learning – origins in the machine learning community
• Multiarmed bandit problems – origins in the probability
community



Canonical problems
Statistical model fitting
» Given dataset:
    x^n = (x^n_1, x^n_2, ..., x^n_K)  (independent variables, explanatory
        variables, covariates, ...)
    y^n = dependent variable
» General optimization problem:
    min_{f ∈ F, θ ∈ Θ^f} Σ_{n=1}^N ( y^n − f(x^n | θ) )^2
» Popular strategy: linear models
    f(x^n | θ) = Σ_{f ∈ F} θ_f φ_f(x^n)
Stochastic opt. and machine learning
Statistics/machine learning vs. stochastic optimization:
» Data: x_t or x^n  ↔  state S_t or S^n
» “Decision”: parameters, a function f ∈ F and θ ∈ Θ^f  ↔  decision x_t or x^n
» Predicting: y^n  ↔  “predicting” a decision x_t or x^n, or the value of a
  state V(S_t)
» Objective:
    MSE: min_θ (1/N) Σ_{n=1}^N ( y^n − f(x^n | θ) )^2
  versus:
    max_π E F(X^{π,N}, W)   or   max_π E Σ_t C(S_t, X^π_t(S_t))
Stochastic opt. and machine learning
Statistics/machine learning vs. stochastic optimization:
» (1) Classical batch learning ↔ sample average approximation:
    min_θ (1/N) Σ_{n=1}^N ( y^n − f(x^n | θ) )^2   ↔   max_x (1/N) Σ_{n=1}^N F(x, ω^n)
» (2) Online (recursive) statistics ↔ stochastic search:
    min_θ E ( Y − f(x | θ) )^2   ↔   max_x E F(x, W)
» (3) Searching over functions ↔ policy search:
    min_{f ∈ F, θ ∈ Θ^f} E ( Y − f(x | θ) )^2   ↔   max_π E Σ_t C(S_t, X^π_t(S_t))
» (4) Matching data over time ↔ matching posterior decisions:
    min_{f ∈ F, θ ∈ Θ^f} E Σ_t ( ŷ_{t+1} − f(x_t | θ) )^2   ↔   max_θ E Σ_t ( X̂_{t+1}(ω) − X̄(S_t | θ) )^2
An energy storage example

Energy arbitrage



An energy storage problem
A model of our problem

» State variables

» Decision variables

» Exogenous information

» Transition function

» Objective function



Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
Energy arbitrage
State variables
» Rt = energy stored in the battery
» pt = current grid price
Energy arbitrage
Decision variables
» xt = how much to sell (xt > 0) or buy (xt < 0) to/from the grid.
» Constraints: xt ≤ Rt (we cannot sell more energy than is stored).
» Policy: X^π(St), to be designed later.
Energy arbitrage
Exogenous information
» p̂_{t+1} = change in the electricity price between t and t+1.


Energy arbitrage
Transition function
    R_{t+1} = R_t − x_t
    p_{t+1} = p_t + p̂_{t+1}
Energy arbitrage
Objective function
    max_π E Σ_{t=0}^T p_t X^π(S_t)
Energy arbitrage
Buy low, sell high policy
[Figure: electricity price over time (0 to 140), with a discharge threshold
θ^Discharge near the price peaks and a charge threshold θ^Charge near the
troughs]

We have to search for the best values for the policy
parameters θ^Charge and θ^Discharge.
Energy arbitrage
Our policy function might be the parametric
model (this is nonlinear in the parameters):

                  ⎧ −1  if pt ≤ θ^charge
    X^π(St | θ) = ⎨  0  if θ^charge < pt < θ^discharge
                  ⎩ +1  if pt ≥ θ^discharge
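The threshold policy can be written directly as a code sketch (using a −1/0/+1 convention for charge/hold/discharge):

```python
def X_pi(p_t: float, theta_charge: float, theta_discharge: float) -> int:
    """Parametric policy X^pi(S_t | theta) for battery arbitrage.

    Returns -1 (charge/buy), 0 (hold), or +1 (discharge/sell).
    Nonlinear in theta: the output jumps at the two thresholds.
    """
    if p_t <= theta_charge:
        return -1
    if p_t >= theta_discharge:
        return 1
    return 0

print(X_pi(18.0, 20.0, 80.0))   # -1: power is cheap, so charge
print(X_pi(95.0, 20.0, 80.0))   # +1: power is expensive, so discharge
print(X_pi(50.0, 20.0, 80.0))   #  0: in between, hold
```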
Energy arbitrage
Finding the best policy
» We need to maximize
    max_θ F(θ) = E Σ_{t=0}^T γ^t C(St, X^π_t(St | θ))
» We cannot compute the expectation, so we run simulations:
  [Figure: simulated F(θ) plotted over (θ^Charge, θ^Discharge)]

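Since the expectation cannot be computed, F(θ) is estimated by simulation; a hypothetical grid search over (θ^Charge, θ^Discharge) might look like the following (the price process is an assumption):

```python
import itertools
import random

def simulate_F(theta_c: float, theta_d: float, T: int = 500, seed: int = 0) -> float:
    """Simulated estimate of F(theta) = E sum_t C(S_t, X^pi(S_t | theta))."""
    rng = random.Random(seed)
    R, profit = 0.0, 0.0
    for _ in range(T):
        p = max(0.0, 60.0 + rng.gauss(0.0, 25.0))  # assumed LMP process
        if p < theta_c:                 # charge: buy one unit
            R += 1.0
            profit -= p
        elif p > theta_d and R >= 1.0:  # discharge: sell one unit
            R -= 1.0
            profit += p
    return profit

grid = [20.0, 30.0, 40.0, 50.0]
best = max(
    ((tc, td) for tc, td in itertools.product(grid, grid) if tc < td),
    key=lambda th: simulate_F(*th),
)
print("best (theta_charge, theta_discharge):", best)
```

Using the same seed for every θ (common random numbers) makes the comparison across parameter settings less noisy.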
An energy storage example

A storage system



An energy storage problem
State variables
[Diagram: network with wind energy E, grid G, battery B, and load L]

» We will present the full model, accumulating the


information we need in the state variable.
» We will highlight information we need as we proceed.
This information will make up our state variable.



An energy storage problem
Decision variables

xt   xtEL , xtEB , xtGL , xtGB , xtBL , 

» Constraints;



An energy storage problem
Exogenous information

         ⎛ Êt      = change in energy from wind between t−1 and t
         ⎜ ε^p_t   = noise in the price process between t−1 and t
    Wt = ⎜ f^D_tt' = forecast of demand D_t', provided exogenously by a
         ⎜           vendor at time t
         ⎝ D̂t      = difference between actual demand and forecast
An energy storage problem
Transition function

Et 1  Et  Eˆt 1
pt 1   0 pt  1 pt 1   2 pt  2   tp1
D  f D  Dˆ
t 1 t , t 1 t 1

f t D1  Provided exogenously


Rtbattery
1  Rt
battery
 xt
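The transition equations above, written as a code sketch; the AR coefficients and the noise distributions are placeholder assumptions:

```python
import random

def transition(S: dict, x: float, rng: random.Random) -> dict:
    """One step of the storage model's transition function S^M."""
    theta = (0.7, 0.2, 0.05)                 # placeholder AR coefficients
    E_hat = rng.gauss(0.0, 5.0)              # change in wind energy
    eps_p = rng.gauss(0.0, 2.0)              # price noise
    D_hat = rng.gauss(0.0, 3.0)              # demand-forecast error
    p_next = (theta[0] * S["p"][0] + theta[1] * S["p"][1]
              + theta[2] * S["p"][2] + eps_p)
    return {
        "E": S["E"] + E_hat,                 # E_{t+1} = E_t + Ehat_{t+1}
        "p": (p_next, S["p"][0], S["p"][1]), # roll the three price lags
        "D": S["f_D"] + D_hat,               # D_{t+1} = f^D_{t,t+1} + Dhat_{t+1}
        "f_D": S["f_D"],                     # forecast provided exogenously
        "R": S["R"] + x,                     # battery level
    }

S0 = {"E": 40.0, "p": (30.0, 29.0, 31.0), "D": 25.0, "f_D": 25.0, "R": 5.0}
S1 = transition(S0, x=1.0, rng=random.Random(0))
print(S1["R"])  # 6.0
```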
An energy storage problem
Objective function

    C(S_t, x_t) = p_t ( x^GB_t + x^GL_t )

    min_π E [ Σ_{t=0}^T C_t(S_t, X^π_t(S_t)) | S_0 ]



An energy storage problem
State variables
    St = ( Rt, Et, Lt, (pt, pt−1, pt−2), f^L_t )
» Cost function:
    pt = price of electricity
» Decision function:
    constraints
    f^L_t = needed if we use a lookahead policy
» Transition function:
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}


An energy storage example

With passive learning



An energy storage problem
Types of learning:
» No learning (the coefficients θ are known)
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
» Passive learning (learn the coefficients from price data)
    p_{t+1} = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2} + ε^p_{t+1}
» Active learning (“bandit problems”)
    p_{t+1} = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2} + θ̄_{t3} x^GB_t + ε^p_{t+1}

This is an example of learning a transition function.
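A sketch of the three price models; the coefficients below are placeholders, and the point is the extra θ̄_{t3} x^GB_t term in the active-learning model:

```python
def next_price(p: tuple[float, float, float], eps: float,
               x_GB: float = 0.0, theta3: float = 0.0) -> float:
    """p_{t+1} = theta0*p_t + theta1*p_{t-1} + theta2*p_{t-2}
                 + theta3*x_GB_t + eps.

    With theta3 = 0 this is the no-learning / passive-learning model;
    theta3 != 0 gives the active-learning model, where buying from the
    grid (x_GB) pushes up the price we observe.
    """
    theta0, theta1, theta2 = 0.7, 0.2, 0.05   # placeholder coefficients
    return theta0 * p[0] + theta1 * p[1] + theta2 * p[2] + theta3 * x_GB + eps

lags = (30.0, 29.0, 31.0)
passive = next_price(lags, eps=0.0)
active = next_price(lags, eps=0.0, x_GB=10.0, theta3=0.1)
print(passive, active)  # the active-model price is 1.0 higher
```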
An energy storage problem
Transition function

Et 1  Et  Eˆ t 1  pt 
 
pt 1  t 0 pt   t1 pt 1   t 2 pt  2   tp1  ( t )T pt   tp1 pt   pt 1 
Dt 1  f t ,Dt 1   tD1 p 
 t 2 
Rtbattery
1  Rt
battery
 xt
Learning in stochastic optimization
Updating the price model parameters
» Let p_{t+1} be the new price and let
    F^price_t(p̄_t | θ̄_t) = (θ̄_t)^T p̄_t = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2}
» We update our estimate using our recursive least squares equations:
    θ̄_{t+1} = θ̄_t − (1/γ_{t+1}) B_t p̄_t ε_{t+1}
    ε_{t+1} = F^price_t(p̄_t | θ̄_t) − p_{t+1}
    B_{t+1} = B_t − (1/γ_{t+1}) ( B_t p̄_t (p̄_t)^T B_t )
    γ_{t+1} = 1 + (p̄_t)^T B_t p̄_t
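The recursive least squares updates can be written out as a short code sketch (the identity matrix for B_0 and the starting θ̄ are assumptions):

```python
def rls_update(theta, B, p_vec, p_next):
    """One recursive least squares step for the three-lag price model.

    theta:  current coefficient estimates (list of 3)
    B:      3x3 matrix (list of lists), kept symmetric by the update
    p_vec:  (p_t, p_{t-1}, p_{t-2})
    p_next: the newly observed price p_{t+1}
    """
    # epsilon_{t+1} = F^price(p_vec | theta) - p_{t+1}
    eps = sum(th * p for th, p in zip(theta, p_vec)) - p_next
    Bp = [sum(B[i][j] * p_vec[j] for j in range(3)) for i in range(3)]
    # gamma_{t+1} = 1 + p^T B p
    gamma = 1.0 + sum(p_vec[i] * Bp[i] for i in range(3))
    # theta_{t+1} = theta_t - (1/gamma) B p eps
    theta_new = [theta[i] - Bp[i] * eps / gamma for i in range(3)]
    # B_{t+1} = B_t - (1/gamma) (B p)(B p)^T, since B is symmetric
    B_new = [[B[i][j] - Bp[i] * Bp[j] / gamma for j in range(3)]
             for i in range(3)]
    return theta_new, B_new

theta = [0.7, 0.2, 0.05]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta, B = rls_update(theta, B, p_vec=(30.0, 29.0, 31.0), p_next=29.0)
print(theta)
```

If the forecast undershoots the observed price (ε < 0), the update pushes the coefficients up, as expected.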
An energy storage problem
State variables
    St = ( Rt, Et, Lt, (pt, pt−1, pt−2), f^D_t, (θ̄_t, B_t) )
» Cost function:
    pt = price of electricity
» Decision function:
    constraints
» Transition function:
    p_{t+1} = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2} + ε^p_{t+1}
    D_{t+1} = f^D_{t,t+1} + ε^D_{t+1}
    θ̄_{t+1} = θ̄_t − (1/γ_{t+1}) B_t p̄_t ε_{t+1}
An energy storage example

With active learning



An energy storage problem
Types of learning:
» No learning (the coefficients θ are known)
    p_{t+1} = θ_0 p_t + θ_1 p_{t−1} + θ_2 p_{t−2} + ε^p_{t+1}
» Passive learning (learn the coefficients from price data)
    p_{t+1} = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2} + ε^p_{t+1}
» Active learning (“bandit problems”): buy/sell decisions
    p_{t+1} = θ̄_{t0} p_t + θ̄_{t1} p_{t−1} + θ̄_{t2} p_{t−2} + θ̄_{t3} x^GB_t + ε^p_{t+1}

Our decisions influence the prices we observe, which helps
with learning.
An energy storage problem
Notes:
» When we introduce active learning, the state variable is
the same as with passive learning.
» This becomes a very rich form of continuous bandit
problem.
» What is likely to change is the policy. Recall from
earlier when we were doing pricing while learning a
demand function that we designed policies that
encouraged broader exploration.
» We are going to accomplish this by insisting that xt is a
function of the state St rather than the sample path ω.



What is next?



What is next?
Now that we have our basic modeling framework,
we have two major challenges:

» Modeling uncertainty

» Designing policies
