ORF 544 Week 6 Modeling The Unified Framework
Warren Powell
Princeton University
http://www.castlelab.princeton.edu
Nomadic trucker illustration

We use initial value function approximations:
  V^0(MN) = 0, V^0(NY) = 0, V^0(CO) = 0, V^0(CA) = 0

[Figure: the trucker is in Texas, state S_t = (TX, D̂_t), with loads to the four cities paying $350, $150, $450, and $300.]
© 2019 Warren Powell
Nomadic trucker illustration

… and make our first choice: x^1.

[Figure: the value approximations are still zero for MN, NY, CO and CA; the trucker chooses the $450 load to NY, giving the post-decision state S_t^x = (NY) as we step to time t+1.]
Nomadic trucker illustration

Update the value of being in Texas:
  V^1(TX) = 450

[Figure: the other value approximations remain zero; the trucker is headed to NY, post-decision state S_t^x = (NY), stepping to time t+1.]
Nomadic trucker illustration

Now move to the next state, sample new demands and make a new decision.

[Figure: state S_{t+1} = (NY, D̂_{t+1}); the new loads pay $180, $400, $600 and $125; V^1(TX) = 450.]
Nomadic trucker illustration

Update the value of being in NY:
  V^1(NY) = 600

[Figure: the trucker chooses x = CA, giving the post-decision state S_{t+1}^x = (CA) as we step to time t+2.]
Nomadic trucker illustration

Move to California.

[Figure: state S_{t+2} = (CA, D̂_{t+2}); the loads out of CA pay $400, $200, $150 and $350; V^1(TX) = 450, V^1(NY) = 600.]
Nomadic trucker illustration

Make the decision to return to TX and update the value of being in CA:
  V^1(CA) = 350 + V^1(TX) = 350 + 450 = 800

[Figure: the $350 load to TX is now the best choice because of the updated value of being in Texas.]
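The sequence of updates above can be sketched as a simple approximate value iteration loop. This is a minimal sketch, not the lecture's exact setup: the reward range, the random sampling, and the pure-replacement update of V̄ are illustrative assumptions.

```python
import random

# Approximate value iteration sketch for a "nomadic trucker" toy problem.
# Reward range and sampling are hypothetical, not the figures' numbers.
locations = ["TX", "NY", "CA", "MN", "CO"]
v_bar = {loc: 0.0 for loc in locations}  # initial value approximations V^0 = 0

def sample_rewards(rng):
    """Sample a random load payment to every location (exogenous demands D_hat)."""
    return {loc: rng.uniform(100, 600) for loc in locations}

def step(state, v_bar, rng):
    rewards = sample_rewards(rng)
    # Value of each choice = immediate payment + current value approximation.
    best = max(locations, key=lambda loc: rewards[loc] + v_bar[loc])
    v_hat = rewards[best] + v_bar[best]
    v_bar[state] = v_hat   # pure replacement, as in the slides (V^1(TX) = 450)
    return best            # move to the chosen location

rng = random.Random(0)
state = "TX"
for _ in range(20):
    state = step(state, v_bar, rng)
```

In practice the replacement `v_bar[state] = v_hat` would be smoothed with a stepsize, but the illustration uses the raw observed value.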
Fleet management
Uber
» Provides real-time, on-demand
transportation.
» Drivers are encouraged to enter or leave
the system using pricing signals and
informational guidance.
Decisions:
» How to price to get the right balance of
drivers relative to customers.
» Assigning and routing drivers to
manage Uber-created congestion.
» Real-time management of drivers.
» Pricing (trips, new services, …)
» Policies (rules for managing drivers,
customers, …)
[Figure: a storm creates network outages (unknown, marked X) which generate outage calls (known, marked "call") across the grid.]
© 2018 W.B. Powell
Emergency storm response
[Figure: the probability that an outage has occurred at each grid location, inferred from the observed calls; probabilities range from 0 up to about 0.76.]
Emergency storm response
[Figure: the outage probabilities after further calls are observed; locations consistent with the calls rise (up to 0.99) while others fall toward 0.]
Emergency storm response
[Figure: as outages are confirmed and repaired, most posterior probabilities collapse to 0, leaving elevated probabilities only in the region that still explains the calls.]
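The probability maps in these figures can be produced by a per-location Bayesian update driven by call observations. A minimal sketch, with hypothetical likelihood values (the real application would also model the network structure):

```python
# Bayesian update of an outage probability from customer calls.
# The two likelihoods below are hypothetical, for illustration only.
P_CALL_IF_OUT = 0.8   # P(customer calls | outage affects them)
P_CALL_IF_OK = 0.05   # P(customer calls | no outage)

def update(prior, observed_call):
    """Posterior P(outage) after observing whether a customer called."""
    if observed_call:
        num = P_CALL_IF_OUT * prior
        den = num + P_CALL_IF_OK * (1 - prior)
    else:
        num = (1 - P_CALL_IF_OUT) * prior
        den = num + (1 - P_CALL_IF_OK) * (1 - prior)
    return num / den

p = 0.5
p = update(p, True)    # a call raises the outage probability
p = update(p, False)   # silence lowers it again
```

Repeating this over the grid reproduces the qualitative behavior of the figures: probabilities climb near observed calls and decay to zero elsewhere.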
Monte Carlo tree search
AlphaGo
» State variables:
• State of the board
• Beliefs about opponent
strategies
Notational style

» Costs and decisions:
    C(x) = Σ_t Σ_i c_{ti} x_{ti},   x_t = (x_{ti})_{i∈I}
» Functions: F(x) or a(x)
» Avoid unusual Greek letters (hard to pronounce/remember).
Notational style
Time:
» Always use t or its variants.
• A flight going from city i to city j might start at time t and
arrive at time t’.
• You might purchase an option at time t which can be exercised
at time t’.
» Use lower case letters for individual functions; use upper case for
sums:
    C(x) = Σ_{t∈T} c_t(x_t)
Notational style
» Matrices:
• Square, capital letters:
A, B
» Multiplication of matrices and vectors:
Use cx when it is understood that we want an inner
product. The "transpose" is implicit. There is no
need to write:
    c^T x
Modeling “time”
» Combinations
• We may want to index the time within the simulation:
Modeling “time”
We need a system for indexing time. In particular, it is
important to know the mapping between discrete and
continuous time.
[Figure: exogenous information W_1, W_2, W_3, W_4 arriving in continuous time, mapped to discrete time indices t = 1, 2, 3, 4.]
Energy shifting
Demand peak management - Many hours
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function
B_t  Belief state:
  • Belief about the weather
  • Belief about traffic delays
  • Belief about the status of equipment
Bizarrely, only the controls community has a tradition
of actually defining state variables. We return to state
variables later.
Modeling dynamic problems
Decisions:
Markov decision processes/Computer science
a_t  Discrete action
Control theory
u_t  Low-dimensional continuous vector
Operations research
x_t  Usually a discrete or continuous but high-dimensional
vector of decisions.
At this point, we do not specify how to make a decision.
Instead, we define the function X^π(s) (or A^π(s) or U^π(s)),
where π specifies the type of policy. "π" carries information
about the type of function f ∈ F, and any tunable parameters θ ∈ Θ^f.
» Continuous scalar:   x ∈ X = [a, b]
» Continuous vector:   x = (x_1, ..., x_K), x_k ∈ R
» Discrete vector:     x = (x_1, ..., x_K), x_k ∈ Z
» Categorical:         x = (a_1, ..., a_I), a_i is a category (e.g. red/green/blue)
Modeling dynamic problems
Exogenous information:
W_t  New information that first became known at time t
   = ( R̂_t, D̂_t, p̂_t, Ê_t )
  R̂_t: equipment failures, delays, new arrivals; new drivers being hired into the network
  D̂_t: new customer demands
  p̂_t: changes in prices
  Ê_t: information about the environment (temperature, ...)

Note: Any variable indexed by t is known at time t. This
convention, which is not standard in control theory,
dramatically simplifies the modeling of information.

Below, we let ω represent a sequence of actual observations W_1, W_2, ....
W_t(ω) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function
S_{t+1} = S^M( S_t, x_t, W_{t+1} )

  R_{t+1} = R_t + x_t + R̂_{t+1}    Inventories
  p_{t+1} = p_t + p̂_{t+1}          Spot prices
  D_{t+1} = D_t + D̂_{t+1}          Market demands

Also known as the:
  "System model"            "Transfer function"
  "State transition model"  "Plant model"
  "Transformation function" "Law of motion"
  "Plant equation"          "Model"
  "Transition law"

For many applications, these equations are unknown.
This is known as "model-free" dynamic programming.
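The three example equations can be coded directly as a transition function S_{t+1} = S^M(S_t, x_t, W_{t+1}). A minimal sketch; the state fields follow the equations above, and the numbers in the usage line are arbitrary:

```python
from dataclasses import dataclass

@dataclass
class State:
    R: float  # inventory
    p: float  # spot price
    D: float  # market demand

def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}).

    W = (R_hat, p_hat, D_hat) is the exogenous information that first
    becomes known at time t+1.
    """
    R_hat, p_hat, D_hat = W
    return State(
        R=S.R + x + R_hat,   # inventories
        p=S.p + p_hat,       # spot prices
        D=S.D + D_hat,       # market demands
    )

S1 = transition(State(R=10.0, p=30.0, D=5.0), x=2.0, W=(1.0, -0.5, 0.5))
```

In the model-free setting described above, `transition` would not be available; we would only observe `S1` after acting.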
Modeling dynamic problems
The objective function
Dimensions of objective functions
» Type of performance metric
» Final cost vs. cumulative cost
» Expectation or risk measures
» Mathematical properties (convexity,
monotonicity, continuity, unimodality,
…)
» Time to compute (fractions of seconds
to minutes, to hours, to days or months)
Elements of a dynamic model
Objective functions
» Cumulative reward ("online learning")
    max_π E { Σ_{t=0}^{T} C( S_t, X^π(S_t), W_{t+1} ) | S_0 }
  • Policies have to work well over time.
» Final reward ("offline learning")
    max_π E { F( x^{π,N}, Ŵ ) | S_0 }
» Risk:
    ρ( C(S_0, X^π_0(S_0)), C(S_1, X^π_1(S_1)), ..., C(S_T, X^π_T(S_T)) | S_0 )

These objectives are our performance metrics. The transition function
S_{t+1} = S^M( S_t, x_t, W_{t+1} ) describes how the state variables evolve:
W_{t+1} is what we didn't know when we made our decision, x_t is what we
control, and S_t is what we need to know (and only what we need).
State variables
[Figure: a stochastic graph with nodes 1-11 and sampled arc costs (8.4, 15.9, 12.6, 12.7, 8.9, ...). The traveler is at node 6. Is the state simply S_t = (N_t) = (6)?]
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the stochastic graph; at node 6 the traveler sees the realized costs on the outgoing arcs (12.7, 8.9, 13.5). S_t = ?]
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the stochastic graph; the traveler is at node 6 with realized outgoing arc costs.]

  S_t = ( N_t, (ĉ_{t,N_t,j})_j ) = ( 6, (12.7, 8.9, 13.5) )
      = ( R_t, I_t )   (physical state, information state)
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties
[Figure: the same graph with a left-turn penalty (arc 3→6 costs 12.7 plus a 0.7 turn penalty). The state now needs the previous node:]

  S_t = ( N_t, (ĉ_{t,N_t,j})_j, N_{t-1} ) = ( 6, (12.7, 8.9, 13.5), 3 )
      = ( R_t, I_t )
The state variable
Variant of problem in Puterman (2005):
» Find best path from 1 to 11 that minimizes the second
highest arc cost along the path:
[Figure: a deterministic graph over nodes 1-11 with arc costs (14, 4, 12, 8, 15, ...). To solve this problem, the state must carry the highest and second-highest arc costs on the path so far, not just the current node.]
» Bellman equation
The state variable
Belief states
» Lookup table – Frequentist or Bayesian
• Independent beliefs
• Correlated beliefs

[Figure: the stochastic graph again, now with uncertainty about the mean arc costs. S_t = ?]
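Correlated beliefs over a lookup table can be maintained with the standard Bayesian updating equations: observing one arc updates beliefs about every correlated arc. A minimal sketch; the prior means, covariance matrix, and noise variance `lam` are hypothetical:

```python
import numpy as np

# Correlated Bayesian beliefs over three arc costs (lookup table).
mu = np.array([12.0, 9.0, 13.0])            # prior means (hypothetical)
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 1.5],
                  [1.0, 1.5, 5.0]])          # prior covariance (hypothetical)
lam = 2.0                                    # observation noise variance

def observe(mu, Sigma, x, w):
    """Update all beliefs after observing cost w on arc x."""
    e_x = np.zeros(len(mu))
    e_x[x] = 1.0
    gain = Sigma @ e_x / (lam + Sigma[x, x])
    mu_new = mu + gain * (w - mu[x])                 # shift toward observation
    Sigma_new = Sigma - np.outer(gain, Sigma[x])     # shrink uncertainty
    return mu_new, Sigma_new

mu1, Sigma1 = observe(mu, Sigma, x=0, w=14.0)
# Observing a high cost on arc 0 also raises beliefs about arcs 1 and 2.
```

With independent beliefs, only `mu[x]` would change; the off-diagonal terms of `Sigma` are what propagate one observation across the graph.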
The state variable
Illustrating state variables
» A stochastic graph with generalized learning
[Figure: the stochastic graph with generalized learning about the arc costs.]

  S_t = ( N_t, (ĉ_{t,N_t,j})_j, B_t )
      = ( R_t, I_t, B_t )
The state variable
Classes of state variables:
  R_t = the physical (resource) state
  I_t = the information state
  B_t = the belief state

Post-decision states S_t^x:
    S_{t+1} = S^{M,W}( S_t^x, W_{t+1} )
» Q-learning:
    S_t^x = ( S_t, x_t )    State-action pair
» Transition function with expectation:
    S_t^x = S^M( S_t, x_t, W̄_{t,t+1} )
    W̄_{t,t+1} = Forecast of W_{t+1} at time t.
The state variable
Resource states (single layer)
» R_t = Amount of resource on hand at time t.
» R_{tk} = Amount of resource of type k on hand at time t.
» R_{ta} = Resources with attribute vector a available at time t.
Two layer problems:
» Supplies and demands (where demands can be held if
they are not served)
» Uber drivers and riders
» Pilots and aircraft
» (we won’t be touching multi-layer problems in this
course).
[Figure: a time-space network over locations i = 1, ..., 4.]
Modeling resources
A single commodity: Multiple commodities:
Modeling resources
Multicommodity flow models
[Figure: a multicommodity flow network over time, locations and commodity types.]
Modeling resources
A single commodity: Multiple commodities:
A multiattribute resource:
  a = ( a_1, a_2, a_3, a_4, ..., a_N )
Modeling resources
[Figure: a decision transforms a resource with attribute vector a = (a_1, a_2, ..., a_n) into a resource with a modified attribute vector.]
Modeling resources
The evolution of attributes:

  R_t = ( R_{t,a_1}, R_{t,a_2}, R_{t,a_3}, ..., R_{t,a_n} )

The number of rows equals the size of the attribute space, which can range from 10^6 to 10^20.
The state variable
Other information states
» The “information state” variable captures the state of
any other parameter that we can observe perfectly.
» Examples:
• Weather (e.g. temperature, humidity)
• Prices
• Economy
• Traffic conditions
• Forecasts
» The “information state” is simply any information
about perfectly observable parameters that we do not
model as a resource state.
» Correlated beliefs

[Figure: snapshots at times t, t+1, t+2. The assignment of cars to riders evolves over time, with new riders arriving, along with updates of cars available.]
The state variable
Multi-layer problems
» Three layers: an application at NetJets, assigning
  • Pilots
  • Aircraft
  • Customers
State variables
Notes:
» There is tremendous confusion in the stochastic
optimization literature about state variables.
» One of the worst misconceptions is that if the state
variable is multidimensional, then you have the “curse
of dimensionality.”
» The curse of dimensionality only arises when you use
lookup table representations of functions that depend
on the state variable.
» Most of the examples above have extremely large (or
infinite) state spaces, but we will still be able to find
practical solutions to these problems.
Lagged information
Challenges
» Managing risk in the face of:
  » Spikes in natural gas prices and
  electricity prices.
  » Pipeline outages due to
  storms.
© 2016 W. B. Powell
Commitment decisions

[Figure: a timeline from 1:15 pm (the notification time) through 2:00-3:00 pm. Commitment decisions for Turbines 1-3 are made at 1:15 pm for the coming hour; after 1:30 there is ramping, but no on/off decisions.]
Stochastic unit commitment
Real-time economic dispatch problem
[Figure: timelines from 1 pm to 2 pm with dispatch decisions every five minutes (1:05, 1:10, ..., 1:30), shown against the actual wind energy. f^{wind}_{tt'} denotes the forecast, made at time t, of the wind energy at time t'.]
Decision variables
There are three common notational systems for
decisions:
» Computer science
Discrete action
» Control theory
Low dimensional continuous vector
» Operations research
Discrete or continuous, scalar or (typically) vector
valued decisions.
Decision variables
Styles of decisions
» Binary:            x ∈ X = {0, 1}
» Finite:            x ∈ X = {1, 2, ..., M}
» Continuous scalar: x ∈ X = [a, b]
» Continuous vector: x = (x_1, ..., x_K), x_k ∈ R
» Discrete vector:   x = (x_1, ..., x_K), x_k ∈ Z
» Categorical:       x = (a_1, ..., a_I), a_i is a category (e.g. red/green/blue)
Decision variables
Decisions for multiattribute resources
» Let d be a type of decision (buy, sell, move to a location,
treat with a drug, …)
» Constraints:
Decision variables
How do we make decisions?
» We use policies, which are rules for making decisions.
We will use notation such as:
Exogenous information
We need a system for indexing time. In particular, it is
important to know the mapping between discrete and
continuous time.
Exogenous information
Often, we need to model what will happen in the future to make a decision now.
Let:
D_t = The customer demand in time period (t-1, t).
p_t = The market price at time t.
The future looks like:
  ( D_1, p_1, D_2, p_2, ..., D_t, p_t, ... )
Exogenous information
For our example, we would write:
  W_t = ( D̂_t, p̂_t )

In the future, we do not know what might happen. Assume that the only type of new information
arriving is customer demands. That is, W_t = D̂_t.
Assume that there are only 10 possible sets of demands that might happen in the future:

  ω    D̂_1  D̂_2  D̂_3  D̂_4  D̂_5  D̂_6  D̂_7  D̂_8  D̂_9
  1    18   16   13   10   17    6    4   15   16
  2    12    7   17   15    5    3    4   14    8
  3     6   18    7    9    1   13    4    4    7
  4     2   11   16   16    1    2   13    0   13
  5    18    5    0    6   10   17    8    3    2
  6     3   18    5   20   13   16   18   11   10
  7    12   14    4   11   19    3   20   19   18
  8     6   15   15   14    2    7   14    1   11
  9    19   10    5   19   13   14   16   11   17
  10   18   15   14    4    6   17   16   10    9

It is absolutely standard notation to index these outcomes by ω (don't ask why). The set
of outcomes (the sample space) is referred to as Ω. So an element ω ∈ Ω refers to a
particular set of potential outcomes.
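With a finite sample space like the table above, any expectation reduces to a plain average over ω. A minimal sketch using the first three rows of the table (equally weighted, as in the slide):

```python
# With equally likely outcomes omega, E[D_hat_t] is just an average.
# D_hat[omega] lists (D_hat_1, ..., D_hat_9); first three table rows shown.
D_hat = {
    1: [18, 16, 13, 10, 17, 6, 4, 15, 16],
    2: [12, 7, 17, 15, 5, 3, 4, 14, 8],
    3: [6, 18, 7, 9, 1, 13, 4, 4, 7],
}

def expected_demand(D_hat, t):
    """E[D_hat_t] = (1/|Omega|) * sum over omega of D_hat_t(omega)."""
    return sum(path[t - 1] for path in D_hat.values()) / len(D_hat)

expected_demand(D_hat, 1)   # (18 + 12 + 6) / 3 = 12.0
```

The same averaging pattern is how we will simulate objective functions later: evaluate a policy along each sample path ω, then average.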
Exogenous information
We would say that D̂_t is a random variable because we do not know the
demand D̂_t right now. We might, for example, assume that D̂_t follows
some probability distribution so that we can describe the range of
possible outcomes.
Notes:
» The initial state is where we input:
• Deterministic parameters
– The loss from converting energy from AC to DC and back
– The maximum speed of a vehicle
• Initial values of dynamic parameters
– Initial inventory or price
• Initial beliefs about unknown parameters
– Prior belief about demand as a function of price
– Prior belief about how a patient responds to a drug
» The dynamic information process
  • W_t is any new information that becomes available after making the previous decision.
  • W_t may depend on the state and/or the decision.
Transition function
» At time t:
  S_t is known (deterministic)
  x_t is a deterministic function of S_t
  W_{t+1} is random
The controls community (where this was invented) writes
this as:
  x_{t+1} = f( x_t, u_t, w_t )
The transition function
Physical resources:
» An inventory problem
• Correlated beliefs:
» Model-free
• Here, we do not know the transition function.
• Typical for complex problems (describing human behavior, the
economy, climate, a complex physical problem)
• In this case, we simply observe the next state without knowing
how we got there.
» State-dependent problems:
  Online (cumulative reward):
    max_π E Σ_{n=0}^{N-1} F( X^π(S^n), W^{n+1} )
  • We have to learn as we go.
Objective functions
Evaluating a policy:
» State independent, final reward:
    max_x E F(x, W)
    max_π E { F( X^{π,N}, W ) | S_0 }
» Decision, information, decision, information, …
  • Sequential dynamic programming, optimal control, … cumulative
  reward:
    max_{x_0} c_0 x_0 + E Q( x_0, W_1 )
    max_π E { Σ_{t=0}^{T} C( S_t, X^π(S_t) ) | S_0 }
    max_π E { Σ_{n=0}^{N-1} F( X^π(S^n), W^{n+1} ) | S_0 }
  • With separate learning and implementation policies:
    max_{π^lrn} E Σ_t C( S_t, X^{π^imp}(S_t) )
Problem classes
State independent problems
» The problem does not depend on the state of the system.
    max_x E F(x, W) = E { p min(x, W) - cx }
» The only state variable is what we know (or believe)
about the unknown function E F(x, W), called the belief
state B_t, so S_t = B_t.
State dependent problems
» Now the problem may depend on what we know at time t:
    max_{0 ≤ x ≤ R_t} E C(S_t, x, W) = E { p_t min(x, W) - cx }
» We will see later that the “learning policy” describes any adaptive
learning algorithm for solving dynamic programs. As with (1), we
want a “good” algorithm, but finding an optimal learning
algorithm is aspirational.
Problem classes
Simulating objective functions
» It is important to know how to simulate objective
functions.
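A minimal sketch of simulating a cumulative-reward objective by Monte Carlo. The newsvendor-style contribution, the order-up-to policy, and all parameters are illustrative assumptions, not a specific model from the course:

```python
import random

# Monte Carlo estimate of a policy's cumulative reward:
#   F^pi ≈ (1/K) * sum over K sample paths of sum_t C(S_t, X^pi(S_t), W_{t+1}).
p, c = 10.0, 6.0                      # sales price, purchase cost (assumed)

def contribution(S, x, W):
    return p * min(S + x, W) - c * x  # sell what demand W allows

def policy(S, theta=8.0):
    return max(theta - S, 0.0)        # order-up-to theta (assumed policy)

def simulate(policy, T=20, K=500, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(K):                # K sample paths omega
        S = 0.0                       # inventory state
        for _ in range(T):
            x = policy(S)
            W = rng.randint(0, 10)    # exogenous demand
            total += contribution(S, x, W)
            S = max(S + x - W, 0.0)   # transition function
    return total / K

F_pi = simulate(policy)
```

Tuning the policy (here, the parameter `theta`) then means re-running this simulator for different values and comparing the estimates, which is exactly the policy-search objective above.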
Problem classes
Objective functions
Learning policies:
  Approximate dynamic programming
  Q-learning
  SDDP
  …
We will focus on these in the second half of the course.
Problem classes
Objective functions
» General optimization problem:
    min_{f ∈ F, θ ∈ Θ^f} Σ_{n=1}^{N} ( y^n - f(x^n | θ) )^2
» Popular strategy: linear models:
    f(x^n | θ) = Σ_{f ∈ F} θ_f φ_f(x^n)
Stochastic opt. and machine learning
Statistics/machine learning:
» Data: x_t or x^n
» Predicting: y^n ≈ f(x^n | θ)
» Objective:
    min_θ MSE = (1/N) Σ_{n=1}^{N} ( y^n - f(x^n | θ) )^2

Stochastic optimization:
» Data: state S_t or S^n
» "Predicting": a decision x_t or x^n, a function f ∈ F with x_t = f(S_t),
  or the value of a state V(S_t)
» Objective:
    max_π E F( X^{π,N}, W ),   or:   max_π E Σ_t C( S_t, X^π(S_t) )
Stochastic opt. and machine learning
Statistics/machine learning Stochastic optimization
Statistics/machine learning:
» Classical batch learning:
    min_θ (1/N) Σ_{n=1}^{N} ( y^n - f(x^n | θ) )^2
» Online (recursive) statistics:
    min_θ E ( Y - f(x | θ) )^2
» Searching over functions:
    min_{f ∈ F, θ ∈ Θ^f} E ( Y - f(x | θ) )^2
» Matching data over time:
    min_{f ∈ F, θ ∈ Θ^f} E Σ_t ( ŷ_{t+1} - f(x_t | θ) )^2

Stochastic optimization:
» Sample average approximation:
    (1)  max_x (1/N) Σ_{n=1}^{N} F( x, W^n )
» Stochastic search:
    (2)  max_x E F( x, W )
» Policy search:
    (3)  max_π E Σ_t C( S_t, X^π(S_t) )
» Matching posterior decisions:
    (4)  min_π E Σ_t ( X*_{t+1}(ω) - X^π(S_t | θ) )^2
An energy storage example
Energy arbitrage
» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function
» How much to sell to or buy from the grid.
» Constraints:
» Policy:
Energy arbitrage
Exogenous information
[Figure: hourly electricity prices (0-120 $/MWh) over roughly 70 hours. The battery charges when prices are low and discharges when prices spike.]
Energy arbitrage
A storage system
» Constraints and dynamics:
    E_{t+1} = E_t + Ê_{t+1}
    p_{t+1} = θ_0 p_t + θ_1 p_{t-1} + θ_2 p_{t-2} + ε^p_{t+1}
    D_{t+1} = f^D_{t,t+1} + D̂_{t+1}
» Contribution and objective:
    C( S_t, x_t ) = p_t ( x_t^{GB} + x_t^{GL} )
    min_π E { Σ_{t=0}^{T} C_t( S_t, X^π_t(S_t) ) | S_0 }
» Transition function:
    R^{battery}_{t+1} = R^{battery}_t + x_t
    E_{t+1} = E_t + Ê_{t+1}
    p_{t+1} = θ_{t0} p_t + θ_{t1} p_{t-1} + θ_{t2} p_{t-2} + ε^p_{t+1}
            = (θ_t)^T p̄_t + ε^p_{t+1},   where p̄_t = ( p_t, p_{t-1}, p_{t-2} )
    D_{t+1} = f^D_{t,t+1} + ε^D_{t+1}
Learning in stochastic optimization
Updating the price model parameters
» Let p̂_{t+1} be the new price and let
    F^{price}_t( p_t | θ_t ) = (θ_t)^T p̄_t = θ_{t0} p_t + θ_{t1} p_{t-1} + θ_{t2} p_{t-2}
» Transition function:
    p_{t+1} = θ_{t0} p_t + θ_{t1} p_{t-1} + θ_{t2} p_{t-2} + ε^p_{t+1}
    D_{t+1} = f^D_{t,t+1} + ε^D_{t+1}
    θ_{t+1} = θ_t - (1/γ_{t+1}) B_t p̄_t ε̂_{t+1}
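The θ updates above can be sketched with standard recursive least squares. The three-lag linear price model matches the slides; the initialization, the simulated "true" coefficients, and the noise level are assumptions for illustration:

```python
import numpy as np

# Recursive least squares for the three-lag price model
#   p_{t+1} ≈ theta0*p_t + theta1*p_{t-1} + theta2*p_{t-2}.
theta = np.zeros(3)
B = np.eye(3) * 100.0   # ~inverse regression matrix; large = diffuse prior

def rls_update(theta, B, p_bar, p_next):
    """One recursive least squares step for features p_bar and new price."""
    eps = p_next - theta @ p_bar                      # prediction error
    gamma = 1.0 + p_bar @ B @ p_bar
    theta = theta + (B @ p_bar) * eps / gamma         # parameter update
    B = B - np.outer(B @ p_bar, B @ p_bar) / gamma    # matrix update
    return theta, B

rng = np.random.default_rng(0)
true_theta = np.array([0.6, 0.3, 0.1])   # assumed "true" dynamics
p_hist = [30.0, 31.0, 29.0]
for _ in range(200):
    p_bar = np.array(p_hist[-1:-4:-1])   # (p_t, p_{t-1}, p_{t-2})
    p_next = true_theta @ p_bar + rng.normal(0, 0.1)
    theta, B = rls_update(theta, B, p_bar, p_next)
    p_hist.append(p_next)
```

Because the lagged prices are highly collinear, the individual coefficients are identified slowly, but the fitted model predicts the next price well almost immediately; this is the usual behavior of recursive least squares on autoregressive data.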
An energy storage example
    p_{t+1} = θ_0 p_t + θ_1 p_{t-1} + θ_2 p_{t-2} + ε^p_{t+1}
» Modeling uncertainty
» Designing policies