Dynamic Programming


ESI 4313

Operations Research 2

Dynamic Programming
Dynamic Programming
Dynamic programming is a technique for
solving certain types of optimization
problems
The idea is to break up a large, complex
problem into many smaller, much easier ones
Usually, this technique can be applied to
problems in which a sequence of decisions
over time needs to be made to optimize
some criterion
Dynamic Programming
In many cases, solving a problem by
dynamic programming means
formulating this problem as a shortest path
problem in an acyclic network
The art of dynamic programming lies in
how to construct this network!
Example:
Travel from coast to coast
You currently live in NYC (1), but plan to move
to LA (10)
You will drive
To save money, you will spend each night of your trip at a friend's house
You structure your potential stopovers as follows:
In 1 day you can reach Columbus (2), Nashville (3), or
Louisville (4)
On the 2nd day, you can reach Kansas City (5), Omaha (6), or Dallas (7)
On the 3rd day, you can reach San Antonio (8) or Denver (9)
On the 4th day, you can reach LA
To minimize your gas expenses, you are looking for
the route of minimum length
Example:
Travel from coast to coast
We can classify the cities as follows:
Call all cities that you can be in at the beginning of your n-th day the stage n cities
The idea of solving this problem by
dynamic programming is to start by
solving easy problems that will eventually
help you solve the entire problem
In particular, we will work backward
Example:
Travel from coast to coast
Denote the distance between city i and city j by $c_{i,j}$
If city i is a stage t city, we denote the length of the shortest path from city i to LA by $f_t(i)$
Since NYC (city 1) is the only stage 1 city, we would like to find $f_1(1)$
Example:
Travel from coast to coast
First, find the shortest path to LA from each of the cities from which you can reach LA in 1 day: the stage 4 cities
Note that these problems are trivial, since in each case there's only 1 way to go to LA
More formally,
$$f_4(8) = c_{8,10}, \qquad f_4(9) = c_{9,10}$$
Example:
Travel from coast to coast
Then, find the shortest path to LA from
each of the stage 3 cities
Note that this means that you should first go
to a stage 4 city, and then use the shortest
path from this stage 4 city to LA
These problems are not as trivial as the first ones, but by simply looking at all possible stage 4 cities and the solutions to the first set of problems, this remains relatively easy
Example:
Travel from coast to coast
From each stage 3 city, go to a stage 4 city, and then use the shortest path from this stage 4 city to LA
So, for example, $f_3(5)$ is equal to $c_{5,8} + f_4(8)$ or $c_{5,9} + f_4(9)$
Since we're interested in the shortest path, we have
$$f_3(5) = \min\{\, c_{5,8} + f_4(8),\; c_{5,9} + f_4(9) \,\}$$
Example:
Travel from coast to coast
Perform the same procedure for the stage
2 cities
Perform the same procedure for the stage
1 city, NYC
From NYC you should first go to a stage 2
city, and then use the shortest path from this
stage 2 city to LA
We can find the best route from NYC to LA
by considering all possible stage 2 cities
Example:
Travel from coast to coast
In general, in stage t we are interested in finding $f_t(i)$ for all stage t cities i
Using the earlier approach, we can write
$$f_t(i) = \min_{j:\ j \text{ is a stage } t+1 \text{ city}} \{\, c_{i,j} + f_{t+1}(j) \,\} \quad \text{for all stage } t \text{ cities } i$$
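The backward recursion can be sketched in a few lines of Python. This is only an illustration: the slides do not list the distances, so every value in `c` below is a made-up placeholder, and the names `stages`, `backward_recursion`, and `recover_path` are hypothetical. Setting f to 0 at the destination reproduces $f_4(i) = c_{i,10}$ for the stage 4 cities.

```python
# Cities grouped by stage, as on the slides (1 = NYC, 10 = LA).
stages = {1: [1], 2: [2, 3, 4], 3: [5, 6, 7], 4: [8, 9], 5: [10]}

# PLACEHOLDER distances c[i, j]; these numbers are NOT from the slides.
c = {(1, 2): 5, (1, 3): 9, (1, 4): 8,
     (2, 5): 7, (2, 6): 8, (2, 7): 10,
     (3, 5): 6, (3, 6): 8, (3, 7): 7,
     (4, 5): 5, (4, 6): 7, (4, 7): 8,
     (5, 8): 6, (5, 9): 8,
     (6, 8): 5, (6, 9): 9,
     (7, 8): 8, (7, 9): 3,
     (8, 10): 10, (9, 10): 14}


def backward_recursion(stages, c):
    """Compute f[i] = shortest distance from city i to the destination."""
    last = max(stages)
    f = {city: 0 for city in stages[last]}  # nothing left to drive at the destination
    best_next = {}
    for t in range(last - 1, 0, -1):        # work backward: stage 4, 3, 2, 1
        for i in stages[t]:
            j_best = min(stages[t + 1], key=lambda j: c[i, j] + f[j])
            f[i] = c[i, j_best] + f[j_best]
            best_next[i] = j_best
    return f, best_next


def recover_path(start, best_next):
    """Follow the stored optimal decisions from the start city."""
    path = [start]
    while path[-1] in best_next:
        path.append(best_next[path[-1]])
    return path


f, best_next = backward_recursion(stages, c)
route = recover_path(1, best_next)
print(f[1], route)
```

Storing the minimizing city in `best_next` is a design choice that lets the route be read off afterward without re-solving any subproblem.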

Computational efficiency of
dynamic programming
In the example, we could simply enumerate all
possible paths from NYC to LA
It is easy to see that there are 3×3×2 = 18 paths
However, suppose that we have more options:
Starting city is again stage 1
5 cities in each of 5 stages (stages 2,…,6)
Destination city is stage 7
Then there are $5^5 = 3{,}125$ paths
Determining the length of each of these paths takes a total of $5 \times 5^5 = 15{,}625$ additions and 3,124 comparisons
Computational efficiency of
dynamic programming
How much work is the dynamic
programming algorithm?
The stage 6 problems are trivial
Each of the other problems requires 5 additions (potential choices for next city to visit) and 4 comparisons
There are 4×5 such problems in stages 2 through 5 and 1 in stage 1, for a total of 4×5×5 + 5 = 105 additions and 4×5×4 + 4 = 84 comparisons
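The operation counts can be checked with a short Python sketch. The function names and the parameterization (cities per stage, number of intermediate stages) are hypothetical; the counting rules are the ones used on the slides.

```python
def enumeration_work(cities_per_stage, inner_stages):
    """Work needed to evaluate every path and pick the shortest one."""
    paths = cities_per_stage ** inner_stages
    # A path has inner_stages + 1 arcs, so summing it takes inner_stages additions.
    additions = inner_stages * paths
    comparisons = paths - 1
    return paths, additions, comparisons


def dp_work(cities_per_stage, inner_stages):
    """Work needed by the backward recursion."""
    n = cities_per_stage
    # The last intermediate stage is trivial; every earlier city,
    # plus the single start city, is one nontrivial subproblem.
    subproblems = (inner_stages - 1) * n + 1
    additions = subproblems * n
    comparisons = subproblems * (n - 1)
    return additions, comparisons


print(enumeration_work(5, 5))  # (3125, 15625, 3124)
print(dp_work(5, 5))           # (105, 84)
```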
Characteristics of dynamic
programming
The problem should have stages
Each stage corresponds to a point at which a
decision needs to be made
Each stage should have a number of
associated states
The state contains all information that is
needed to make an optimal decision for the
remaining problem
Characteristics of dynamic
programming
The decision chosen at each stage describes how the state at the current stage is transformed into the state at the next stage
The optimal decision at the current state
should not depend on previously visited
states or previous decisions
This is called the principle of optimality
Characteristics of dynamic
programming
There must be a recursion that relates the cost or reward for stages t, t+1, …, T to the cost or reward for stages t+1, t+2, …, T
This recursion formalizes the procedure of
working backwards from the last stage to the
first stage
Dynamic programming
formulation
Stages: t = 1,…,5
States: city
Decision in each stage:
Choose the stage t+1 city to go to
Dynamic programming recursion:
$$f_4(i) = c_{i,10} \quad \text{for all stage 4 cities } i$$
$$f_t(i) = \min_{j:\ j \text{ is a stage } t+1 \text{ city}} \{\, c_{i,j} + f_{t+1}(j) \,\} \quad \text{for all stage } t \text{ cities } i$$
Dynamic programming
without stages
You must drive from Bloomington to
Cleveland
You are interested in the route that takes
the least amount of time
Dynamic programming
without stages
[Figure: road network connecting Bloomington, Gary, Indianapolis, Toledo, Dayton, Cincinnati, Columbus, and Cleveland; each arc is labeled with a driving time between 1 and 3 hours]
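Without stage labels, the same idea can be written as a recursion on the nodes themselves: the shortest time from a city is the best arc time plus the shortest time from the city that arc leads to. The Python sketch below memoizes that recursion. The arc list and the driving times in `arcs` are hypothetical, since the assignment of times to arcs cannot be read from the figure, and the function name `shortest_time` is also an assumption.

```python
from functools import lru_cache

# HYPOTHETICAL arcs and driving times in hours; not taken from the figure.
arcs = {
    "Bloomington": {"Indianapolis": 1, "Cincinnati": 3},
    "Indianapolis": {"Gary": 3, "Dayton": 2.5},
    "Gary": {"Toledo": 3},
    "Dayton": {"Toledo": 2, "Columbus": 1},
    "Cincinnati": {"Columbus": 2},
    "Toledo": {"Cleveland": 2},
    "Columbus": {"Cleveland": 3},
    "Cleveland": {},
}


@lru_cache(maxsize=None)
def shortest_time(city):
    """Least driving time from `city` to Cleveland in an acyclic network."""
    if city == "Cleveland":
        return 0.0
    # Each city's subproblem is solved once; lru_cache stores the result.
    return min(hours + shortest_time(nxt) for nxt, hours in arcs[city].items())


print(shortest_time("Bloomington"))
```

The recursion terminates because the network has no cycles; memoization plays the role that the stage-by-stage ordering played before.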
Production & inventory
planning
Consider the following production & inventory
planning problem for a single item:
Consider a planning period of T periods, and assume
that
the demand for the item in each of the periods is known
the initial inventory level is known
At the start of each period, you must decide how
many units to produce; production capacity is limited
Each period's demand must be met on time
There is a limited amount of storage space available
The goal is to minimize the total production &
inventory costs over the planning horizon
Production & inventory
planning
This is a periodic review model
Denote the demand in period t by $d_t$ (t = 1,…,T)
Denote the cost of producing x units in period t by $c_t(x)$ (often, this function is independent of t, i.e., $c_t(x) = c(x)$)
If at the end of period t the inventory level is I, a cost of $h_t(I)$ is charged (often, these costs are independent of t, i.e., $h_t(I) = h(I)$)
Production & inventory
planning
If the production and inventory holding
cost functions are linear, we can
formulate this problem as an LP problem
(how?)
Often, the production costs are assumed
to have a fixed-charge structure:
c(x) = 0 if x = 0, c(x) = a + bx if x > 0
In that case, we can formulate this problem
as a mixed-integer LP problem (how?)
Production & inventory
planning
More generally, we can formulate this
problem as an NLP problem (how?)
Production (and inventory) costs are often assumed to be concave, reflecting economies of scale
What does that mean for the ease of
solvability of the NLP problem?
Production & inventory
planning
NLP formulation:
$$\begin{aligned}
\min\ & \sum_{t=1}^{T} c_t(x_t) + \sum_{t=1}^{T} h_t(I_t) \\
\text{subject to}\ & I_{t-1} + x_t = d_t + I_t, && t = 1,\dots,T \\
& 0 \le I_t \le B, && t = 1,\dots,T \\
& 0 \le x_t \le C, && t = 1,\dots,T
\end{aligned}$$
Production & inventory
planning
Dynamic programming provides a solution
methodology that can be applied for
general cost functions
We only need to assume that the units of demand, production, and inventory are integers, which is not unrealistic in many practical situations
This methodology will be efficient if the
magnitude of the numbers involved is not too
large
Production & inventory
planning
We must identify:
Stages
time: $t = 1,\dots,T$
States
(starting) inventory level: $I = 0,\dots,B$
Decisions
production quantity: $x = 0,\dots,C$
Recursion
minimal cost from start of stage t: $f_t(I)$
Clearly, we are looking for $f_1(I_0)$
Production & inventory
planning
Recursion:
Cost at the beginning of stage T:
$$f_T(I) = \min_{0 \le x \le C} \bigl( c_T(x) + h_T(I + x - d_T) \bigr)$$
Note that you will always want to end up with 0 inventory, so the final period's production will be $x = d_T - I$
So:
$$f_T(I) = c_T(d_T - I)$$
(what if $d_T - I < 0$ or $d_T - I > C$ ???)
Production & inventory
planning
Recursion:
Cost at the beginning of stage t:
$$f_t(I) = \min_{\max(0,\, d_t - I) \,\le\, x \,\le\, \min(C,\, d_t + B - I)} \bigl( c_t(x) + h_t(I + x - d_t) + f_{t+1}(I + x - d_t) \bigr)$$
We have to make sure that we have sufficient storage capacity, i.e., we need $I + x - d_t \le B$, or $x \le d_t + B - I$
We have to make sure that we deliver on time, i.e., we need $I + x - d_t \ge 0$, or $x \ge d_t - I$
Production & inventory
planning example
Example:
T = 4 periods
Demands: 1, 4, 2, 3
Inventory holding costs: $0.50 per unit
Production costs:
fixed setup cost $3
variable cost $1 per unit
Production capacity C = 5 units
Inventory capacity B = 4 units
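A minimal Python sketch of the recursion, using the example data above. Two assumptions are not on the slides: the initial inventory is taken to be 0 (the name `I0` is hypothetical), and the last period is handled by the general recursion with $f_{T+1}(I) = 0$ instead of the shortcut $f_T(I) = c_T(d_T - I)$, so stock left over after period 4 is simply charged the holding cost.

```python
from functools import lru_cache

T = 4
demand = {1: 1, 2: 4, 3: 2, 4: 3}
C, B = 5, 4   # production and inventory capacity
I0 = 0        # ASSUMPTION: the slides do not state the initial inventory


def production_cost(x):
    """Fixed setup cost of 3 plus 1 per unit; nothing if nothing is produced."""
    return 0.0 if x == 0 else 3.0 + 1.0 * x


def holding_cost(inv):
    """0.50 per unit of end-of-period inventory."""
    return 0.5 * inv


@lru_cache(maxsize=None)
def f(t, inv):
    """Return (minimal cost from the start of period t, best production x)."""
    if t > T:
        return 0.0, None  # boundary condition f_{T+1} = 0
    d = demand[t]
    lo = max(0, d - inv)        # deliver on time
    hi = min(C, d + B - inv)    # respect production and storage capacity
    best = None
    for x in range(lo, hi + 1):
        end = inv + x - d
        cost = production_cost(x) + holding_cost(end) + f(t + 1, end)[0]
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


def plan(inv=I0):
    """Read off one optimal production plan by following the stored decisions."""
    xs = []
    for t in range(1, T + 1):
        x = f(t, inv)[1]
        xs.append(x)
        inv += x - demand[t]
    return xs


print(f(1, I0)[0], plan())  # 19.5 [5, 0, 5, 0]
```

For this data the feasible range of x is never empty; with other capacities a state could be infeasible, and the sketch does not guard against that case.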
Production & inventory
planning
Initialization: T = 4
$d_4 = 3$
Cost at the beginning of stage T = 4:
$$f_T(I) = c_T(d_T - I)$$
I = 0: $f_4(0) = c_4(3-0) = 3 + 3 \cdot 1 = 6$
I = 1: $f_4(1) = c_4(3-1) = 3 + 2 \cdot 1 = 5$
I = 2: $f_4(2) = c_4(3-2) = 3 + 1 \cdot 1 = 4$
I = 3: $f_4(3) = c_4(3-3) = 0$
I = 4: $f_4(4) = c_4(3-4) =$ ???
Production & inventory
planning
Next stage: t = 3
$d_3 = 2$
Cost at the beginning of stage t = 3:
$$f_t(I) = \min_{\max(0,\, d_t - I) \,\le\, x \,\le\, \min(C,\, d_t + B - I)} \bigl( c_t(x) + h_t(I + x - d_t) + f_{t+1}(I + x - d_t) \bigr)$$
$$f_3(I) = \min_{\max(0,\, 2 - I) \,\le\, x \,\le\, \min(5,\, 2 + 4 - I)} \bigl( c_3(x) + h_3(I + x - 2) + f_4(I + x - 2) \bigr)$$
Production & inventory
planning
I = 0:
$$f_3(0) = \min_{\max(0,\, 2 - 0) \,\le\, x \,\le\, \min(5,\, 2 + 4 - 0)} \bigl( c_3(x) + h_3(0 + x - 2) + f_4(0 + x - 2) \bigr)$$
$$f_3(0) = \min_{2 \le x \le 5} \bigl( c_3(x) + \tfrac{1}{2}(x - 2) + f_4(x - 2) \bigr)$$
Production & inventory
planning
I = 0:
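For $x \ge 2$ we have $c_3(x) + \tfrac{1}{2}(x - 2) = 3 + x + \tfrac{1}{2}x - 1 = 2 + \tfrac{3}{2}x$, so
$$f_3(0) = \min \begin{cases}
x = 2: & 2 + \tfrac{3}{2} \cdot 2 + f_4(0) = 5 + 6 = 11 \\
x = 3: & 2 + \tfrac{3}{2} \cdot 3 + f_4(1) = 6.5 + 5 = 11.5 \\
x = 4: & 2 + \tfrac{3}{2} \cdot 4 + f_4(2) = 8 + 4 = 12 \\
x = 5: & 2 + \tfrac{3}{2} \cdot 5 + f_4(3) = 9.5 + 0 = 9.5
\end{cases}$$
Hence $f_3(0) = 9.5$, attained at x = 5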
Production & inventory
planning
I = 2:
$$f_3(2) = \min_{\max(0,\, 2 - 2) \,\le\, x \,\le\, \min(5,\, 2 + 4 - 2)} \bigl( c_3(x) + h_3(2 + x - 2) + f_4(2 + x - 2) \bigr)$$
$$f_3(2) = \min_{0 \le x \le 4} \bigl( c_3(x) + \tfrac{1}{2}x + f_4(x) \bigr)$$
$$f_3(2) = \min \Bigl\{ f_4(0),\ \min_{1 \le x \le 4} \bigl( 3 + \tfrac{3}{2}x + f_4(x) \bigr) \Bigr\}$$
Production & inventory
planning
I = 2:
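$$f_3(2) = \min \begin{cases}
x = 0: & f_4(0) = 6 \\
x = 1: & 3 + \tfrac{3}{2} \cdot 1 + f_4(1) = 4.5 + 5 = 9.5 \\
x = 2: & 3 + \tfrac{3}{2} \cdot 2 + f_4(2) = 6 + 4 = 10 \\
x = 3: & 3 + \tfrac{3}{2} \cdot 3 + f_4(3) = 7.5 + 0 = 7.5 \\
x = 4: & 3 + \tfrac{3}{2} \cdot 4 + f_4(4) = 9 + f_4(4)
\end{cases}$$
Since costs are nonnegative, $f_4(4) \ge 0$ and the x = 4 option costs at least 9
Hence $f_3(2) = 6$, attained at x = 0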
Production & inventory
planning
Network representation:
Nodes: stage/state combinations (t,I)
Arcs: decisions x
Arc from node (t,I) corresponding to decision x leads to node $(t+1,\ I + x - d_t)$
Cost of this arc is $c_t(x) + h_t(I + x - d_t)$
Resource allocation:
the knapsack problem (1)
Stockco is considering 4 investments
Investment 1 will yield an NPV of $16K, but requires a cash outflow of $5K
Investment 2 will yield an NPV of $22K, but requires a cash outflow of $7K
Investment 3 will yield an NPV of $12K, but requires a cash outflow of $4K
Investment 4 will yield an NPV of $8K, but requires a cash outflow of $3K
You have a budget of $14K
Resource allocation:
the knapsack problem (1)
IP formulation:
$$\begin{aligned}
\max\ & \sum_{i=1}^{4} \mathrm{NPV}_i\, x_i = 16x_1 + 22x_2 + 12x_3 + 8x_4 \\
\text{subject to}\ & \sum_{i=1}^{4} C_i\, x_i = 5x_1 + 7x_2 + 4x_3 + 3x_4 \le 14 \\
& x_i \in \{0,1\}, \quad i = 1,\dots,4
\end{aligned}$$
Resource allocation:
the knapsack problem (2)
You are planning an overnight hike, and are
considering taking 4 items along on your trip
Item 1 yields a benefit of 16, but weighs 5 lbs
Item 2 yields a benefit of 22, but weighs 7 lbs
Item 3 yields a benefit of 12, but weighs 4 lbs
Item 4 yields a benefit of 8, but weighs 3 lbs
You do not want to carry more than 14 lbs
You want to maximize your benefit
Mathematically, this is the same problem as the
investment problem!
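Because both versions share the same numbers, one dynamic program solves them. The Python sketch below is an assumption about how the problem could be set up as a DP (stages are items, the state is the remaining capacity); the slides give only the IP formulation, and the function name `knapsack` is hypothetical.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by backward recursion over items.

    f[i][y] is the best total benefit obtainable from items i, i+1, ...
    (0-based) when y units of capacity remain.
    """
    n = len(values)
    f = [[0] * (capacity + 1) for _ in range(n + 1)]  # f[n][y] = 0: no items left
    for i in range(n - 1, -1, -1):
        for y in range(capacity + 1):
            f[i][y] = f[i + 1][y]                      # skip item i
            if weights[i] <= y:                        # take item i if it fits
                f[i][y] = max(f[i][y], values[i] + f[i + 1][y - weights[i]])

    # Recover one optimal selection (1-based item numbers, as on the slides).
    chosen, y = [], capacity
    for i in range(n):
        if f[i][y] != f[i + 1][y]:
            chosen.append(i + 1)
            y -= weights[i]
    return f[0][capacity], chosen


print(knapsack([16, 22, 12, 8], [5, 7, 4, 3], 14))  # (42, [2, 3, 4])
```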
Resource allocation:
more general
Stockco is considering n investments
Investment i will yield an NPV of $r_i(d_i)$ when $d_i \times \$1{,}000$ is invested
You only want to (or can) invest in integer multiples of $1,000
You have a budget of $B \times \$1{,}000$
Example
n = 3, B = 6
$r_1(d_1) = 7d_1 + 2$ ($d_1 > 0$), $r_1(0) = 0$
$r_2(d_2) = 3d_2 + 7$ ($d_2 > 0$), $r_2(0) = 0$
$r_3(d_3) = 4d_3 + 5$ ($d_3 > 0$), $r_3(0) = 0$
Resource allocation:
more general
NLP formulation:
$$\begin{aligned}
\max\ & \sum_{i=1}^{n} r_i(d_i) \\
\text{subject to}\ & \sum_{i=1}^{n} d_i \le B \\
& d_i \in \{0,1,\dots\}, \quad i = 1,\dots,n
\end{aligned}$$
Resource allocation
To formulate this problem as a DP problem, we
must identify:
Stages
investment categories: $i = 1,2,3$
States
budget available: $y = 0,1,\dots,6$
Decisions
investment amount: $d = 0,1,\dots,6$
Recursion
maximal return from inv. categories i,…,3: $f_i(y)$
Clearly, we are looking for $f_1(6)$
Resource allocation
Recursion:
Return from investment in category 3 only:
$$f_3(y) = \max_{0 \le d \le y} r_3(d)$$
Note that you will always invest all remaining budget in category 3 at this stage, i.e., d = y
$$f_3(y) = r_3(y) = \begin{cases} 0 & \text{if } y = 0 \\ 4y + 5 & \text{if } y = 1,\dots,6 \end{cases}$$
Resource allocation
Recursion:
Return from investment in categories 2 and 3:
$$f_2(y) = \max_{0 \le d \le y} \bigl( r_2(d) + f_3(y - d) \bigr)$$
These subproblems are a little harder
y = 0: $f_2(0) = 0$
y = 1: $f_2(1) = \max\bigl(r_2(0) + f_3(1),\ r_2(1) + f_3(0)\bigr) = \max(0 + 9,\ 10 + 0) = 10$
y = 2: $f_2(2) = \max\bigl(r_2(0) + f_3(2),\ r_2(1) + f_3(1),\ r_2(2) + f_3(0)\bigr) = \max(0 + 13,\ 10 + 9,\ 13 + 0) = 19$
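The remaining values, up to $f_1(6)$, follow the same pattern and can be tabulated with a short Python sketch. The names `coeffs`, `r`, `solve`, and `allocation` are hypothetical, and the boundary condition $f_4(y) = 0$ is an assumption that reproduces $f_3(y) = \max_{0 \le d \le y} r_3(d)$.

```python
B = 6
# r_i(d) = a*d + b for d > 0 and r_i(0) = 0, with (a, b) taken from the example.
coeffs = {1: (7, 2), 2: (3, 7), 3: (4, 5)}


def r(i, d):
    """NPV of investing d thousand dollars in category i."""
    a, b = coeffs[i]
    return a * d + b if d > 0 else 0


def solve(budget=B, n=3):
    """Tabulate f[i, y] and the maximizing decision for every stage and state."""
    f = {(n + 1, y): 0 for y in range(budget + 1)}  # nothing left to invest in
    best_d = {}
    for i in range(n, 0, -1):
        for y in range(budget + 1):
            d_star = max(range(y + 1), key=lambda d: r(i, d) + f[i + 1, y - d])
            f[i, y] = r(i, d_star) + f[i + 1, y - d_star]
            best_d[i, y] = d_star
    return f, best_d


def allocation(best_d, budget=B, n=3):
    """Follow the stored decisions to get one optimal allocation (d_1, ..., d_n)."""
    ds, y = [], budget
    for i in range(1, n + 1):
        ds.append(best_d[i, y])
        y -= ds[-1]
    return ds


f, best_d = solve()
print(f[2, 1], f[2, 2])              # 10 19
print(f[1, 6], allocation(best_d))   # 49 [4, 1, 1]
```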
Resource allocation
Network representation:
Nodes: stage/state combinations (i,y)
Arcs: decisions d
Arc from node (i,y) corresponding to decision d leads to node (i+1, y-d)
Return of this arc is $r_i(d)$
Resource allocation:
even more general
NLP formulation:
$$\begin{aligned}
\max\ & \sum_{i=1}^{n} r_i(d_i) \\
\text{subject to}\ & \sum_{i=1}^{n} g_i(d_i) \le B \\
& d_i \in \{0,1,\dots,U_i\}, \quad i = 1,\dots,n
\end{aligned}$$
Find the DP formulation for this general case
Equipment replacement
problem
A company faces the problem of how long a
machine should be utilized before it should be
traded in for a new one
Example
A new machine costs p=$1,000, and has a useful
lifetime of 3 years
Maintaining a machine during its first 3 years costs $m_1 = \$60$, $m_2 = \$80$, $m_3 = \$120$, respectively
If a machine is traded in, a salvage value is obtained: $s_1 = \$800$, $s_2 = \$600$, and $s_3 = \$500$, respectively, after the first 3 years
Equipment replacement
problem
We currently have a y-year-old machine
Find a policy that minimizes total net costs
over the next 5 years
Equipment replacement
problem
To formulate this problem as a DP problem, we
must identify:
Stages
time: $t = 0,1,\dots,5$
States
age of machine: $y = 0,1,2,3$
Decisions
keep or trade-in: $x = 0,1$ (0 = keep, 1 = trade in)
Recursion
minimal net cost after period t: $f_t(y)$
Clearly, we are looking for $f_0(y)$
Equipment replacement
problem
Recursion:
Note that you will always salvage the
machine at the end of year 5:
Net cost after period 5:
$$f_5(y) = -s_y, \quad y = 1,2,3$$
Equipment replacement
problem
Recursion:
At the end of period t < 5, you must decide
whether to keep or trade-in the machine
If y = 3, you must trade it in
$$f_t(3) = -s_3 + p + f_{t+1}(1), \quad t = 0,1,\dots,4$$
If y < 3, you have a real choice:
$$f_t(y) = \min \begin{cases}
x = 0: & m_y + f_{t+1}(y+1) \\
x = 1: & -s_y + p + f_{t+1}(1)
\end{cases} \qquad y = 1,2;\ t = 0,1,\dots,4$$
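The recursion can be evaluated with a short memoized Python sketch. It follows the recursion as stated here, with the salvage value entering as a negative cost and $m_y$ charged when a y-year-old machine is kept, so $m_3$ is never used. It covers only ages y = 1, 2, 3, because $m_0$ and $s_0$ are not defined on the slides. The names `HORIZON` and `f` are hypothetical.

```python
from functools import lru_cache

p = 1000                        # price of a new machine
m = {1: 60, 2: 80, 3: 120}      # maintenance costs m_y
s = {1: 800, 2: 600, 3: 500}    # salvage values s_y
HORIZON = 5


@lru_cache(maxsize=None)
def f(t, y):
    """Minimal net cost after period t when the machine is y years old."""
    if t == HORIZON:
        return -s[y]                      # salvage the machine at the end
    trade = -s[y] + p + f(t + 1, 1)       # x = 1: trade in for a new machine
    if y == 3:
        return trade                      # a 3-year-old machine must be traded in
    keep = m[y] + f(t + 1, y + 1)         # x = 0: keep the machine one more year
    return min(keep, trade)


for age in (1, 2, 3):
    print(age, f(0, age))
```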
Equipment replacement
problem
Network representation:
Nodes: stage/state combinations (t,y)
Arcs: decisions x
Arc from node (t,y) corresponding to decision
x=0 leads to node (t+1,y+1)
x=1 leads to node (t+1,1)
Cost of the arc is
$m_y$ when x = 0
$-s_y + p$ when x = 1
