Part4 2 Stochastic Full

IEDA 3010
Prescriptive Analytics
Dynamic Programming 2
Dynamic Programming
Dr. Jin QI
Department of Industrial Engineering and Decision Analytics
Hong Kong University of Science and Technology
Introduction
• The state at the next stage is not completely
determined by the state and policy decision at the
current stage.
– The next state follows a probability distribution
– This probability distribution is completely determined by the
state and policy decision at the current stage
• The recursive relationship therefore involves the expected profit
or cost from the future stages
Basic structure for stochastic DP models
[Figure: at stage n the system is in state sn and decision xn is made, with value fn(sn, xn). With probability pi (i = 1, 2, …, S) the system moves to state i at stage n + 1, earning contribution Ci from stage n, after which the optimal value is f*n+1(i).]
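The structure in the figure amounts to an expected value over the possible next states. A minimal sketch, assuming (as the figure suggests) that the stage-n contribution Ci is added to the optimal future value f*n+1(i); the function name `stage_value` and the triple layout are hypothetical choices for illustration.

```python
from fractions import Fraction

def stage_value(outcomes):
    """Expected value of one decision x_n taken in state s_n.

    outcomes: list of (p_i, C_i, f_next_i) triples, one per next state i, where
    p_i is the transition probability, C_i the contribution from stage n and
    f_next_i = f*_{n+1}(i). Assumes the value is sum_i p_i * (C_i + f*_{n+1}(i)).
    """
    if sum(p for p, _, _ in outcomes) != 1:
        raise ValueError("transition probabilities must sum to 1")
    return sum(p * (c + f_next) for p, c, f_next in outcomes)

# Two next states, no immediate contribution: move to a state worth 0
# with probability 1/3, or to a state worth 1 with probability 2/3.
print(stage_value([(Fraction(1, 3), 0, 0), (Fraction(2, 3), 0, 1)]))  # 2/3
```

Exact `Fraction` probabilities are used so that the sum-to-one check is not disturbed by floating-point rounding.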
Example 1: Winning in Macau

• A UST student believes he has developed a system for winning at
Blackjack, a popular game in Macau casinos.
• He believes his system gives him a probability of ⅔ of
winning any given play of the game.
• His classmates do not believe him, so they have made a
large bet with him: if he starts with three chips, he will not
have at least five chips after three plays of the game.
• Each play of the game involves betting any desired number of
available chips and then either winning or losing this number of
chips.
• Goal: determine a betting policy that maximizes his probability
of winning the bet with his classmates.
DP formulation
• DP formulation
– Stage n: nth play of the game, n = 1,2,3
– Decision variable xn: number of chips to bet at stage n
– State sn: number of chips in hand at beginning of stage n
– Profit function fn(sn, xn): probability of finishing three plays
with at least five chips, given that he starts stage n in state
sn, makes immediate decision xn, and makes optimal
decisions thereafter
DP formulation
• DP recursion
– Given sn and n, let x*n denote any value of xn that maximizes
fn(sn, xn), and let f*n(sn) be the corresponding maximum value
of fn(sn, xn):
$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n) = f_n(s_n, x_n^*)$$
– f*n(sn) is the maximum probability of finishing three plays
with at least five chips, given that he starts stage n in state
sn
[Figure: from state sn at stage n, decision xn leads with probability 1/3 to state sn − xn at stage n + 1 (value f*n+1(sn − xn)) and with probability 2/3 to state sn + xn (value f*n+1(sn + xn)), so fn(sn, xn) = (1/3) f*n+1(sn − xn) + (2/3) f*n+1(sn + xn).]
– So the recursive relationship is
$$f_n^*(s_n) = \max_{x_n = 0, 1, 2, \ldots, s_n} \left\{ \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n) \right\}$$
with terminal value
$$f_4^*(s_4) = \begin{cases} 1, & \text{if } s_4 \ge 5 \\ 0, & \text{if } s_4 < 5 \end{cases}$$
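The recursion and terminal value above can be solved backward in a few lines. This is an illustrative sketch, not the course's reference code: the name `solve_macau` is hypothetical, exact `fractions.Fraction` arithmetic is used so that tied optimal bets are detected exactly, and each stage's states are bounded by start_chips × 2^(n−1), the largest stack he can hold before play n.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)  # probability of winning a single play
TARGET = 5              # chips he needs after the last play
PLAYS = 3               # number of stages (plays)

def solve_macau(start_chips=3):
    """Backward recursion for Example 1.

    f[n][s] = f*_n(s) = max over x in 0..s of
              (1/3) f*_{n+1}(s - x) + (2/3) f*_{n+1}(s + x),
    with f*_4(s) = 1 if s >= 5 else 0. best[n][s] lists every maximizing x.
    """
    def bound(n):
        # Largest stack reachable at the start of stage n (he can at most double each play).
        return start_chips * 2 ** (n - 1)

    f = {PLAYS + 1: {s: Fraction(int(s >= TARGET)) for s in range(bound(PLAYS + 1) + 1)}}
    best = {}
    for n in range(PLAYS, 0, -1):
        f[n], best[n] = {}, {}
        for s in range(bound(n) + 1):
            # s + x <= 2s <= bound(n + 1), so f[n + 1] always has the needed entry.
            values = {x: (1 - P_WIN) * f[n + 1][s - x] + P_WIN * f[n + 1][s + x]
                      for x in range(s + 1)}
            f[n][s] = max(values.values())
            best[n][s] = [x for x, v in values.items() if v == f[n][s]]
    return f, best

f, best = solve_macau()
print(f[1][3], best[1][3])  # 20/27 [1]
print(f[2][4], best[2][4])  # 8/9 [1]
print(f[2][2], best[2][2])  # 4/9 [1, 2]
```

Keeping every tied maximizer in `best` is what lets the "or" alternatives in the optimal policy (for example two or three chips at stage 3 with three chips in hand) be read off directly.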
Solution Procedure
• n=3
– Suppose the current state is 3. The possible decisions
are to bet x3 = 0, 1, 2, or 3 chips.
$$x_3 = 0:\quad f_3(3, 0) = \tfrac{1}{3} f_4^*(3) + \tfrac{2}{3} f_4^*(3) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 0 = 0$$
$$x_3 = 1:\quad f_3(3, 1) = \tfrac{1}{3} f_4^*(2) + \tfrac{2}{3} f_4^*(4) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 0 = 0$$
$$x_3 = 2:\quad f_3(3, 2) = \tfrac{1}{3} f_4^*(1) + \tfrac{2}{3} f_4^*(5) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
$$x_3 = 3:\quad f_3(3, 3) = \tfrac{1}{3} f_4^*(0) + \tfrac{2}{3} f_4^*(6) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
– So the optimal decision given the current state 3 at
stage 3 is x*3 = 2 or x*3 = 3, with f*3(3) = 2/3
Solution Procedure
The recursive relationship leads to the following computational results.
n = 3:
s3 | f*3(s3) | x*3
0 | 0 | any
1 | 0 | any
2 | 0 | any
3 | 2/3 | 2 (or more)
4 | 2/3 | 1 (or more)
≥ 5 | 1 | 0 (or ≤ s3 − 5)
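The stage-3 rows, including every tied optimal bet, can be reproduced by enumerating all bets for each state. A minimal sketch; `f4_star` and `stage3_row` are hypothetical names, and the terminal value is f*4(s4) = 1 if s4 ≥ 5, else 0.

```python
from fractions import Fraction

def f4_star(s4):
    """Terminal value f*_4(s4): 1 if he finishes with at least five chips, else 0."""
    return Fraction(int(s4 >= 5))

def stage3_row(s3):
    """Return (f*_3(s3), all optimal bets) by enumerating every bet x3 = 0..s3."""
    values = [Fraction(1, 3) * f4_star(s3 - x3) + Fraction(2, 3) * f4_star(s3 + x3)
              for x3 in range(s3 + 1)]
    best_value = max(values)
    return best_value, [x3 for x3, v in enumerate(values) if v == best_value]

for s3 in range(8):
    value, bets = stage3_row(s3)
    print(s3, value, bets)
```

For s3 ≤ 2 every bet ties at 0 (five chips are out of reach whatever he does), and for s3 ≥ 5 the tied bets are exactly x3 ≤ s3 − 5, i.e. the bets that cannot drop him below five chips.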
Solution Procedure
• n=2
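– Suppose the current state is 4. The possible decisions
are to bet x2 = 0, 1, 2, 3, or 4 chips.
$$x_2 = 0:\quad f_2(4, 0) = \tfrac{1}{3} f_3^*(4) + \tfrac{2}{3} f_3^*(4) = \tfrac{1}{3} \times \tfrac{2}{3} + \tfrac{2}{3} \times \tfrac{2}{3} = \tfrac{2}{3}$$
$$x_2 = 1:\quad f_2(4, 1) = \tfrac{1}{3} f_3^*(3) + \tfrac{2}{3} f_3^*(5) = \tfrac{1}{3} \times \tfrac{2}{3} + \tfrac{2}{3} \times 1 = \tfrac{8}{9}$$
$$x_2 = 2:\quad f_2(4, 2) = \tfrac{1}{3} f_3^*(2) + \tfrac{2}{3} f_3^*(6) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
$$x_2 = 3:\quad f_2(4, 3) = \tfrac{1}{3} f_3^*(1) + \tfrac{2}{3} f_3^*(7) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
$$x_2 = 4:\quad f_2(4, 4) = \tfrac{1}{3} f_3^*(0) + \tfrac{2}{3} f_3^*(8) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
– So the optimal decision given the current state 4 at
stage 2 is x*2 = 1, with f*2(4) = 8/9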
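The five evaluations for state 4 can be checked mechanically. A small sketch, assuming the stage-3 values f*3(s3) = 0 for s3 ≤ 2, 2/3 for s3 = 3 or 4, and 1 for s3 ≥ 5; the names `F3_STAR`, `f3_star`, and `f2` are illustrative.

```python
from fractions import Fraction

# f*_3(s3) for s3 = 0..4; every s3 >= 5 has value 1.
F3_STAR = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0),
           3: Fraction(2, 3), 4: Fraction(2, 3)}

def f3_star(s3):
    return Fraction(1) if s3 >= 5 else F3_STAR[s3]

def f2(s2, x2):
    """Lose x2 chips w.p. 1/3 or win x2 chips w.p. 2/3, then play optimally at stage 3."""
    return Fraction(1, 3) * f3_star(s2 - x2) + Fraction(2, 3) * f3_star(s2 + x2)

values = [f2(4, x2) for x2 in range(5)]
print([str(v) for v in values])  # ['2/3', '8/9', '2/3', '2/3', '2/3']
print(max(values))               # 8/9
```

Betting one chip is the only bet that keeps a losing play from dropping him below the winning region at stage 3 (he falls back to three chips, still worth 2/3).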
Solution Procedure
n = 2:
$$f_2(s_2, x_2) = \tfrac{1}{3} f_3^*(s_2 - x_2) + \tfrac{2}{3} f_3^*(s_2 + x_2)$$
s2 | x2 = 0 | x2 = 1 | x2 = 2 | x2 = 3 | x2 = 4 | f*2(s2) | x*2
0 | 0 | | | | | 0 | any
1 | 0 | 0 | | | | 0 | any
2 | 0 | 4/9 | 4/9 | | | 4/9 | 1 or 2
3 | 2/3 | 4/9 | 2/3 | 2/3 | | 2/3 | 0, 2, or 3
4 | 2/3 | 8/9 | 2/3 | 2/3 | 2/3 | 8/9 | 1
≥ 5 | 1 | | | | | 1 | 0 (or ≤ s2 − 5)
Solution Procedure
• n=1
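– The current state is 3. The possible decisions are to bet
x1 = 0, 1, 2, or 3 chips.
$$x_1 = 0:\quad f_1(3, 0) = \tfrac{1}{3} f_2^*(3) + \tfrac{2}{3} f_2^*(3) = \tfrac{1}{3} \times \tfrac{2}{3} + \tfrac{2}{3} \times \tfrac{2}{3} = \tfrac{2}{3}$$
$$x_1 = 1:\quad f_1(3, 1) = \tfrac{1}{3} f_2^*(2) + \tfrac{2}{3} f_2^*(4) = \tfrac{1}{3} \times \tfrac{4}{9} + \tfrac{2}{3} \times \tfrac{8}{9} = \tfrac{20}{27}$$
$$x_1 = 2:\quad f_1(3, 2) = \tfrac{1}{3} f_2^*(1) + \tfrac{2}{3} f_2^*(5) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
$$x_1 = 3:\quad f_1(3, 3) = \tfrac{1}{3} f_2^*(0) + \tfrac{2}{3} f_2^*(6) = \tfrac{1}{3} \times 0 + \tfrac{2}{3} \times 1 = \tfrac{2}{3}$$
– So the optimal decision given the current state 3 at
stage 1 is x*1 = 1, with f*1(3) = 20/27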
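The stage-1 evaluations follow the same pattern, now using the stage-2 values. A sketch assuming f*2(0) = f*2(1) = 0, f*2(2) = 4/9, f*2(3) = 2/3, f*2(4) = 8/9, and f*2(s2) = 1 for s2 ≥ 5; the names are illustrative.

```python
from fractions import Fraction

# f*_2(s2) for s2 = 0..4; every s2 >= 5 has value 1.
F2_STAR = {0: Fraction(0), 1: Fraction(0), 2: Fraction(4, 9),
           3: Fraction(2, 3), 4: Fraction(8, 9)}

def f2_star(s2):
    return Fraction(1) if s2 >= 5 else F2_STAR[s2]

def f1(s1, x1):
    """Lose x1 chips w.p. 1/3 or win x1 chips w.p. 2/3, then play optimally from stage 2."""
    return Fraction(1, 3) * f2_star(s1 - x1) + Fraction(2, 3) * f2_star(s1 + x1)

values = [f1(3, x1) for x1 in range(4)]
print([str(v) for v in values])  # ['2/3', '20/27', '2/3', '2/3']
print(values[1] - values[0])     # 2/27
```

The last line shows the margin by which the optimal first bet of one chip beats not betting at all.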
Solution Procedure
n = 1:
$$f_1(s_1, x_1) = \tfrac{1}{3} f_2^*(s_1 - x_1) + \tfrac{2}{3} f_2^*(s_1 + x_1)$$
s1 | x1 = 0 | x1 = 1 | x1 = 2 | x1 = 3 | f*1(s1) | x*1
3 | 2/3 | 20/27 | 2/3 | 2/3 | 20/27 | 1
• Therefore, the optimal policy is
– Bet x*1 = 1 chip on the first play.
– If he wins (4 chips), bet x*2 = 1; then if he wins again, x*3 = 0, and if he loses, x*3 = 2 or 3.
– If he loses (2 chips), bet x*2 = 1 or 2; then if he wins, x*3 = 2 or 3 (for x*2 = 1) or 1, 2, 3, or 4 (for x*2 = 2), and if he loses, the bet is lost.
– This policy gives him a probability of 20/27 of winning his bet with his classmates.
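To confirm that the optimal policy really achieves 20/27, it can be evaluated directly by recursing over the win/lose outcome of each play. A sketch under stated assumptions: `optimal_policy` picks x*2 = 1 after a loss (x*2 = 2 is equally good), x*3 = 2 with three chips and x*3 = 1 with four; `bold_policy` (stake everything until holding five chips) is a hypothetical comparison, not part of the original example.

```python
from fractions import Fraction

P_WIN, TARGET, PLAYS = Fraction(2, 3), 5, 3

def optimal_policy(n, chips):
    """One of the tied optimal policies from the solution procedure."""
    if n in (1, 2):
        return 1  # x*1 = 1; x*2 = 1 after a win, and 1 (or 2) after a loss
    if chips >= TARGET:
        return 0  # already has five chips: do not bet
    if chips == 3:
        return 2  # x*3 = 2 (or 3)
    if chips == 4:
        return 1  # x*3 = 1 (any of 1..4 works)
    return 0      # 0, 1, or 2 chips: five is out of reach, so the bet is lost

def bold_policy(n, chips):
    """Hypothetical comparison: stake everything until holding at least five chips."""
    return chips if chips < TARGET else 0

def win_probability(policy, chips=3, n=1):
    """Exact probability of ending play 3 with >= 5 chips when following `policy`."""
    if n > PLAYS:
        return Fraction(int(chips >= TARGET))
    x = policy(n, chips)
    return ((1 - P_WIN) * win_probability(policy, chips - x, n + 1)
            + P_WIN * win_probability(policy, chips + x, n + 1))

print(win_probability(optimal_policy))  # 20/27
print(win_probability(bold_policy))     # 2/3
```

Evaluating a fixed policy this way needs no maximization, which makes it a useful independent check on the backward recursion.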
Example 2: Determining Lot Size

• A company has received an order to supply one item of a
particular type.
• Due to the customer’s stringent requirement, the company
may have to produce more than one item to obtain an
acceptable item.
• The defect rate of production is ½, independently for each item
– So the probability of producing no acceptable items in a lot of size
x is (½)^x
Example 2: Determining Lot Size

• Producing one item costs $1K (even if defective)
• A setup cost of $3K is incurred for each production run
• The company has time to make at most three production
runs before the customer’s deadline
• If an acceptable item has not been obtained by the end of
the third production run, the cost to the company in lost
sales income and penalty costs will be $16K
• Goal: determine the policy regarding the lot size for the
required production run(s) that minimizes total expected
cost for the company
DP formulation
• DP formulation
– Stage n: production run n, n = 1,2,3
– Decision variable xn: lot size for stage n
– State sn: number of acceptable items still needed (1 or 0) at
beginning of stage n
– Cost function fn(sn, xn): total expected cost for stages n, …,
3 if the system starts in state sn at stage n, makes immediate
decision xn, and makes optimal decisions thereafter
– Setup cost K(xn) (unit: thousand dollars):
$$K(x_n) = \begin{cases} 3, & \text{if } x_n > 0 \\ 0, & \text{if } x_n = 0 \end{cases}$$
so the immediate cost at stage n is K(xn) + xn
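The cost components and the failure probability used throughout this example can be written as small helpers. A sketch; the helper names are hypothetical, and all costs are in thousands of dollars.

```python
def setup_cost(x):
    """K(x): $3K whenever a production run is made (x > 0), otherwise 0."""
    return 3 if x > 0 else 0

def immediate_cost(x):
    """Cost incurred at a stage with lot size x: setup plus $1K per item."""
    return setup_cost(x) + x

def prob_no_acceptable(x):
    """Probability that all x items are defective, with defect rate 1/2 per item."""
    return 0.5 ** x

for x in range(5):
    print(x, immediate_cost(x), prob_no_acceptable(x))
```

Each extra item costs $1K but halves the chance of needing another run, which is the trade-off the recursion balances.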
DP formulation
• DP recursion
– Given sn and n, let x*n denote any value of xn that minimizes
fn(sn, xn), and let f*n(sn) be the corresponding minimum value
of fn(sn, xn):
$$f_n^*(s_n) = \min_{x_n = 0, 1, \ldots} f_n(s_n, x_n) = f_n(s_n, x_n^*)$$
– f*n(sn) is the minimum expected cost for stages n, …, 3 if the
system starts in state sn at stage n
– Obviously, fn(0, 0) = 0 and thus f*n(0) = 0 for any n,
because if no acceptable item is needed then no production
is needed
DP formulation
[Figure: from state 1 at stage n, lot size xn incurs cost K(xn) + xn. With probability 1 − (1/2)^xn an acceptable item is obtained and the system moves to state 0, with f*n+1(0) = 0; with probability (1/2)^xn it moves to state 1, with value f*n+1(1).]
$$f_n(1, x_n) = K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) + \left[1 - \left(\tfrac{1}{2}\right)^{x_n}\right] f_{n+1}^*(0) = K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1)$$
DP formulation
– So the recursive relationship is
$$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}$$
with terminal value f*4(1) = 16 from the lost sales and penalty
costs for failing to deliver an acceptable product
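The recursion above can be solved backward numerically. This is a sketch, not the course's own code: `solve_lot_size` is a hypothetical name, and because the lot size is unbounded it relies on an added observation: any lot size xn ≥ 1 with xn ≥ f*n+1(1) costs more than xn ≥ f*n+1(1), which is exactly what xn = 0 costs, so the search can stop at ⌈f*n+1(1)⌉.

```python
import math
from fractions import Fraction

PENALTY = Fraction(16)  # f*_4(1): lost sales and penalty costs, in $K
RUNS = 3                # at most three production runs
HALF = Fraction(1, 2)   # probability that a single item is defective

def K(x):
    """Setup cost in $K: 3 if a run is made, 0 otherwise."""
    return 3 if x > 0 else 0

def solve_lot_size():
    """Backward recursion f*_n(1) = min_x {K(x) + x + (1/2)^x f*_{n+1}(1)}.

    Any x >= 1 with x >= f*_{n+1}(1) costs more than x = 0 (which costs
    exactly f*_{n+1}(1)), so only x = 0..ceil(f*_{n+1}(1)) needs to be searched.
    """
    f = {RUNS + 1: PENALTY}
    best = {}
    for n in range(RUNS, 0, -1):
        f_next = f[n + 1]
        costs = {x: K(x) + x + HALF ** x * f_next
                 for x in range(math.ceil(f_next) + 1)}
        f[n] = min(costs.values())
        best[n] = [x for x, c in costs.items() if c == f[n]]
    return f, best

f, best = solve_lot_size()
for n in (1, 2, 3):
    print(n, f[n], best[n])
# 1 27/4 [2]
# 2 7 [2, 3]
# 3 8 [3, 4]
```

Exact fractions keep ties such as x*3 = 3 or 4 visible; f*1(1) = 27/4 thousand dollars, i.e. $6750.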
Solution procedure
• n=3
– Suppose the current state is 1. The possible decisions
are to make the lot size x3 = 0, 1, 2, 3, … (an infinite
sequence).
$$x_3 = 0:\quad f_3(1, 0) = K(0) + 0 + (0.5)^0 f_4^*(1) = 16$$
$$x_3 = 1:\quad f_3(1, 1) = K(1) + 1 + (0.5)^1 f_4^*(1) = 12$$
$$x_3 = 2:\quad f_3(1, 2) = K(2) + 2 + (0.5)^2 f_4^*(1) = 9$$
$$x_3 = 3:\quad f_3(1, 3) = K(3) + 3 + (0.5)^3 f_4^*(1) = 8$$
$$x_3 = 4:\quad f_3(1, 4) = K(4) + 4 + (0.5)^4 f_4^*(1) = 8$$
$$x_3 = 5:\quad f_3(1, 5) = K(5) + 5 + (0.5)^5 f_4^*(1) = 8.5$$
···
– So the optimal decision given the current state 1 at stage 3
is x*3 = 3 or x*3 = 4, with f*3(1) = 8
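The stage-3 evaluations can be reproduced from f3(1, x3) = K(x3) + x3 + (0.5)^x3 f*4(1) with f*4(1) = 16. A minimal sketch; `K` mirrors the setup cost K(xn) and `f3` is an illustrative name. Plain floats are exact here because 16 times a power of ½ is exactly representable in binary floating point.

```python
def K(x):
    return 3 if x > 0 else 0  # setup cost in $K

F4_STAR = 16  # f*_4(1): penalty if no acceptable item after the third run

def f3(x3):
    """f_3(1, x3) = K(x3) + x3 + (0.5)^x3 * f*_4(1)."""
    return K(x3) + x3 + 0.5 ** x3 * F4_STAR

print([f3(x) for x in range(6)])  # [16.0, 12.0, 9.0, 8.0, 8.0, 8.5]
```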
Solution procedure
The calculations using this recursive relationship are summarized as follows.
n = 3:
$$f_3(1, x_3) = K(x_3) + x_3 + 16\left(\tfrac{1}{2}\right)^{x_3}$$
s3 | x3 = 0 | x3 = 1 | x3 = 2 | x3 = 3 | x3 = 4 | x3 = 5 | f*3(s3) | x*3
0 | 0 | | | | | | 0 | 0
1 | 16 | 12 | 9 | 8 | 8 | 8½ | 8 | 3 or 4
Solution procedure
• n=2
– Suppose the current state is 1. The possible decisions are to
make the lot size x2 = 0, 1, 2, 3, … (an infinite sequence).
$$x_2 = 0:\quad f_2(1, 0) = K(0) + 0 + (0.5)^0 f_3^*(1) = 8$$
$$x_2 = 1:\quad f_2(1, 1) = K(1) + 1 + (0.5)^1 f_3^*(1) = 8$$
$$x_2 = 2:\quad f_2(1, 2) = K(2) + 2 + (0.5)^2 f_3^*(1) = 7$$
$$x_2 = 3:\quad f_2(1, 3) = K(3) + 3 + (0.5)^3 f_3^*(1) = 7$$
$$x_2 = 4:\quad f_2(1, 4) = K(4) + 4 + (0.5)^4 f_3^*(1) = 7.5$$
···
– So the optimal decision given the current state 1 at stage 2
is x*2 = 2 or x*2 = 3, with f*2(1) = 7
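Stage 2 uses the same formula with f*3(1) = 8 in place of the penalty. A sketch that also collects the tied minimizers; the names are illustrative.

```python
def K(x):
    return 3 if x > 0 else 0  # setup cost in $K

F3_STAR = 8  # f*_3(1) from the stage-3 calculation

costs = {x2: K(x2) + x2 + 0.5 ** x2 * F3_STAR for x2 in range(5)}
best = min(costs.values())
print(costs)  # {0: 8.0, 1: 8.0, 2: 7.0, 3: 7.0, 4: 7.5}
print(best, [x for x, c in costs.items() if c == best])  # 7.0 [2, 3]
```

Note that x2 = 0 and x2 = 1 tie at 8: a single-item run costs $4K and saves exactly half of the $8K still at stake.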
Solution procedure
n = 2:
$$f_2(1, x_2) = K(x_2) + x_2 + \left(\tfrac{1}{2}\right)^{x_2} f_3^*(1)$$
s2 | x2 = 0 | x2 = 1 | x2 = 2 | x2 = 3 | x2 = 4 | f*2(s2) | x*2
0 | 0 | | | | | 0 | 0
1 | 8 | 8 | 7 | 7 | 7½ | 7 | 2 or 3
n = 1:
$$f_1(1, x_1) = K(x_1) + x_1 + \left(\tfrac{1}{2}\right)^{x_1} f_2^*(1)$$
s1 | x1 = 0 | x1 = 1 | x1 = 2 | x1 = 3 | x1 = 4 | f*1(s1) | x*1
1 | 7 | 7½ | 6¾ | 6⅞ | 7 7/16 | 6¾ | 2
Solution procedure
• n=1
– The current state is 1. The possible decisions are to make
the lot size x1 = 0, 1, 2, 3, … (an infinite sequence).
$$x_1 = 0:\quad f_1(1, 0) = K(0) + 0 + (0.5)^0 f_2^*(1) = 7$$
$$x_1 = 1:\quad f_1(1, 1) = K(1) + 1 + (0.5)^1 f_2^*(1) = 7\tfrac{1}{2}$$
$$x_1 = 2:\quad f_1(1, 2) = K(2) + 2 + (0.5)^2 f_2^*(1) = 6\tfrac{3}{4}$$
$$x_1 = 3:\quad f_1(1, 3) = K(3) + 3 + (0.5)^3 f_2^*(1) = 6\tfrac{7}{8}$$
$$x_1 = 4:\quad f_1(1, 4) = K(4) + 4 + (0.5)^4 f_2^*(1) = 7\tfrac{7}{16}$$
···
– So the optimal decision given the current state 1 at stage 1
is x*1 = 2, with f*1(1) = 6¾
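At stage 1 the costs are no longer whole numbers, and exact fractions reproduce the mixed numbers above. A sketch assuming f*2(1) = 7; `mixed` is a hypothetical display helper.

```python
from fractions import Fraction

def K(x):
    return 3 if x > 0 else 0  # setup cost in $K

F2_STAR = 7  # f*_2(1) from the stage-2 calculation

def mixed(q):
    """Format a nonnegative Fraction as a mixed number, e.g. 27/4 -> '6 3/4'."""
    whole, rem = divmod(q.numerator, q.denominator)
    return str(whole) if rem == 0 else f"{whole} {rem}/{q.denominator}"

for x1 in range(5):
    cost = K(x1) + x1 + Fraction(1, 2) ** x1 * F2_STAR
    print(x1, mixed(cost))
```

The printed costs are 7, 7 1/2, 6 3/4, 6 7/8, and 7 7/16, so the minimum 6 3/4 is attained only at x1 = 2.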
Solution procedure
• Optimal policy
– Produce two items on the first production run; if none is
acceptable, then produce either two or three items on the
second production run; if none is acceptable, then produce
either three or four items on the third production run.
– The total expected cost for this policy is $6750.
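Because the state is either 0 (done) or 1 (still needed), the optimal policy amounts to a fixed sequence of lot sizes used until an acceptable item appears, and its expected cost can be computed directly. A sketch; `expected_cost` is a hypothetical name, and the plan [1, 1, 1] is an arbitrary alternative included only for comparison.

```python
from fractions import Fraction

def K(x):
    return 3 if x > 0 else 0  # setup cost in $K

PENALTY = 16  # $K if no acceptable item after the last run

def expected_cost(lot_sizes):
    """Expected total cost ($K) of a plan; run i uses lot_sizes[i] only if still needed."""
    cost, p_still_needed = Fraction(0), Fraction(1)
    for x in lot_sizes:
        cost += p_still_needed * (K(x) + x)       # pay for this run only if it happens
        p_still_needed *= Fraction(1, 2) ** x     # every item in the lot must be defective
    return cost + p_still_needed * PENALTY

print(expected_cost([2, 2, 3]))  # 27/4
print(expected_cost([2, 3, 4]))  # 27/4
print(expected_cost([1, 1, 1]))  # 9
```

All four combinations allowed by the optimal policy give 27/4 thousand dollars, matching the $6750 above.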
Summary
• DP is a very useful technique for making a sequence
of interrelated decisions.
• It requires formulating an appropriate recursive
relationship for each individual problem.
• For stochastic DP models, the state at the next stage is
not completely determined by the current state and
the immediate decision, but follows a probability
distribution.
– The cost or profit generated from future stages is random, so
the expected cost or profit is optimized.