Chapter 19solutions Ch19

CHAPTER 19: MARKOV DECISION PROCESSES
19.2-1.
Bank One, one of the major credit card issuers in the United States, has developed the
portfolio control and optimization (PORTICO) system to manage APR and credit-line
changes of its card holders. Customers prefer a low APR and high credit lines, which can
reduce the bank's profitability and increase its risk. Consequently, the bank needs to
find a balance between revenue growth and risk. PORTICO formulates the
problem as a Markov decision process. The state variables are chosen to satisfy the
Markovian assumption as closely as possible while keeping the dimension of the state
space at a tractable level. The resulting state is (x, y), where x corresponds to the
credit line and APR level and y represents the behavior variables. The transition
probabilities are estimated from the available data. The objective is to maximize the
expected net present value of the cash flows over a 36-month horizon. The dynamic
programming equation for the decision periods of the problem is
V_t(x, y) = \max_{a \in A(x, y)} { r(x, a, y) + \beta \sum_{j} p(x, a, y, j) V_{t+1}(x + a, j) },
where r denotes the immediate net cash flow and \beta is the discount factor. The
solution obtained is then adjusted to conform to business rules.
Benchmark tests are performed to evaluate the output policy. These tests suggest that the
new policy improves profitability. By adopting this policy, Bank One is expected to
increase its annual profit by more than $75 million.
19.2-2.
(a) Let the states i = 0, 1, 2 be the number of customers at the facility. There are two
possible actions when the facility has one or two customers. Let decision 1 be to use the
slow configuration and decision 2 be to use the fast configuration. Also let C_{ij} denote the
expected net immediate cost of using decision j in state i. Then,
C_{01} = 3
C_{02} = 9
C_{11} = C_{21} = 3 - (3/5)(50) = -27
C_{12} = C_{22} = 9 - (4/5)(50) = -31
(b) In state 0, the configuration chosen does not affect the transition probabilities, so it is
best to choose the slow configuration when there are no customers in line. Consequently,
the number of stationary policies is four.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
1    1          1          2          2
2    1          2          1          2

Policy   Transition Matrix          Expected Average Cost
R_1      [ 1/2    1/2    0    ]     C_1 = 3\pi_0 - 27\pi_1 - 27\pi_2
         [ 3/10   1/2    1/5  ]
         [ 0      3/5    2/5  ]
R_2      [ 1/2    1/2    0    ]     C_2 = 3\pi_0 - 27\pi_1 - 31\pi_2
         [ 3/10   1/2    1/5  ]
         [ 0      4/5    1/5  ]
R_3      [ 1/2    1/2    0    ]     C_3 = 3\pi_0 - 31\pi_1 - 27\pi_2
         [ 2/5    1/2    1/10 ]
         [ 0      3/5    2/5  ]
R_4      [ 1/2    1/2    0    ]     C_4 = 3\pi_0 - 31\pi_1 - 31\pi_2
         [ 2/5    1/2    1/10 ]
         [ 0      4/5    1/5  ]
(c)
Policy   \pi_0     \pi_1     \pi_2     Average Cost
R_1      0.3103    0.5172    0.1724    C_1 = -17.69
R_2      0.3243    0.5405    0.1351    C_2 = -17.81
R_3      0.4068    0.5085    0.0847    C_3 = -16.83
R_4      0.416     0.519     0.065     C_4 = -16.87
C_2 is the minimum, so the optimal policy is R_2, i.e., to use the slow configuration when no
customer or only one customer is present and the fast configuration when there are two
customers.
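The stationary probabilities and average costs in part (c) can be checked with a short script. The following is a minimal sketch, not the method used in the manual: it assumes the cost values 3, -27 and -31 from part (a) and the transition rows from part (b), and the helper name `stationary` is hypothetical. It uses exact fractions from the Python standard library.

```python
from fractions import Fraction as F


def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by Gauss-Jordan elimination (exact fractions)."""
    n = len(P)
    # One balance equation per state j = 0..n-2, followed by the normalisation row.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Transition rows per (state, decision); decision 1 = slow, 2 = fast (assumed from part (b)).
ROWS = {
    (0, 1): [F(1, 2), F(1, 2), F(0)],
    (1, 1): [F(3, 10), F(1, 2), F(1, 5)],
    (1, 2): [F(2, 5), F(1, 2), F(1, 10)],
    (2, 1): [F(0), F(3, 5), F(2, 5)],
    (2, 2): [F(0), F(4, 5), F(1, 5)],
}
COST = {(0, 1): F(3), (1, 1): F(-27), (1, 2): F(-31), (2, 1): F(-27), (2, 2): F(-31)}
POLICIES = {"R1": (1, 1, 1), "R2": (1, 1, 2), "R3": (1, 2, 1), "R4": (1, 2, 2)}

results = {}
for name, d in POLICIES.items():
    P = [ROWS[(i, d[i])] for i in range(3)]
    pi = stationary(P)
    results[name] = sum(pi[i] * COST[(i, d[i])] for i in range(3))

best = min(results, key=results.get)
for name, value in results.items():
    print(name, round(float(value), 2))
print("minimum average cost:", best)
```

Working in exact fractions avoids rounding differences between the table entries and the recomputed costs.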
19.2-3.
(a) Let the states represent whether the student's car is dented, i = 1, or not, i = 0.

Decision   Action                         State   Immediate Cost
1          Park on street in one space    0       C_{01} = 0
2          Park on street in two spaces   0       C_{02} = 4.5
3          Park in lot                    0       C_{03} = 5
4          Have it repaired               1       C_{14} = 50
5          Drive dented                   1       C_{15} = 9
(b) Assuming the student's car has no dent initially, once she decides to park in the lot, state
1 will never be entered. In that case, the decision chosen in state 1 does not affect the
expected average cost. Hence, it is enough to consider five stationary deterministic
policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)
0    1          1          2          2          3
1    4          5          4          5          -

Policy   Transition Matrix    Expected Average Cost
R_1      [ 0.9    0.1  ]      C_1 = 0\pi_0 + 50\pi_1
         [ 1      0    ]
R_2      [ 0.9    0.1  ]      C_2 = 0\pi_0 + 9\pi_1
         [ 0      1    ]
R_3      [ 0.98   0.02 ]      C_3 = 4.5\pi_0 + 50\pi_1
         [ 1      0    ]
R_4      [ 0.98   0.02 ]      C_4 = 4.5\pi_0 + 9\pi_1
         [ 0      1    ]
R_5      [ 1      0    ]      C_5 = 5\pi_0
(c)
Policy   \pi_0    \pi_1    Average Cost
R_1      0.909    0.091    4.55
R_2      0        1        9
R_3      0.98     0.02     5.41
R_4      0        1        9
R_5      1        0        5 (if initially not dented)
The policy R_1 has the minimum cost, so it is optimal to park on the street in one space if
not dented and to have it repaired if dented.
19.2-4.
(a) Let states 0 and 1 denote the good and the bad mood respectively. The decision in
each state is between providing refreshments or not.

Decision   Action                     State   Immediate Cost
1          Provide refreshments       0       C_{01} = 14
2          Not provide refreshments   0       C_{02} = 0
1          Provide refreshments       1       C_{11} = 14
2          Not provide refreshments   1       C_{12} = 75
(b) There are four possible stationary policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy   Transition Matrix     Expected Average Cost
R_1      [ 0.875   0.125 ]     C_1 = 14\pi_0 + 14\pi_1
         [ 0.875   0.125 ]
R_2      [ 0.875   0.125 ]     C_2 = 14\pi_0 + 75\pi_1
         [ 0.125   0.875 ]
R_3      [ 0.125   0.875 ]     C_3 = 14\pi_1
         [ 0.875   0.125 ]
R_4      [ 0.125   0.875 ]     C_4 = 75\pi_1
         [ 0.125   0.875 ]
(c)
Policy   \pi_0    \pi_1    Average Cost
R_1      0.875    0.125    C_1 = 14
R_2      0.5      0.5      C_2 = 44.5
R_3      0.5      0.5      C_3 = 7
R_4      0.125    0.875    C_4 = 65.625
The optimal policy is R_3, i.e., to provide refreshments only if the group begins the night
in a bad mood.
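For a two-state chain the stationary probabilities have the closed form \pi_0 = p_{10}/(p_{01} + p_{10}) and \pi_1 = p_{01}/(p_{01} + p_{10}), so the table in part (c) can be reproduced in a few lines. This is a sketch under stated assumptions: the probability of a good mood next time is taken to be 0.875 with refreshments and 0.125 without, as read from the matrices in part (b), and the function name `stationary_two_state` is hypothetical.

```python
def stationary_two_state(p01, p10):
    """Stationary distribution of a two-state chain with off-diagonal entries p01, p10."""
    total = p01 + p10
    return p10 / total, p01 / total


# P(next mood is good | decision); decision 1 = provide, 2 = do not provide (assumed).
P_GOOD = {1: 0.875, 2: 0.125}
COST = {(0, 1): 14, (0, 2): 0, (1, 1): 14, (1, 2): 75}
POLICIES = {"R1": (1, 1), "R2": (1, 2), "R3": (2, 1), "R4": (2, 2)}

avg_cost = {}
for name, (d0, d1) in POLICIES.items():
    p01 = 1 - P_GOOD[d0]   # good -> bad under the decision used in state 0
    p10 = P_GOOD[d1]       # bad -> good under the decision used in state 1
    pi0, pi1 = stationary_two_state(p01, p10)
    avg_cost[name] = pi0 * COST[(0, d0)] + pi1 * COST[(1, d1)]

best = min(avg_cost, key=avg_cost.get)
print(avg_cost, best)
```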
19.2-5.
(a) Let state 0 denote point over, two serves to go on next point, and state 1 denote one
serve left. The decision in each state is to attempt an ace or a lob.

Decision   Action        State   Immediate Cost
1          Attempt ace   0       C_{01} = (3/8)[(2/3)(-1) + (1/3)(+1)] = -1/8
2          Attempt lob   0       C_{02} = (7/8)[(1/3)(-1) + (2/3)(+1)] = 7/24
1          Attempt ace   1       C_{11} = (3/8)[(2/3)(-1) + (1/3)(+1)] + (5/8)(+1) = 1/2
2          Attempt lob   1       C_{12} = (7/8)[(1/3)(-1) + (2/3)(+1)] + (1/8)(+1) = 5/12
(b) There are four possible stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy   Transition Matrix   Expected Average Cost
R_1      [ 3/8   5/8 ]       C_1 = -(1/8)\pi_0 + (1/2)\pi_1
         [ 1     0   ]
R_2      [ 3/8   5/8 ]       C_2 = -(1/8)\pi_0 + (5/12)\pi_1
         [ 1     0   ]
R_3      [ 7/8   1/8 ]       C_3 = (7/24)\pi_0 + (1/2)\pi_1
         [ 1     0   ]
R_4      [ 7/8   1/8 ]       C_4 = (7/24)\pi_0 + (5/12)\pi_1
         [ 1     0   ]
(c)
Policy   \pi_0    \pi_1    Average Cost
R_1      0.615    0.385    C_1 = 0.270
R_2      0.615    0.385    C_2 = 0.237
R_3      0.889    0.111    C_3 = 0.315
R_4      0.889    0.111    C_4 = 0.306
The optimal policy is R_3, i.e., to attempt lob in state 0 and ace in state 1.
19.2-6.
(a) Let states i = 0, 1, 2 represent the state of the market, 13,000, 14,000 and 15,000
respectively. The decision is between two funds, namely the Go-Go Fund and the Go-Slow
Mutual Fund. All the costs are expressed in thousand dollars.

Decision   Action                  State   Immediate Cost
1          Invest in the Go-Go     0       C_{01} = -0.4(25) - 0.2(60) = -22
2          Invest in the Go-Slow   0       C_{02} = -0.4(10) - 0.2(25) = -9
1          Invest in the Go-Go     1       C_{11} = 0.3(25) - 0.3(60) = -10.5
2          Invest in the Go-Slow   1       C_{12} = 0.3(10) - 0.3(25) = -4.5
1          Invest in the Go-Go     2       C_{21} = 0.1(60) + 0.4(25) = 16
2          Invest in the Go-Slow   2       C_{22} = 0.1(25) + 0.4(10) = 6.5
(b) There are eight possible stationary policies.

i    R_1   R_2   R_3   R_4   R_5   R_6   R_7   R_8
0    1     1     1     1     2     2     2     2
1    1     1     2     2     2     1     1     2
2    1     2     2     1     1     2     1     2
(each column gives d_i(R_k))

All R_k's have the same transition matrix:

    [ 0.4   0.4   0.2 ]
P = [ 0.3   0.4   0.3 ]
    [ 0.1   0.4   0.5 ]

Policy   Expected Average Cost
R_1      C_1 = -22\pi_0 - 10.5\pi_1 + 16\pi_2
R_2      C_2 = -22\pi_0 - 10.5\pi_1 + 6.5\pi_2
R_3      C_3 = -22\pi_0 - 4.5\pi_1 + 6.5\pi_2
R_4      C_4 = -22\pi_0 - 4.5\pi_1 + 16\pi_2
R_5      C_5 = -9\pi_0 - 4.5\pi_1 + 16\pi_2
R_6      C_6 = -9\pi_0 - 10.5\pi_1 + 6.5\pi_2
R_7      C_7 = -9\pi_0 - 10.5\pi_1 + 16\pi_2
R_8      C_8 = -9\pi_0 - 4.5\pi_1 + 6.5\pi_2
(c) \pi = (0.257, 0.4, 0.343)

Policy   Average Cost
R_1      -4.371
R_2      -7.629
R_3      -5.229
R_4      -1.971
R_5      1.371
R_6      -4.286
R_7      -1.029
R_8      -1.886
The optimal policy is R_2, i.e., to invest in the Go-Go Fund in states 0 and 1, and in the
Go-Slow Fund in state 2.
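Because every policy shares the same transition matrix, the stationary distribution does not depend on the policy, and the average cost can be minimized state by state. The sketch below illustrates this by enumerating all eight policies. It assumes the transition matrix with rows (0.4, 0.4, 0.2), (0.3, 0.4, 0.3), (0.1, 0.4, 0.5) and the immediate costs -22, -9, -10.5, -4.5, 16, 6.5 (thousand dollars); the exact value \pi = (9/35, 14/35, 12/35) is an assumption that the script verifies rather than derives.

```python
from fractions import Fraction as F
from itertools import product

P = [[F(4, 10), F(4, 10), F(2, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(1, 10), F(4, 10), F(5, 10)]]

# Candidate stationary distribution (about 0.257, 0.4, 0.343); checked below.
pi = [F(9, 35), F(14, 35), F(12, 35)]
is_stationary = all(sum(pi[i] * P[i][j] for i in range(3)) == pi[j] for j in range(3))

# Immediate costs C_ik in thousand dollars; decision 1 = Go-Go, 2 = Go-Slow.
COST = {(0, 1): F(-22), (0, 2): F(-9),
        (1, 1): F(-21, 2), (1, 2): F(-9, 2),
        (2, 1): F(16), (2, 2): F(13, 2)}

# Enumerate every stationary deterministic policy (d_0, d_1, d_2).
avg = {d: sum(pi[i] * COST[(i, d[i])] for i in range(3))
       for d in product((1, 2), repeat=3)}
best = min(avg, key=avg.get)

# With a policy-independent pi, choosing the cheapest decision per state gives the same answer.
separable = tuple(min((1, 2), key=lambda k, i=i: COST[(i, k)]) for i in range(3))

print(best, float(avg[best]), separable)
```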
19.2-7.
(a) Let states 0 and 1 represent whether the machine is broken down or is running
respectively. The decision is between Buck and Bill.

Decision   Action   State   Immediate Cost
1          Buck     0       C_{01} = 0
2          Bill     0       C_{02} = 0
1          Buck     1       C_{11} = -1200
2          Bill     1       C_{12} = -1200
(b) There are four possible stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2

Policy   Transition Matrix   Expected Average Cost
R_1      [ 0.4   0.6 ]       C_1 = -1200\pi_1
         [ 0.6   0.4 ]
R_2      [ 0.4   0.6 ]       C_2 = -1200\pi_1
         [ 0.4   0.6 ]
R_3      [ 0.5   0.5 ]       C_3 = -1200\pi_1
         [ 0.6   0.4 ]
R_4      [ 0.5   0.5 ]       C_4 = -1200\pi_1
         [ 0.4   0.6 ]
(c)
Policy   \pi_0    \pi_1    Average Cost
R_1      0.5      0.5      C_1 = -600
R_2      0.4      0.6      C_2 = -720
R_3      0.545    0.455    C_3 = -546
R_4      0.444    0.556    C_4 = -667.2
The largest expected average profit is given by R_2.

19.2-8.
(a) Let the states be the number of items in inventory at the beginning of the period and
the decision be the number of items ordered. To conform to the software package, one
needs to relabel the decisions 0, 1, 2 as 1, 2, 3 respectively. The cost matrix is:

C_{ik}   k = 1    k = 2    k = 3
i = 0    40/3     56/3     24
i = 1    4        19       -
i = 2    4        -        -
Let R_3 denote the policy to order 2 items when the inventory level is initially 0 and not to
order when the inventory level is initially either 1 or 2. In other words, d_0(R_3) = 3 and
d_1(R_3) = d_2(R_3) = 1.

         [ 1/3   1/3   1/3 ]
P(R_3) = [ 2/3   1/3   0   ]
         [ 1/3   1/3   1/3 ]

\pi = (4/9, 3/9, 2/9)
Expected average cost: (4/9)C_{03} + (3/9)C_{11} + (2/9)C_{21} = 116/9 = $12.89/period

(b) There are 3^3 = 27 stationary policies, since one can order 0, 1 or 2 items in each state.
However, only six of these are feasible. The remaining 21 policies are infeasible because the
decision in at least one of the states leads to over capacity.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)   d_i(R_5)   d_i(R_6)
0    1          2          3          1          2          3
1    1          1          1          2          2          2
2    1          1          1          1          1          1
19.3-1.
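(a)
minimize    3y_{01} + 9y_{02} + 3y_{11} + 9y_{12} + 28y_{21} + 34y_{22}
subject to  y_{01} + y_{02} + y_{11} + y_{12} + y_{21} + y_{22} = 1
            y_{01} + y_{02} - [(1/2)y_{01} + (1/2)y_{02} + (3/10)y_{11} + (2/5)y_{12}] = 0
            y_{11} + y_{12} - [(1/2)y_{01} + (1/2)y_{02} + (1/2)y_{11} + (1/2)y_{12} + (3/5)y_{21} + (4/5)y_{22}] = 0
            y_{21} + y_{22} - [(2/10)y_{11} + (1/10)y_{12} + (2/5)y_{21} + (1/5)y_{22}] = 0
            y_{ik} \ge 0 for i = 0, 1, 2 and k = 1, 2

(b) Using the simplex method, we find y_{01} = 0.32432, y_{11} = 0.54054, y_{22} = 0.13514 and
the remaining y_{ik}'s are zero. Hence, the optimal policy uses decision 1 in states 0 and 1,
decision 2 in state 2.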
19.3-2.
(a)
minimize    4.5y_{02} + 5y_{03} + 50y_{14} + 9y_{15}
subject to  y_{01} + y_{02} + y_{03} + y_{14} + y_{15} = 1
            y_{01} + y_{02} + y_{03} - [(9/10)y_{01} + (49/50)y_{02} + y_{03} + y_{14}] = 0
            y_{14} + y_{15} - [(1/10)y_{01} + (1/50)y_{02} + y_{15}] = 0
            y_{01}, y_{02}, y_{03}, y_{14}, y_{15} \ge 0

(b) Using the simplex method, all y_{ik}'s turn out to be zero except that y_{01} = 0.90909 and
y_{14} = 0.09091, so the policy that uses decision 1 in state 0 and decision 4 in state 1 is
optimal.
19.3-3.
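(a)
minimize    14y_{01} + 14y_{11} + 75y_{12}
subject to  y_{01} + y_{02} + y_{11} + y_{12} = 1
            y_{01} + y_{02} - [(7/8)y_{01} + (1/8)y_{02} + (7/8)y_{11} + (1/8)y_{12}] = 0
            y_{11} + y_{12} - [(1/8)y_{01} + (7/8)y_{02} + (1/8)y_{11} + (7/8)y_{12}] = 0
            y_{ik} \ge 0 for i = 0, 1 and k = 1, 2
(b) Using the simplex method, we find y_{02} = y_{11} = 0.5 and y_{01} = y_{12} = 0, so the optimal
policy is to use decision 2 in state 0 and decision 1 in state 1.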
19.3-4.
(a)
minimize    -(1/8)y_{01} + (7/24)y_{02} + (1/2)y_{11} + (5/12)y_{12}
subject to  y_{01} + y_{02} + y_{11} + y_{12} = 1
            y_{01} + y_{02} - [(3/8)y_{01} + (7/8)y_{02} + y_{11} + y_{12}] = 0
            y_{11} + y_{12} - [(5/8)y_{01} + (1/8)y_{02}] = 0
            y_{ik} \ge 0 for i = 0, 1 and k = 1, 2

(b) Using the simplex method, we find y_{02} = 0.8889, y_{11} = 0.1111 and y_{01} = y_{12} = 0, so the
optimal policy is to use decision 2 (lob) in state 0 and decision 1 (ace) in state 1.
19.3-5.
(a)
minimize    -22y_{01} - 9y_{02} - 10.5y_{11} - 4.5y_{12} + 16y_{21} + 6.5y_{22}
subject to  y_{01} + y_{02} + y_{11} + y_{12} + y_{21} + y_{22} = 1
            y_{01} + y_{02} - [(4/10)y_{01} + (4/10)y_{02} + (3/10)y_{11} + (3/10)y_{12} + (1/10)y_{21} + (1/10)y_{22}] = 0
            y_{11} + y_{12} - [(4/10)y_{01} + (4/10)y_{02} + (4/10)y_{11} + (4/10)y_{12} + (4/10)y_{21} + (4/10)y_{22}] = 0
            y_{21} + y_{22} - [(2/10)y_{01} + (2/10)y_{02} + (3/10)y_{11} + (3/10)y_{12} + (5/10)y_{21} + (5/10)y_{22}] = 0
            y_{ik} \ge 0 for i = 0, 1, 2 and k = 1, 2

(b) Using the simplex method, we find y_{01} = 0.257, y_{11} = 0.4, y_{22} = 0.343 and the
remaining y_{ik}'s are zero. Hence, the optimal policy uses decision 1 (the Go-Go Fund) in
states 0 and 1, and decision 2 (the Go-Slow Fund) in state 2.
19.3-6.
(a)
minimize    -1200y_{11} - 1200y_{12}
subject to  y_{01} + y_{02} + y_{11} + y_{12} = 1
            y_{01} + y_{02} - (0.4y_{01} + 0.5y_{02} + 0.6y_{11} + 0.4y_{12}) = 0
            y_{11} + y_{12} - (0.6y_{01} + 0.5y_{02} + 0.4y_{11} + 0.6y_{12}) = 0
            y_{ik} \ge 0 for i = 0, 1 and k = 1, 2
(b) Using the simplex method, we find y_{01} = 0.4, y_{12} = 0.6 and y_{02} = y_{11} = 0, so the
optimal policy is to use decision 1 (Buck) in state 0 and decision 2 (Bill) in state 1.
19.3-7.
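(a)
minimize    (40/3)y_{01} + (56/3)y_{02} + 24y_{03} + 4y_{11} + 19y_{12} + 4y_{21}
subject to  y_{01} + y_{02} + y_{03} + y_{11} + y_{12} + y_{21} = 1
            y_{01} + y_{02} + y_{03} - [y_{01} + (2/3)y_{02} + (1/3)y_{03} + (2/3)y_{11} + (1/3)y_{12} + (1/3)y_{21}] = 0
            y_{11} + y_{12} - [(1/3)y_{02} + (1/3)y_{03} + (1/3)y_{11} + (1/3)y_{12} + (1/3)y_{21}] = 0
            y_{21} - [(1/3)y_{03} + (1/3)y_{12} + (1/3)y_{21}] = 0
            y_{ik} \ge 0 for i = 0, 1, 2 and k = 1, 2, 3
(b) Using the simplex method, we find y_{03} = 0.4444, y_{11} = 0.3333, y_{21} = 0.2222 and the
remaining y_{ik}'s are zero. Hence, the optimal policy is to order 2 items in state 0 and not to
order in states 1 and 2.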
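A reported LP solution can be sanity-checked by substituting it back into the balance constraints. The sketch below does this for the inventory problem in exact fractions. It assumes demand of 0, 1 or 2 items with probability 1/3 each (as implied by the transition probabilities), relabeled decisions k = 1, 2, 3 for ordering 0, 1, 2 items, and the reported solution written as y_{03} = 4/9, y_{11} = 3/9, y_{21} = 2/9. It checks feasibility and the objective value only; it does not prove optimality.

```python
from fractions import Fraction as F

third = F(1, 3)

# Transition rows p_ij(k) for each feasible (state i, decision k) pair (assumed).
P = {
    (0, 1): [F(1), F(0), F(0)],
    (0, 2): [2 * third, third, F(0)],
    (0, 3): [third, third, third],
    (1, 1): [2 * third, third, F(0)],
    (1, 2): [third, third, third],
    (2, 1): [third, third, third],
}
COST = {(0, 1): F(40, 3), (0, 2): F(56, 3), (0, 3): F(24),
        (1, 1): F(4), (1, 2): F(19), (2, 1): F(4)}

# Reported solution: all y_ik are zero except y_03, y_11 and y_21.
y = {key: F(0) for key in P}
y[(0, 3)] = F(4, 9)
y[(1, 1)] = F(3, 9)
y[(2, 1)] = F(2, 9)

# Balance constraint for state j: sum_k y_jk - sum_{i,k} y_ik p_ij(k) = 0.
balance = [
    sum(v for (i, _), v in y.items() if i == j) - sum(y[key] * P[key][j] for key in P)
    for j in range(3)
]
total = sum(y.values())
objective = sum(COST[key] * y[key] for key in P)

print(balance, total, objective)
```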
19.4-1.
19.4-2.
19.4-3.
19.4-4.
19.4-5.
19.4-6.
19.4-7.
19.4-8.
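When the number of pints of blood delivered can be specified at the time of delivery, the
starting number of pints including the delivery will never exceed the largest possible
demand in a period, so we can restrict our attention to states i = 0, 1, 2, 3. The admissible
actions in state i are to order k pints with 0 \le k \le 3 - i. Given a decision k, the transition
probabilities and the immediate cost are computed as follows, where D is the demand:
p_{ij}(k) = P{D = i + k - j} if j \ge 1
p_{i0}(k) = P{D \ge i + k}
C_{ik} = 50k + E[100 max(D - i - k, 0)].
Initialization: d_i(R_1) = 1 for i = 0, 1, 2 and d_3(R_1) = 0.

         [ 0.6   0.4   0     0   ]             [ 90 ]
P(R_1) = [ 0.3   0.3   0.4   0   ]    C(R_1) = [ 60 ]
         [ 0.1   0.2   0.3   0.4 ]             [ 50 ]
         [ 0.1   0.2   0.3   0.4 ]             [ 0  ]

Iteration 1:
Step 1: Value determination:
g(R_1) = 90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1)
g(R_1) = 60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1)
g(R_1) = 50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1)
g(R_1) = 0 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_3(R_1)
v_3(R_1) = 0
Solution: g(R_1) = 57.8, v_0(R_1) = 196.3, v_1(R_1) = 115.9, v_2(R_1) = 50, v_3(R_1) = 0
Step 2: Policy improvement:
State 0: minimize
k = 0: 100 + v_0(R_1) - v_0(R_1) = 100
k = 1: 90 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_0(R_1) = 57.8
k = 2: 110 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_0(R_1) = 27.36
k = 3: 150 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_0(R_1) = 11.51
d_0(R_2) = 3
State 1: minimize
k = 0: 40 + 0.6v_0(R_1) + 0.4v_1(R_1) - v_1(R_1) = 88.24
k = 1: 60 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_1(R_1) = 57.8
k = 2: 100 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_1(R_1) = 41.91
d_1(R_2) = 2
State 2: minimize
k = 0: 10 + 0.3v_0(R_1) + 0.3v_1(R_1) + 0.4v_2(R_1) - v_2(R_1) = 73.66
k = 1: 50 + 0.1v_0(R_1) + 0.2v_1(R_1) + 0.3v_2(R_1) + 0.4v_3(R_1) - v_2(R_1) = 57.8
d_2(R_2) = 1
R_2 is not identical to R_1, so the optimality test fails.
Iteration 2:
Step 1: Value determination:
g(R_2) = 150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2)
g(R_2) = 100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2)
g(R_2) = 50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2)
g(R_2) = 0 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_3(R_2)
v_3(R_2) = 0
Solution: g(R_2) = 50, v_0(R_2) = 150, v_1(R_2) = 100, v_2(R_2) = 50, v_3(R_2) = 0
Step 2: Policy improvement:
State 0: minimize
k = 0: 100 + v_0(R_2) - v_0(R_2) = 100
k = 1: 90 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_0(R_2) = 70
k = 2: 110 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_0(R_2) = 55
k = 3: 150 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_0(R_2) = 50
d_0(R_3) = 3
State 1: minimize
k = 0: 40 + 0.6v_0(R_2) + 0.4v_1(R_2) - v_1(R_2) = 70
k = 1: 60 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_1(R_2) = 55
k = 2: 100 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_1(R_2) = 50
d_1(R_3) = 2
State 2: minimize
k = 0: 10 + 0.3v_0(R_2) + 0.3v_1(R_2) + 0.4v_2(R_2) - v_2(R_2) = 55
k = 1: 50 + 0.1v_0(R_2) + 0.2v_1(R_2) + 0.3v_2(R_2) + 0.4v_3(R_2) - v_2(R_2) = 50
d_2(R_3) = 1
R_3 is identical to R_2, so it is optimal to start every period with 3 pints of blood after
delivery of the order.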
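The two iterations above can be reproduced with a compact implementation of the policy improvement algorithm for the average-cost criterion. The following is a minimal sketch, not the software package referred to in the text. It assumes the demand distribution P{D = 0, 1, 2, 3} = (0.4, 0.3, 0.2, 0.1) implied by the transition rows, a cost of 50 per pint ordered and 100 per pint of unmet demand; all function names are hypothetical. Ties in the improvement step are broken toward the smaller order, which does not arise in this instance.

```python
from fractions import Fraction as F

DEMAND = {0: F(4, 10), 1: F(3, 10), 2: F(2, 10), 3: F(1, 10)}  # assumed P{D = d}
N = 4  # states 0..3 (pints on hand before ordering)


def row(i, k):
    """Transition probabilities out of state i when k pints are ordered."""
    probs = [F(0)] * N
    for d, pr in DEMAND.items():
        probs[max(i + k - d, 0)] += pr
    return probs


def cost(i, k):
    """50 per pint ordered plus 100 per pint of expected unmet demand."""
    return 50 * k + sum(pr * 100 * max(d - i - k, 0) for d, pr in DEMAND.items())


def solve(A, b):
    """Gauss-Jordan elimination on exact fractions."""
    n = len(A)
    M = [list(r) + [bi] for r, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def evaluate(policy):
    """Value determination: unknowns (g, v_0, v_1, v_2) with v_3 fixed at 0."""
    A, b = [], []
    for i in range(N):
        p = row(i, policy[i])
        A.append([F(1)] + [(1 if i == j else 0) - p[j] for j in range(N - 1)])
        b.append(cost(i, policy[i]))
    g, *v = solve(A, b)
    return g, v + [F(0)]


def improve(v):
    """Policy improvement: minimise C_ik + sum_j p_ij(k) v_j - v_i over k."""
    def test(i, k):
        return cost(i, k) + sum(p * vj for p, vj in zip(row(i, k), v)) - v[i]
    return tuple(min(range(N - i), key=lambda k, i=i: test(i, k)) for i in range(N))


policy = (1, 1, 1, 0)  # initial policy R_1
history = [policy]
while True:
    g, v = evaluate(policy)
    new_policy = improve(v)
    if new_policy == policy:
        break
    policy = new_policy
    history.append(policy)

print(history, g, v)
```

The loop stops as soon as the improvement step returns the policy it was given, which is the optimality test used in the text.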
19.5-1.
Let states 0, 1 and 2 denote the $600, $800 and $1000 offers respectively and let state 3
designate the case that the car has already been sold (state \infty of the hint). Let decisions 1
and 2 be to reject and to accept the offer respectively.
C_{01} = C_{11} = C_{21} = 60, C_{02} = -600, C_{12} = -800 and C_{22} = -1000

       [ 5/8   1/4   1/8   0 ]           [ 0   0   0   1 ]
P(1) = [ 5/8   1/4   1/8   0 ]    P(2) = [ 0   0   0   1 ]
       [ 5/8   1/4   1/8   0 ]           [ 0   0   0   1 ]
       [ 0     0     0     1 ]           [ 0   0   0   1 ]

Start with the policy to reject only the $600 offer. The relevant equations are:
V_0 = 60 + 0.95[(5/8)V_0 + (1/4)V_1 + (1/8)V_2]
V_1 = -800 + 0.95V_3
V_2 = -1000 + 0.95V_3
V_3 = 0.95V_3,
which admit the unique solution (V_0, V_1, V_2, V_3) = (-7960/13, -800, -1000, 0).

Policy improvement:
State 0 with decision 2: -600 + 0.95V_3 = -600 > V_0
State 1 with decision 1: 60 + 0.95[(5/8)V_0 + (1/4)V_1 + (1/8)V_2] = -7960/13 > V_1
State 2 with decision 1: 60 + 0.95[(5/8)V_0 + (1/4)V_1 + (1/8)V_2] = -7960/13 > V_2
Hence, the policy to reject the $600 offer and to accept the $800 and $1000 offers is optimal.
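Since V_1, V_2 and V_3 are fixed by the accept decisions, the value-determination step reduces to a single linear equation in V_0. The sketch below solves it in exact fractions and repeats the improvement comparisons. It assumes the offer probabilities 5/8, 1/4, 1/8, a cost of 60 per rejected offer and a discount factor of 0.95; the variable names are illustrative.

```python
from fractions import Fraction as F

alpha = F(95, 100)                                   # discount factor
probs = {600: F(5, 8), 800: F(1, 4), 1000: F(1, 8)}  # probability of each offer
reject_cost = 60                                     # cost incurred when an offer is rejected
accepted = (800, 1000)                               # policy: reject only the 600 offer

# Accepting an offer earns it immediately and moves to the absorbing "sold" state (value 0).
V_accept = {offer: F(-offer) for offer in accepted}

# V_0 = 60 + alpha * (p_600 * V_0 + sum over accepted offers of p * V), solved for V_0.
known = sum(probs[o] * V_accept[o] for o in accepted)
V0 = (reject_cost + alpha * known) / (1 - alpha * probs[600])

# Improvement checks: a switch is worthwhile only if it gives a smaller value.
accept_600 = F(-600)   # decision 2 in state 0
reject_any = V0        # rejecting in state 1 or 2 leads to the same expression as V_0

print(V0, float(V0))
```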
19.5-2.
(a)
minimize    60y_{01} - 600y_{02} + 60y_{11} - 800y_{12} + 60y_{21} - 1000y_{22}
subject to  y_{01} + y_{02} - 0.95(5/8)(y_{01} + y_{11} + y_{21}) = 1/3
            y_{11} + y_{12} - 0.95(1/4)(y_{01} + y_{11} + y_{21}) = 1/3
            y_{21} + y_{22} - 0.95(1/8)(y_{01} + y_{11} + y_{21}) = 1/3
            y_{ik} \ge 0 for i = 0, 1, 2 and k = 1, 2

(b) Using the simplex method, we find y_{01} = 0.81979, y_{12} = 0.5277, y_{22} = 0.43056 and
the remaining y_{ik}'s are zero. Hence, the optimal policy is to reject the $600 offer and to
accept the $800 and $1000 offers.
19.5-3.
V_i^n = \min{ 60 + 0.95[(5/8)V_0^{n-1} + (1/4)V_1^{n-1} + (1/8)V_2^{n-1}], -offer_i } for i = 0, 1, 2
V_i^0 = 0 for i = 0, 1, 2
Iteration 1:
V_i^1 = \min{60, -offer_i} = -offer_i for i = 0, 1, 2   Accept
Iteration 2:
V_0^2 = \min{-605, -600} = -605     Reject
V_1^2 = \min{-605, -800} = -800     Accept
V_2^2 = \min{-605, -1000} = -1000   Accept
Iteration 3:
V_0^3 = \min{-607.97, -600} = -607.97   Reject
V_1^3 = \min{-607.97, -800} = -800      Accept
V_2^3 = \min{-607.97, -1000} = -1000    Accept
The approximate optimal solution is to reject the $600 offer and to accept the $800 and
$1000 offers. This policy is indeed optimal, as found in Problems 19.5-1 and 19.5-2.
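The three iterations can be generated mechanically by applying the recursion to the previous value vector. The sketch below assumes offers of 600, 800 and 1000 with probabilities 5/8, 1/4 and 1/8, a cost of 60 per rejected offer, a discount factor of 0.95 and zero initial values; the function name `step` is hypothetical.

```python
ALPHA = 0.95
REJECT_COST = 60
OFFERS = [600, 800, 1000]
PROBS = [5 / 8, 1 / 4, 1 / 8]


def step(V):
    """One successive-approximation step; returns (new values, chosen actions)."""
    # Rejecting costs 60 now and leads to a fresh random offer next period.
    reject = REJECT_COST + ALPHA * sum(p * v for p, v in zip(PROBS, V))
    values, actions = [], []
    for offer in OFFERS:
        accept = -offer  # accepting earns the offer; the sold state has value 0
        if reject < accept:
            values.append(reject)
            actions.append("Reject")
        else:
            values.append(accept)
            actions.append("Accept")
    return values, actions


V = [0.0, 0.0, 0.0]
trace = []
for n in range(1, 4):
    V, actions = step(V)
    trace.append((n, V, actions))
    print(n, [round(x, 2) for x in V], actions)
```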
19.5-4.
Let states 0, 1 and 2 denote the selling prices of $10, $20 and $30 respectively and let state
3 designate the case that the stock has already been sold. Let decisions 1 and 2 be to hold
and to sell the stock respectively.
C_{01} = C_{11} = C_{21} = 0, C_{02} = -10, C_{12} = -20 and C_{22} = -30

       [ 4/5   1/5   0     0 ]           [ 0   0   0   1 ]
P(1) = [ 1/4   1/4   1/2   0 ]    P(2) = [ 0   0   0   1 ]
       [ 0     3/4   1/4   0 ]           [ 0   0   0   1 ]
       [ 0     0     0     1 ]           [ 0   0   0   1 ]

Start with the policy to sell only when the price is $30. The relevant equations are:
V_0 = 0 + 0.9[(4/5)V_0 + (1/5)V_1]
V_1 = 0 + 0.9[(1/4)V_0 + (1/4)V_1 + (1/2)V_2]
V_2 = -30 + 0.9V_3
V_3 = 0 + 0.9V_3,
which admit the unique solution (V_0, V_1, V_2, V_3) = (-4860/353, -7560/353, -30, 0).
Policy improvement:
State 0 with decision 2: -10 + 0.9V_3 = -10 > V_0
State 1 with decision 2: -20 + 0.9V_3 = -20 > V_1
State 2 with decision 1: 0 + 0.9[(3/4)V_1 + (1/4)V_2] = -21.21 > V_2
Hence, the policy to hold the stock when the price is $10 or $20, and to sell it when the
price is $30, is optimal.
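With V_2 = -30 and V_3 = 0 fixed by the sell decision, the value-determination equations reduce to a 2x2 linear system in V_0 and V_1. The sketch below solves it by Cramer's rule in exact fractions and then evaluates the three improvement tests. It assumes the hold-transition probabilities (4/5, 1/5), (1/4, 1/4, 1/2), (3/4, 1/4) and a discount factor of 0.9; the variable names are illustrative.

```python
from fractions import Fraction as F

alpha = F(9, 10)
V2, V3 = F(-30), F(0)  # sell at $30; the "sold" state has value 0

# (1 - alpha*4/5) V0 - alpha*(1/5) V1 = 0
# -alpha*(1/4) V0 + (1 - alpha*1/4) V1 = alpha*(1/2) V2
a11, a12, b1 = 1 - alpha * F(4, 5), -alpha * F(1, 5), F(0)
a21, a22, b2 = -alpha * F(1, 4), 1 - alpha * F(1, 4), alpha * F(1, 2) * V2

det = a11 * a22 - a12 * a21
V0 = (b1 * a22 - a12 * b2) / det
V1 = (a11 * b2 - a21 * b1) / det

# Policy improvement tests: value of deviating from the current decision in each state.
sell_at_10 = F(-10) + alpha * V3
sell_at_20 = F(-20) + alpha * V3
hold_at_30 = alpha * (F(3, 4) * V1 + F(1, 4) * V2)

print(V0, V1, float(hold_at_30))
```

A deviation is worthwhile only if it yields a smaller (more negative) value; none of the three does here.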
19.5-5.
(a)
minimize    -10y_{02} - 20y_{12} - 30y_{22}
subject to  y_{01} + y_{02} - 0.9[(4/5)y_{01} + (1/4)y_{11}] = 1/3
            y_{11} + y_{12} - 0.9[(1/5)y_{01} + (1/4)y_{11} + (3/4)y_{21}] = 1/3
            y_{21} + y_{22} - 0.9[(1/2)y_{11} + (1/4)y_{21}] = 1/3
            y_{ik} \ge 0 for i = 0, 1, 2 and k = 1, 2

(b) Using the simplex method, we find y_{01} = 1.96059, y_{11} = 0.95851, y_{22} = 0.76463 and
the remaining y_{ik}'s are zero. Hence, the optimal policy is to hold the stock at the prices
$10 and $20 and to sell it at the price $30.
19.5-6.
V_0^n = \min{ 0.9[(4/5)V_0^{n-1} + (1/5)V_1^{n-1}], -10 }
V_1^n = \min{ 0.9[(1/4)V_0^{n-1} + (1/4)V_1^{n-1} + (1/2)V_2^{n-1}], -20 }
V_2^n = \min{ 0.9[(3/4)V_1^{n-1} + (1/4)V_2^{n-1}], -30 }
V_i^0 = 0 for i = 0, 1, 2
Iteration 1:
V_0^1 = \min{0, -10} = -10   Sell
V_1^1 = \min{0, -20} = -20   Sell
V_2^1 = \min{0, -30} = -30   Sell
Iteration 2:
V_0^2 = \min{-10.8, -10} = -10.8     Hold
V_1^2 = \min{-20.25, -20} = -20.25   Hold
V_2^2 = \min{-20.25, -30} = -30      Sell
Iteration 3:
V_0^3 = \min{-11.42, -10} = -11.42   Hold
V_1^3 = \min{-20.49, -20} = -20.49   Hold
V_2^3 = \min{-20.42, -30} = -30      Sell
The approximate optimal solution is to sell if the price is $30 and to hold otherwise. This
policy is indeed optimal, as found in Problems 19.5-4 and 19.5-5.
19.5-7.
(a) Let states 0 and 1 be the chemical produced this month, C1 and C2 respectively, and
decisions 1 and 2 refer to the process to be used next month, A and B respectively. There
are four stationary deterministic policies.

i    d_i(R_1)   d_i(R_2)   d_i(R_3)   d_i(R_4)
0    1          1          2          2
1    1          2          1          2
The transition matrix is the same for every decision, viz.

P = [ 0.3   0.7 ]
    [ 0.4   0.6 ]

The costs C_{ik} correspond to the expected amount of pollution using the process k in the
next period.
C_{01} = 0.3(15) + 0.7(2) = 5.9
C_{02} = 0.3(3) + 0.7(8) = 6.5
C_{11} = 0.4(15) + 0.6(2) = 7.2
C_{12} = 0.4(3) + 0.6(8) = 6
(b)
19.5-8.
(a)
minimize    5.9y_{01} + 6.5y_{02} + 7.2y_{11} + 6y_{12}
subject to  y_{01} + y_{02} - (1/2)[(3/10)y_{01} + (3/10)y_{02} + (4/10)y_{11} + (4/10)y_{12}] = 1/2
            y_{11} + y_{12} - (1/2)[(7/10)y_{01} + (7/10)y_{02} + (6/10)y_{11} + (6/10)y_{12}] = 1/2
            y_{ik} \ge 0 for i = 0, 1 and k = 1, 2

(b) Using the simplex method, we find y_{01} = 0.857, y_{12} = 1.143 and y_{02} = y_{11} = 0.
Hence, the optimal policy is to use process A if C1 is produced and B if C2 is produced
this month.
19.5-9.
19.5-10.
The three iterations of successive approximations in Problem 19.5-9 give the optimal
policy for the three-period problem. The optimal policy is, therefore, to use process A
if C1 is produced and B if C2 is produced in all periods.
19.5-11.
V_0^n = \min{ 0 + 0.90[(7/8)V_1^{n-1} + (1/16)V_2^{n-1} + (1/16)V_3^{n-1}], 4000 + 0.90V_1^{n-1}, 6000 + 0.90V_0^{n-1} }
V_1^n = \min{ 1000 + 0.90[(3/4)V_1^{n-1} + (1/8)V_2^{n-1} + (1/8)V_3^{n-1}], 4000 + 0.90V_1^{n-1}, 6000 + 0.90V_0^{n-1} }
V_2^n = \min{ 3000 + 0.90[(1/2)V_2^{n-1} + (1/2)V_3^{n-1}], 4000 + 0.90V_1^{n-1}, 6000 + 0.90V_0^{n-1} }
V_3^n = 6000 + 0.90V_0^{n-1}
V_i^0 = 0 for i = 0, 1, 2, 3

Iteration 1:
V_0^1 = \min{0, 4000, 6000} = 0         Do nothing
V_1^1 = \min{1000, 4000, 6000} = 1000   Do nothing
V_2^1 = \min{3000, 4000, 6000} = 3000   Do nothing
V_3^1 = 6000                            Replace
Iteration 2:
V_0^2 = \min{1293.75, 4900, 6000} = 1293.75   Do nothing
V_1^2 = \min{2687.5, 4900, 6000} = 2687.5     Do nothing
V_2^2 = \min{7050, 4900, 6000} = 4900         Overhaul
V_3^2 = 6000                                  Replace
Iteration 3:
V_0^3 = \min{2729.53, 6418.75, 7164.38} = 2729.53   Do nothing
V_1^3 = \min{4040.31, 6418.75, 7164.38} = 4040.31   Do nothing
V_2^3 = \min{7905, 6418.75, 7164.38} = 6418.75      Overhaul
V_3^3 = 7164.38                                     Replace
Iteration 4:
V_0^4 = \min{3945.80, 7636.28, 8456.58} = 3945.80   Do nothing
V_1^4 = \min{5255.31, 7636.28, 8456.58} = 5255.31   Do nothing
V_2^4 = \min{9112.41, 7636.28, 8456.58} = 7636.28   Overhaul
V_3^4 = 8456.58                                     Replace
The optimal policy is to do nothing in states 0 and 1 and to replace in state 3 in all periods.
When in state 2, it is best to overhaul in periods 1, 2, 3 and to do nothing in period 4.
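The four iterations follow directly from the recursion and can be regenerated with a few lines of code. This is a sketch that assumes the costs (0, 1000, 3000 for doing nothing in states 0, 1, 2; 4000 for an overhaul; 6000 for a replacement), the transition probabilities and the discount factor 0.90 shown in the recursion; the function name `step` is hypothetical. Iteration n corresponds to n periods remaining, so iteration 1 describes the last period.

```python
ALPHA = 0.9


def step(V):
    """One successive-approximation step; returns (new values, chosen actions)."""
    overhaul = 4000 + ALPHA * V[1]   # overhaul returns the machine to state 1
    replace = 6000 + ALPHA * V[0]    # replacement returns the machine to state 0
    do_nothing = [
        0 + ALPHA * (7 / 8 * V[1] + 1 / 16 * V[2] + 1 / 16 * V[3]),
        1000 + ALPHA * (3 / 4 * V[1] + 1 / 8 * V[2] + 1 / 8 * V[3]),
        3000 + ALPHA * (1 / 2 * V[2] + 1 / 2 * V[3]),
    ]
    values, actions = [], []
    for i in range(3):
        options = [(do_nothing[i], "Do nothing"), (overhaul, "Overhaul"), (replace, "Replace")]
        best_value, best_action = min(options)
        values.append(best_value)
        actions.append(best_action)
    values.append(replace)           # state 3: replacement is the only option
    actions.append("Replace")
    return values, actions


V = [0.0, 0.0, 0.0, 0.0]
trace = []
for n in range(1, 5):
    V, actions = step(V)
    trace.append((n, [round(x, 2) for x in V], actions))
    print(n, [round(x, 2) for x in V], actions)
```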