
Exercises in

SF2863 Systems Engineering

2015

Division of Optimization and Systems Theory


Department of Mathematics
Kungliga Tekniska Högskolan
Stockholm, Sweden

Contents

1. Markov Chains
   1.1 Markov Chains in discrete time
   1.2 Markov Chains in continuous time

2. Queueing theory

3. Inventory theory

4. Marginal Allocation

5. Dynamic programming
   5.1 Deterministic Dynamic Programming
   5.2 Stochastic Dynamic Programming

6. Markov Decision Processes

7. Solutions to the exercises



1. Markov Chains

1.1. Markov Chains in discrete time


1.1 A Markov chain with the states {0, 1, 2} has the transition matrix

          | 1/3  1/3   _  |
    P =   | 1/2   0    _  |
          | 3/4   0    _  |

a) Fill in the blanks in the transition matrix above.

b) Determine the 2-step transition matrix, i.e., P(2) .

c) Assume that the Markov chain has the starting vector p(0) = (1/2, 1/2, 0).
Determine the absolute probabilities p(2) = (p0(2), p1(2), p2(2)).
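Results of this kind are easy to verify numerically. Below is a minimal Python sketch (not part of the exercise set); filling the blanks with the third-column entries that make each row sum to one is an assumption about part (a).

```python
import numpy as np

# Transition matrix; the blank entries are assumed to be the
# third-column values that make each row sum to one.
P = np.array([[1/3, 1/3, 1/3],
              [1/2, 0.0, 1/2],
              [3/4, 0.0, 1/4]])

P2 = P @ P                      # 2-step transition matrix P(2)
p0 = np.array([1/2, 1/2, 0.0])  # starting vector
p2 = p0 @ P2                    # absolute probabilities after 2 steps

print(P2)
print(p2)
```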

1.2 Let (Xn : n ≥ 0) be a Markov chain with the states E = {r, w, b, y} and
transition matrix

          |  0    0    1    0  |
    P =   |  0   0.4  0.6   0  |
          | 0.8   0   0.2   0  |
          | 0.2  0.3   0   0.5 |

a) Determine

P (X5 = b, X6 = r, X7 = b, X8 = b | X4 = w)

b) Determine
       E(f(X5) f(X6) | X4 = y)
where f is defined as

    f(r) = 2,  f(w) = 4,  f(b) = 7,  f(y) = 3.

1.3 A Markov chain {Xn ; n ≥ 0} has the states {1, 2, 3} and the following
transition matrix

          | 0.2  0.3  0.5 |
    P =   | 0.4  0.2  0.4 |
          | 0.3  0.6  0.1 |

a) Determine P (X5 = 3 | X3 = 1, X2 = 1).

b) Determine P (X8 = 3, X7 = 1, X5 = 2 | X3 = 2, X2 = 1).



1.4 Determine all stationary distributions p of the Markov chains with the
transition matrices below.

    a)  P = | 0.4   0    0.6 |        b)  P = | 1/2  1/2  0 |
            |  0   0.5   0.5 |                | 1/4  3/4  0 |
            | 0.25 0.75   0  |                |  0    0   1 |
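A stationary distribution solves pP = p with the entries of p summing to one. The following Python sketch (an illustration, not the intended pen-and-paper route) finds all stationary distributions as left eigenvectors of P for eigenvalue 1; for the reducible chain in b) more than one appears.

```python
import numpy as np

def stationary(P, tol=1e-8):
    """Return stationary distributions of P: left eigenvectors
    for eigenvalue 1, each normalized to sum to one."""
    w, V = np.linalg.eig(P.T)
    out = []
    for i in range(len(w)):
        if abs(w[i] - 1.0) < tol:
            v = np.real(V[:, i])
            out.append(v / v.sum())
    return out

Pa = np.array([[0.4, 0.0, 0.6],
               [0.0, 0.5, 0.5],
               [0.25, 0.75, 0.0]])
Pb = np.array([[0.5, 0.5, 0.0],
               [0.25, 0.75, 0.0],
               [0.0, 0.0, 1.0]])

print(stationary(Pa))  # a unique stationary distribution
print(stationary(Pb))  # two: every convex combination is also stationary
```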

1.5 Classify the Markov chains with the transition matrices below, i.e., determine
whether they are irreducible and periodic, and whether the states are transient
or not.

    a)  P = |  0   3/4  1/4   0  |    b)  P = | 0.6   0   0.4   0  |
            | 1/2   0    0   1/2 |            |  0   0.3   0   0.7 |
            | 1/5   0    0   4/5 |            | 0.5   0   0.5   0  |
            |  0   1/3  2/3   0  |            |  0   0.8   0   0.2 |

1.6 In the two urns A and B there are three red and two green balls. One ball is
drawn from the urn containing three balls and it is placed in the other urn.
Define

    Xn = the number of green balls in the urn that after n draws contains
         two balls, n = 1, 2, . . . ,
    X0 = 2,

and

    Yn = 1 if the ball drawn in the n-th draw is red,
         2 if the ball drawn in the n-th draw is green,

    Y0 arbitrary.

a) Show that (Xn : n ≥ 0) is a Markov chain with the states {0, 1, 2}.

b) Determine the initial probabilities and the transition matrix P.

c) Is (Yn : n ≥ 0) a Markov chain?

d) Determine P (Xn = j) numerically for j = 0, 1, 2, n = 0, 1, 2, 3.

e) Determine the stationary distribution for P.

1.7 Markov studied the sequences of vowels and consonants in Russian poems.
Similar studies can of course be done in Swedish literature. In “En Herrgårdssägen”
by Selma Lagerlöf, a vowel is followed by a consonant in 78.7 % of the cases
and by a word separator (a blank, period, comma etc.) in 20.9 % of the cases.
If we consider ”vowel”, ”consonant” and ”word separator” as three states in
a Markov chain and assume transition probabilities corresponding to the
observed frequencies in the above mentioned novel, the transition matrix

          | 0.004  0.787  0.209 |
    P =   | 0.483  0.271  0.246 |
          | 0.238  0.762    0   |

is obtained.
It is of course not possible to model the written language exactly as a Markov
chain, but certain aspects of the language can be studied with such a model.
Assume that a “novel” is written as a sequence of vowels, consonants and word
separators {Xn ; n ≥ 1}, with the above transition probabilities. The first sign,
X1 , is chosen at random according to the stationary distribution.

a) Motivate, without performing any calculation, that a unique stationary
distribution exists. Then determine it.

b) Determine the probability that a word starts with a consonant.

c) Determine the probability that a word ends with a vowel.

d) Determine the average word length.

e) Determine the average number of vowels in a word and the average number
of consonants in a word.

1.8 In a signal system, zeros and ones are transmitted. A one is transmitted with
probability p and a zero with probability q = 1 − p. The signals are independent
of each other.
Let Xn be the n-th transmitted signal, n = 1, 2, . . . . The first signal is a
zero, X1 = 0.
Define the following states:
S1 = {the last two signals are zeros},
S2 = {the last two signals are a zero and a one},
S3 = {the last two signals are a one and a zero} and
S4 = {the last two signals are ones}.
Let {Yn ; n ≥ 2} be the stochastic process that defines which of the states the
signal process is in.

a) Motivate that {Yn ; n ≥ 2} is a Markov chain and determine the transition
matrix.

b) Determine the expected number of transmitted signals (including the first
zero) until two ones in a row are transmitted.

1.9 A Markov chain with the states {1, 2, 3} in discrete time has the transition
matrix

          | 0.2  0.5  0.3 |
    P =   | 0.4  0.3  0.3 |
          | 0.3  0.2  0.5 |
a) The chain starts in state 1. Determine the expected time until it ends up
in state 3.

b) Determine the probability that the chain has not reached state 3 within 4
time steps.

c) Show that an asymptotic distribution exists and determine it.

1.10 To the joy of all children (but maybe to the despair of their parents) it is
common to distribute collector’s items with some products. Consider the
following setup. Each product comes with two different collector’s items.
These can be considered chosen randomly from a series of four different items.
Let Xn be the number of different items you have after buying n products.
The sequence {Xn } is a Markov chain.

a) Determine the transition matrix.

b) If each product costs 15 SEK, determine the expected value of the cost until
all four different collector’s items are obtained.

1.2. Markov Chains in continuous time

1.11 A Markov process {X(t); t ≥ 0} has the states {0, 1, 2} and the following
intensity matrix. The process starts in 2, X(0) = 2.

          | −8   4   _ |
    Q =   |  _  −5   2 |
          |  0   2   _ |

a) Fill in the blanks in the matrix.

b) Determine the probability that the process remains in state 2 during the
whole time interval [0, 1].

c) Determine the probability that the second jump of the chain goes to state
0.
d) Determine the system of differential equations from which you can solve for
pi (t) = P (X(t) = i), i = 0, 1, 2.

e) Determine the expected time until the process for the first time goes to
state 0.

f) Motivate that an asymptotic distribution exists and determine it.
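For part f), the asymptotic distribution solves pQ = 0 with the components of p summing to one. A numerical cross-check in Python, assuming the blanks of Q are filled so that every row sums to zero (the placement of the missing entries is an assumption):

```python
import numpy as np

# Intensity matrix with the blanks filled so each row sums to zero
# (the placement of the missing entries is an assumption).
Q = np.array([[-8.0,  4.0,  4.0],
              [ 3.0, -5.0,  2.0],
              [ 0.0,  2.0, -2.0]])

# Replace one balance equation of p Q = 0 by the normalization sum(p) = 1.
A = np.vstack([Q.T[:2], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
p = np.linalg.solve(A, b)
print(p)   # asymptotic distribution
```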



1.12 A Markov process has the generator (intensity matrix)


 
−1 1 0
Q =  1 −3 2  .
0 2 −2

a) Determine the transition matrix P̃.

b) Determine if the chain is ergodic and determine in that case the asymptotic
distribution p.

1.13 A Markov process with the state space {1, 2, 3, 4} has the following intensity
matrix

    | −8   2   2   4 |
    |  3  −6   2   1 |
    |  0   0   0   0 |
    |  0   0   0   0 |

The states 3 and 4 are hence absorbing.

a) Determine the probability that the chain is absorbed in state 3 and 4,
respectively, when starting in state 1.

b) Determine the expected time to absorption when starting in state 1.
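Both quantities can be cross-checked numerically with the standard decomposition of the generator into the transient block T and the rate matrix R into the absorbing states: the absorption probabilities are (-T)^{-1} R and the expected absorption times (-T)^{-1} 1. A Python sketch of that computation (the matrix split below is read off from the exercise):

```python
import numpy as np

# Transient block (states 1, 2) and rates into absorbing states (3, 4).
T = np.array([[-8.0,  2.0],
              [ 3.0, -6.0]])
R = np.array([[2.0, 4.0],
              [2.0, 1.0]])

U = np.linalg.inv(-T)   # U[i, j]: expected time spent in transient state j
B = U @ R               # B[i, j]: probability of absorption in state j
t = U @ np.ones(2)      # expected times to absorption

print(B[0])  # from state 1: absorption probabilities for states 3 and 4
print(t[0])  # from state 1: expected time to absorption
```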

1.14 {X(t); t ≥ 0} is a Markov process with the state space {0,1,2} and X(0) = 0.
The intensity matrix is
 
−3 1 2
Q =  4 −10 6 
1 4 −5

a) Determine the expected time until the process for the third time returns to
the state 0.

b) Determine the expected time the process has spent in state 1 during this
time.

1.15 A series system with two components breaks down as soon as one of the
components breaks. Each component breaks after a time that is exponentially
distributed, Exp(1/400) (hours), independently of whether the other component
works or not. A broken component is repaired and the repair time for a component
is Exp(1/20). There are two repairmen available, so both components can be
repaired simultaneously if both are broken at the same time. All times are
independent of each other.
Determine the asymptotic availability of the system, i.e., the probability that
the system works after a long time.

1.16 A machine can be in three states, {1, 2, 3}. In state 1 the machine is perfect and
generates an income of 100 000 SEK/year. In state 2 the machine is partially
broken and works at reduced speed. Then it generates an income of 40 000
SEK/year. In state 3 it is completely broken and generates no income. In this
state the machine is replaced by a new one. In state 2 an attempt to repair
the machine is made. The transitions between the states occur according
to a Markov process with intensity matrix (unit: per year)

    |  −8    7     1  |
    |  32   −36    4  |
    | 100    0   −100 |

a) Determine the life expectancy of a machine.

b) Determine the expected income of a machine during its life expectancy.


1.17 Let {X(t); t ≥ 0} be a Markov process with the states 1,2,3 and intensity
matrix  
−4 1 3
Q =  3 −7 4 
3 0 −3
The process starts in state 3, X(0) = 3. Determine P (X(t) = i), i = 1, 2, 3.
1.18 Cars arrive at a tunnel according to a Poisson process with the intensity
2 cars per minute. The cars drive at 60 km/hour, and the tunnel is 1 km
long.
Determine the probability that at time t there are at most 3 cars in the
tunnel.
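Since each car occupies the tunnel for exactly one minute (1 km at 60 km/h), the number of cars in the tunnel is, in steady state, Poisson distributed with mean λ times the travel time (the M/G/∞ property). A small Python check of the requested probability under that steady-state assumption:

```python
import math

lam = 2.0          # cars per minute
travel_time = 1.0  # minutes to drive 1 km at 60 km/h
mean = lam * travel_time   # cars in tunnel ~ Poisson(2) in steady state

p_at_most_3 = sum(math.exp(-mean) * mean**k / math.factorial(k)
                  for k in range(4))
print(p_at_most_3)
```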
1.19 Clients arrive at a service station according to a Poisson process with intensity
λ. Show that the probability that an even number of clients arrive during
the time interval (s, s + t] is (1/2)(1 + e^(−2λt)) and that the probability that
an odd number of clients arrive is (1/2)(1 − e^(−2λt)).
1.20 (The urn model of Ehrenfest, continuous time). In two urns, A and B, there
are a total of N particles. Each of the particles is transferred, independently
of the others, to the other urn with intensity λ. Let X(t) be the number of
particles in urn A at time t.
Motivate why {X(t); t ≥ 0} is an ergodic Markov process and determine the
asymptotic distribution. If urn A contains only one particle, how long is the
expected time until urn A is empty?
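The asymptotic distribution turns out to be Binomial(N, 1/2); for a small N this can be checked numerically via detailed balance for the birth-death rates (k particles in A leave with total rate kλ and arrive with rate (N − k)λ). A sketch, with N = 5 as an arbitrary illustrative choice:

```python
import math

N, lam = 5, 1.0   # N particles, transfer intensity lam (illustrative values)
pi = [math.comb(N, k) / 2**N for k in range(N + 1)]  # Binomial(N, 1/2)

# Detailed balance: pi_k * (N - k) * lam == pi_{k+1} * (k + 1) * lam
for k in range(N):
    assert abs(pi[k] * (N - k) * lam - pi[k + 1] * (k + 1) * lam) < 1e-12

print(pi)  # stationary distribution for N = 5
```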
1.21 Consider a “cyclic queue” in steady state, where m clients circulate between
two service systems.
New clients arrive to the system with intensity λ. The clients first arrive
at system 1, where the service time is Exp(µ̃1)-distributed, and are then
passed on to system 2, where the service time is Exp(µ̃2)-distributed. After
leaving system 2 a client leaves the queueing system with probability p, or
returns to system 1 with probability 1 − p. All stochastic variables involved
are assumed to be independent, so in particular the probability of leaving the
system after passing through system 2 is independent of everything else.
Determine
    pk,n = P(k clients in system 1 and n clients in system 2).

2. Queueing theory

2.1 Frasse the bum has, after a series of twists of fate, become partner in a small
travelling carnival. The carnival has two main attractions, namely one carousel
and one “ghost train”. In addition to this there is a tent in which one can get
one’s portrait drawn. To this carnival a modest number of children arrive (4
children every hour) and we assume for simplicity that every child is accom-
panied by one generous and very loving parent. The sole purpose of the adult
in this context is to supply the child with money so in what follows we will
focus the discussion on the child.
The carousel is run by Frasse himself while the ghost train is run by his brother-
in-law, Heathcliff. None of these gentlemen is particularly inclined towards
bureaucratic procedures and therefore a trip in either the carousel or the ghost
train can be considered exponentially distributed. The expected time for a
trip in the carousel is 2.5 minutes; the ghost train, on the other hand, can
serve 13 children every hour. After a round in the carousel a randomly
selected child will be ill and will want to go home with probability 0.1. With
probability 0.4 the child insists on another round and with probability 0.4 the
kid will proceed to the ghost train. The ghost train is not quite that popular,
and therefore the child will want to return to the carousel with probability 1
after a ride on the ghost train. The only thing that keeps the spirit up for the
severely tried adult is the knowledge that with probability 0.1 the child will
want to have its portrait drawn after a ride in the carousel. This is the only
point where the adult has a say in the turn of events, which more precisely
means that after a trip to the artist’s tent the next stop will be at home.
The portraits are drawn by Heathcliff’s friend Cathy, who draws a portrait in an
exponentially distributed time with an average of fifteen minutes. We assume that
only the child’s portrait will be drawn. Compute the average time that a child
will spend at the carnival.

2.2 In order to improve his finances in the Christmas time Frasse the ex-bum has,
together with his brother-in-law Heathcliff and the latter’s uncle by marriage
Ludwig W, started a very small gambling establishment. In this establishment
there is a roulette table (R), a bar (B) and a kiosk (K). On average, twenty
customers arrive each hour to the gambling establishment and all of these
go directly to the roulette table where Frasse is the croupier. The time for
one single visit at R can be assumed to be exponentially distributed with an
expected value of 5/3 minutes (one tires easily). After a stay at the roulette
half the gamblers head for home without any further contact with either the bar
or the kiosk, while the rest of the roulette players go to the bar to contemplate
over the impermanence of things. The bar is run by H who manages to serve
an average of 18 customers per hour. After a visit to the bar a third of
the customers return to the roulette table to once more try their luck, while
two thirds have had enough of this immoral behaviour and head home. On
their way to the front door they pass the kiosk which makes 50% realize that
they need to strengthen themselves with some aspirin. The aspirin is sold by
Ludwig W who charges 10 kronor per pill, which is outrageous. On the other
hand, Ludwig is remarkably fast, so the time to purchase an aspirin may be
considered negligible.

(a) Compute the steady-state probability distribution for the number of peo-
ple at the roulette table and the bar respectively.
(b) Compute the average time a randomly selected customer spends in the
gambling establishment.
(c) How many kronor per hour do the three partners earn on the aspirin
sales?

2.3 Two different kinds of customers, type A and type B, arrive at a queueing
system. The customers of type A, who arrive with a mean time between
arrivals of 10 minutes, are immediately directed to serving station I. In this
station there is a server which on average manages to serve 12 customers per
hour. When a customer leaves station I he will with probability 2/3 be sent
on his way to service station II, but with probability 1/3 he will be sent back
to station I. The B-customers arrive with intensity 9 customers per hour and
are sent directly to station II. At station II there is a single server which has
the capacity to serve 18 customers per hour and after completed service in
station II both A- and B-customers are sent out of the system. Thus, both A-
and B-customers arrive at station II, but B-customers have total priority over
A-customers.

(a) Compute the probability distribution for the number of customers at
station II.
(b) Compute the average time a randomly selected A-customer spends in the
system.

2.4 Consider a queueing system consisting of the two stations I and II. To station
I there arrives on average one customer every six minutes, while the corre-
sponding data for station II is twelve customers every hour. In each of the two
stations there is a single server, both with an average service intensity of 22
customers per hour. A customer who has been served by station I will with
probability 1/3 be allocated to station II where he will be placed last in line
and with probability 2/3 be thrown out of the system. A customer who has
been served by station II will with probability 1/2 be allocated to station I
where he will be placed last in line and with probability 1/2 be thrown out of
the system. We assume that the arrivals form a Poisson process and that the
service time distributions are exponential.

(a) Assume that you have just arrived at station I. What is then the expected
remaining time before you leave the system?
(b) The customers that are kicked out of the system will all go on to station
III, which is manned by a single server with a service capacity of four
customers in ten minutes. However, here those customers who have been
thrown out from station I have total priority over those that have been
thrown out from station II. Compute the average waiting time at station
III for a customer who has been thrown out from station II.

2.5 Four different departments each own one supercomputer of the brand Giant.
They each have one computer technician employed who quickly does a search
for the error and restarts the computer when it crashes. The time a computer
works before it crashes is exponentially distributed with the same expected
value, 120 hours, for the four machines. The machines are also independent of
each other. The time it takes a computer technician to search for errors and
restart a computer is also exponentially distributed, but with expected value
12 hours. The computer technicians work independently.

(a) The requirements set up by the departments are that the expected
downtime for a crashed machine should not be more than 13 hours. Fur-
ther, the time that two or more of the four machines are down simul-
taneously must not exceed 5% of the total time, since there is a certain
cooperation between the departments. Does the system with one techni-
cian per department satisfy these requirements?
(b) After a while, the departments realize that it might be possible to save
money on cooperation. By the time two of the technicians retire one
considers the possibility to let the two remaining technicians serve all
four machines together. Since the machines and people are the same as
before it is expected that the error intensity and the time to repair is
unchanged. (Still only one technician can serve a machine that is down.)
The requirements are the same as before. Does the new system fulfil the
requirements?
(c) In addition to this it is required, since scientists are impatient people,
that if a computer is down and can not be serviced at once due to a
queue in the system, the expected time before it receives service must
not exceed four hours. Is this requirement fulfilled with the new system?

2.6 Mr and Mrs S run a small hotel. Their hotel has room for four guests (in
one single room each). The guests arrive (one by one) according to a Poisson
process with an average of one guest every five days. Each individual guest
stays at the hotel for a time that is exponentially distributed with an expected
value of 15 nights. If the hotel is full when a new guest arrives this guest will
immediately leave for another hotel. The price for a room is 400 kronor per
night. The running cost (breakfast, cleaning, laundry etc.) paid by Mr and
Mrs S is 100 kronor per guest and night. Their profit per night is thus 300
kronor. (This money shall in the long run cover the cost of e.g. the house, and
also give them a reasonable income to add to their pension.)
One day a very different potential guest calls and leaves the following message:
“I wish to rent a room during a whole year under the condition that you give
me a 150 kronor discount per night, i.e. that you let me rent the room for 250
kronor per night. This offer is not negotiable, give me a ’yes’ or a ’no’.”
Formulate an appropriate mathematical model and compute whether Mr and
Mrs S should accept the person’s offer. We assume that they want to maximize
their expected profit for the coming year.
2.7 At the regiment X the enlistment for military service has begun. To the supply
shed there arrive on average 24 (enlisted) soldiers every hour. Every soldier
has his own completed requisition form, and before any material is handed out
the form is reviewed by a sergeant. The sergeant reads (and thinks) very fast,
the form is reviewed by a sergeant. The sergeant reads (and thinks) very fast,
so this procedure can be viewed as a service station with an intensity of 105
forms an hour. It turns out that, on average, 20% of the forms are completed
incorrectly. The unlucky holder of an incorrectly filled out form is required
to complete a new one and (naturally) place himself at the end of the line.
This procedure is assumed to take a negligible amount of time (and the risk
that the new form is completed incorrectly is still 20%). When one finally has
been deemed to have a correctly completed form the soldier ends up in the
proper supply shed queue. The supply shed is manned by a master sergeant
who delivers a complete military gear in one and a half minutes (on average).
Totally exhausted by the above ordeal the soldier (in no-time) walks to the
kiosk situated right outside the gate of the regiment. This kiosk is manned by
the not completely unknown (ex-bum) Frasse, who serves a hot dog with an
average speed of 44 hot dogs every hour. Every now and then, a lieutenant
arrives at the kiosk with an average time between arrivals of seven and a half
minutes. The queueing system is governed by Frasse, who without pardon lets
all the enlisted soldiers pass the lieutenants in the line.

(a) Compute the average time spent in line for enlisted soldiers (supply shed
+ kiosk) as well as for lieutenants (kiosk).
(b) To a completely different queueing system (whose true nature can not
be revealed for national security reasons) close to the regiment there
arrive on average four customers every hour. The customers are served
by a single server with an Erlang(1/6, 2)-distributed service time. (The
average service time is thus twenty minutes per customer.) The system
has room for exactly two customers (including the one being served) and
customers who arrive when the system is full will leave, never to return.
Compute the probability distribution for the number of customers in the
system.

2.8 (a) A queueing system is composed of two stations. Customers of type A,
B and C arrive at station I according to independent Poisson processes.
The intensities for the respective processes are λA =5, λB =3 and λC =2
(customers per hour). Station I is manned by a server that on average
needs 3 minutes to serve a customer (the service time is exponentially
distributed). The service time does not depend on which type of customer
is being served. The order of priorities is that A-customers supersede B-
customers who supersede C-customers. If a customer of lower priority is
served when a customer of higher priority enters the system the service of
the lower priority customer is stopped until the higher priority customer
has been served. Compute the average waiting times for the different
categories of customers.
(b) After being served at station I the C-customers leave the system, while
the A- and B-customers immediately continue to station II where they are
served without any consideration of priority order. This station has two
parallel servers, each with an average capacity of handling one customer
in 15 minutes (the service times are exponentially distributed). In this
system there is room for four customers in all (including those being
served). If a customer arrives at station II and finds it full, this customer
leaves and will not return. Compute the probability distribution for the
number of customers at station II.
(c) Compute the average waiting time for a customer at station II.

2.9 To a queueing system, consisting of the three stations I, II and III, there arrive
customers of two categories, namely A- and B-customers.
The A-customers arrive at station I with an average time between arrivals of
six minutes. Station I is manned by a single server, with an average capacity
of 25 customers every hour. After being served at station I all the customers
are allocated to station II.
Station II also consists of a single server, with an average service time of 2 min
30 s. Customers who have been served in station II are with probability 1/2
allocated to station I and with probability 1/2 allocated to station III.
To station III, which also has one server with capacity 22 customers per hour,
there arrive, in addition to the customers from station II, customers of type B.
The latter arrive with an intensity of 10 customers every hour and they have
total priority over customers of type A.
Compute the average waiting time in the system for a customer of type A.

2.10 To Frasse’s kiosk there arrive, on average, 50 people every hour. If the kiosk
queue is empty on average 80% of the people stop to buy a hot dog. If, on the
other hand, one or more people are waiting in line this social gathering looks
so nice that on average 90% of the passing customers buy a hot dog.
In the kiosk Frasse himself works together with his brother-in-law Heathcliff.
Each of F & H manages to serve a customer in 2.4 minutes (=2 min 24 s).
(Assume Poisson arrivals and exponentially distributed service times.)

(a) Compute the probability distribution for the number of customers in the
system.
(b) Compute the average number of customers in the system.
(c) Compute the average waiting time in the system for a randomly selected
customer (queueing time + service time).
(d) How many minutes per hour is there a line at the kiosk (on average)?

2.11 Consider an M/M/K/K system with arrival intensity λ and service intensity
µ (for each server).

(a) Compute the probability distribution Pn, n = 0, 1, . . . , K, where Pn = P(N = n).


(b) When K → ∞ the distribution for N converges to a well known limit
distribution. Find this distribution.
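Both parts can be explored numerically: the M/M/K/K steady-state distribution is a Poisson distribution truncated to {0, . . . , K} with offered load a = λ/µ, and letting K grow recovers the untruncated limit asked for in (b). A Python sketch with λ = 3, µ = 1 as illustrative values:

```python
import math

def mmkk_distribution(lam, mu, K):
    """Steady-state distribution of an M/M/K/K loss system:
    a Poisson(a) distribution truncated to {0, ..., K}, a = lam/mu."""
    a = lam / mu
    weights = [a**n / math.factorial(n) for n in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]

p = mmkk_distribution(lam=3.0, mu=1.0, K=5)
print(p)          # p[K] is the blocking (Erlang loss) probability

p_big = mmkk_distribution(3.0, 1.0, 60)
print(p_big[0])   # close to exp(-3), the Poisson limit for large K
```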

2.12 Consider the following system (figure omitted: two parallel single-server
stations fed by a common arrival stream).

Station 1 has a server with service intensity µ (the service time is exponentially
distributed). The waiting room is infinitely vast. Station 2 is a replica of
station 1. Let N1 and N2 be the number of customers at each of the stations.
Customers arrive to the system according to a Poisson process with intensity
λ. If a customer arrives and N1 > N2 he will be sent to station 2. If N1 < N2
he will be sent to station 1. If N1 = N2 the customer will be sent to station 1
with probability 0.5 and to station 2 with probability 0.5.

(a) Plot the rate diagrams for the aggregated system.


(b) Assume that λ = µ = 1. Assume further that the conditions above hold
with one exception: there is room for exactly one customer (except the
one being served) at each of the stations. Compute the steady state
distribution and compute the number of customers per unit time that are
served by each station.

2.13 At a playground there is a swing and a slider. To the playground children arrive
with an exponentially distributed inter-arrival time, with expected value 12
minutes. Of the arriving children, 40% go to the swing and 60% to the slider.
The time on the swing is exponentially distributed with expected value 4
minutes. The time on the slider is also exponentially distributed with expected
value 30 seconds. The swing as well as the slider has room for only one child
at the time.
After having used the swing a child goes to the slider with probability 0.9 or
leaves the playground with probability 0.1. After having used the slider a child
has one more go at it with probability 0.7, goes to the swing with probability
0.2 or leaves the playground with probability 0.1.

(a) Represent this problem as a Jackson-network.


(b) Compute the expected times in queue for the swing and the slider.
(c) Compute the expected time a parent has to wait while the child plays at
the playground.
(d) Compute the expected time a parent has to wait while the child plays at
the playground, if the child first goes to the swing.
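For part (a), the arrival rates at the swing and the slider follow from the traffic equations λ = λ_ext + Rᵀλ of the Jackson network. A numerical sketch in Python (station indices and service rates as read from the exercise; this only solves the traffic equations, not the waiting times):

```python
import numpy as np

lam0 = 1 / 12                       # external arrivals (children per minute)
ext = np.array([0.4, 0.6]) * lam0   # split: 40% swing, 60% slider

# Routing matrix R[i, j] = P(station i -> station j);
# station 0 = swing, station 1 = slider.
R = np.array([[0.0, 0.9],
              [0.2, 0.7]])

# Traffic equations: lam = ext + R^T lam  =>  (I - R^T) lam = ext
lam = np.linalg.solve(np.eye(2) - R.T, ext)
mu = np.array([1/4, 2.0])           # service rates per minute
rho = lam / mu                      # utilizations (both below one: stable)
print(lam, rho)
```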

2.14 Two repairmen (A and B) are responsible for the maintenance of four machines
(M1, M2, M3 and M4) that break down now and then. For each individual
machine the time to failure is exponentially distributed with an expected value
of 30 minutes.
One can choose between three different strategies for how the repairmen should
perform their tasks.

I. The two repairmen are together responsible for the maintenance of the
four machines. However, they do not work together on the same broken
machine. If only one machine is broken, one of the two is thus without a
task. The time to repair a machine is exponentially distributed with an
expected value of 20 minutes.
II. Repairman A is responsible for M1 and M2 while B is responsible for
M3 and M4. Even if both M1 and M2 are broken while M3 and M4 are
running, B will not assist A and vice versa. Due to some specialization
benefits the mean down time is now 18 minutes (exponentially distributed
down time).
III. The two repairmen work together on each broken machine. Thus they
behave at all times as one “super repairman”. The mean of the exponentially
distributed down time is now 12 minutes.

Draw a rate diagram for each of the three cases and compute how many
machines, on average, are running in each case.

3. Inventory theory

3.1 Benny sells cars. For the luxury model Porsche 911 Carrera he charges 400
kkr (400.000 kronor) per car. Benny buys the cars from Ronny Retail for 300
kkr per car. At the start of each month Benny may order a desired number of
Porsches at the above mentioned price with immediate delivery. The delivery
cost is paid by Ronny.
Benny expects the demand during a month to be triangularly distributed in
the interval [10, 20]. Further, Benny can store unsold cars to the next month.
This will cost him 50 kkr per car in inventory cost and capital cost. Finally,
Benny expects a goodwill loss of 100 kkr for each Porsche customer that can
not be provided with a car.
Compute the optimal number of Porsche 911 Carreras that Benny should order
from Ronny if he has N cars in stock at the start of the month. (You may do
the computation with a continuous distribution of the demand and thereafter
round to the nearest integer.) A triangular distribution on the interval [10, 20]
has the following density

    f(x) = (1/25)(x − 10),  10 ≤ x ≤ 15,
    f(x) = (1/25)(20 − x),  15 ≤ x ≤ 20,
    f(x) = 0                otherwise.
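One common route for this type of problem is the single-period (newsvendor) critical-ratio condition F(S*) = cu/(cu + co) for the order-up-to level S*. The cost readings below (underage cu = 400 − 300 + 100 kkr of lost margin plus goodwill per short car, overage co = 50 kkr per stored car) are an assumption about how to set the problem up, not the worked course solution. A Python sketch:

```python
from math import sqrt

# Assumed newsvendor costs (kkr per car): underage = margin + goodwill,
# overage = inventory and capital cost.
cu = (400 - 300) + 100
co = 50
ratio = cu / (cu + co)            # critical ratio F(S*) = 0.8

# Triangular CDF on [10, 20]: F(x) = 1 - (20 - x)**2 / 50 for 15 <= x <= 20.
S = 20 - sqrt(50 * (1 - ratio))   # invert the CDF (valid since ratio >= 0.5)
print(S)                          # order up to about 17 cars
```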

3.2 Cheapskate-Charlie owns a bacon-shop. After many years in the business with
faithful customers, Charlie knows exactly how much bacon is sold every
weekday of the week (the weights below are given in hg).

    Mon  Tue  Wed  Thu  Fri
    500  300  200  500  700
Due to the risk of bacon burglars Charlie does not keep any bacon in stock
over the weekend. However, he has the opportunity to store bacon from one
weekday to another, but this will cost him 1 kr/hg and night. Early every
morning (Mo-Fri) Charlie may order bacon that will be delivered to the store
before it opens. The cost for this is 2 kr/hg plus a setup cost of 1000 kr.

(a) Use your knowledge of optimization to help Charlie decide how much
bacon he should order each morning (Mo-Fri) in order to minimize his
weekly cost.
(b) At present, Charlie orders bacon only on Monday mornings. How much
money will Charlie make by hiring you?

3.3 Every Monday morning there is a sale at Cheapskate-Charlie’s bacon-shop.
The Meathead family therefore buy, every Monday morning, a substantial
amount of bacon in order to cover the family’s consumption for a week. Since
bacon from Cheapskate-Charlie’s store only stays fresh for a week, all bacon
that is older than one week must be thrown away on Sunday night, and thus
the fridge is empty of bacon every Monday morning. (The Meatheads lack,
mind you, a freezer.)
Assume that the Meatheads buy y hg of bacon every Monday morning (y ≥ 0)
and that the cost for this is cy kronor (c > 0).
3. Inventory theory 17

Assume further that the Meatheads’ demand for bacon during a week, D, is a
uniformly distributed random variable on the interval [0, a] (measured in hg),
where a > 0, i.e. D ∈ U[0, a].
The cost of throwing away excess bacon is negligible. If y ≥ a there is never
any shortage of bacon and therefore no shortage cost. If, however, y < a there
is a risk of shortage of bacon during the week. In this case the shortage cost
is given by

B(D, y) = b · (D − y)²  when D ≥ y  (b > 0),
B(D, y) = 0  otherwise.

(a) Compute T C(y), the expected total cost for a week, for y ≥ 0.
(b) Show that T C(y) is a convex function for y ≥ 0.
(c) Assume that ab > c. Show that T C(y) achieves its minimum in the open
interval (0, a).
(d) Compute how much bacon the Meathead family should buy every week
in order to minimize the expected total cost T C(y). In their case a =
100 hg, b = 0.5 kr/hg² and c = 8 kr/hg.

3.4 Two weeks before the water-festival Frasse has decided to make a little profit
by selling the Whisky-like product “Thunderball”, produced by his brother-
in-law, Heathcliff. F has already got a supply of 100 litres in the cellar of his
(illegal) night club “Rochester Arms” and he is now wondering how much
more he should order from H. During the festival F calculates that he will, due
to the festive minds of the participants, be able to sell Thunderball for 100 kr
per litre, but the quantity that will be left over after the end of the festival
has to be considered impossible to sell, due to the poor quality of the product,
and is therefore worthless.
If Frasse places an order two weeks before the start of the festival Heathcliff is
willing to let him buy Thunderball for 40 kr/litre. During the festival Heathcliff
can satisfy Frasse’s need for Thunderball, but now at the price of 80 kr/litre.
What makes life so problematic for Frasse is that he has no a priori knowledge
of the demand for Thunderball during the festival. He therefore considers it to
be a random variable Z with distribution function F (z) and density function
f (z).

(a) Let x be the number of litres that Frasse orders ahead of the festival.
Give an expression for the expected net income as a function of x.
(b) Deduce conditions for optimality if Frasse’s objective is to maximize his
expected net income.
(c) Solve Frasse’s problem if demand is exponentially distributed with ex-
pected value 1000 litres.

3.5 Consider an inventory problem where we have the initial inventory level x. We
study the inventory for a single period of (deterministic) length T .
Our problem is to choose the inventory level y that minimizes the total ex-
pected cost during the period. The cost is composed as follows. Every ordered

unit costs c kr. Every unit in stock costs h kr per unit time and every short-
age results in a cost of p kr per unit time. The demand during the period is a
random variable Z and we assume that Z is continuous with density function
f . What distinguishes this model from the ordinary single period model is
that the items in the inventory are assumed to be removed continuously. More
precisely the following happens: If the outcome of Z is z there is a continuous
removal of items in the inventory over [0, T ] so that at time T a total of z
items have been removed.

(a) Deduce the structure of the optimal inventory level.


(b) Solve the problem explicitly for the case when

f (x) = kx for 0 ≤ x ≤ 10, f (x) = 0 otherwise,
with T = 1, c = 0.14, h = 0.5, p = 0.5, x = 3.

3.6 (a) State the Wilson formula in its simplest form. Also state the underlying
assumptions regarding demand and cost structure that are being made.
Use your own notation.
(b) Assume now that the model from (a) is generalized so that shortage is
allowed, but that shortage gives rise to a cost per unit and unit time.
Deduce the optimal order and shortage levels.
(c) Docent Optimus wants to optimize the allocation of his cash flow. At
the end of each month, when his bills are paid, he has 10 kkr left on his
check account. This money is spent at a constant rate until the next end
of month. The money that is left in the account generates interest on a
day-to-day basis with the interest rate 2%.
Docent Optimus now wonders if it pays to get credit on the check account
and systematically be in debt at the end of the month. For a possible debt
he would have to pay interest rate 17%. The money that is withdrawn
from the check account can be deposited in a savings account and generate
10% interest.
Is it a good idea to be in debt and how much should he in that case be
in debt at the end of the month?
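For part (a), one common statement of the Wilson formula (in one standard notation, with demand rate a units per unit time, setup cost K kr per order and holding cost h kr per unit and unit time; the exercise asks you to fix your own notation) is Q* = √(2aK/h). A minimal sketch with invented numbers:

```python
from math import sqrt

def wilson_eoq(a, K, h):
    """Q* = sqrt(2aK/h): optimal order quantity under constant demand,
    no shortages allowed, instant delivery and linear holding cost."""
    return sqrt(2 * a * K / h)

# Invented data, for illustration only:
Q = wilson_eoq(a=100, K=50, h=2)   # sqrt(5000), roughly 70.7
```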

3.7 Consider the inventory of a commodity with a random demand, for which the
time to delivery is l u.t. (units of time) and such that the demand during the
delivery lag has density function φ.
Assume that the following data is known:

Average demand a units/u.t.


Setup cost K kr/order
Holding cost h kr/unit & u.t.
Shortage cost p kr/unit (No separate cost for time of shortage)

Assume that we are considering an (s, S)-policy, i.e. the order quantity is Q =
S − s. For given s and Q give the expected value per unit time for:

(a) Setup cost


(b) Holding cost
(c) Shortage cost

(It is permissible to make the same simplifications as in H&L.)


By using the expression for the total cost per unit time,

(d) Deduce the optimal Q, for a given s.


(e) Deduce a condition for an optimal s, given Q.

3.8 Due to an unexpected number of shipwrecks, the navy has to order new
transmitter tubes for the radar units in its minesweepers. At present there
are only r tubes left.
If one orders x tubes at once the price will be K + cx kkr.
Since the minesweeper series is on its way out of the navy one wants to order
enough tubes for the remainder of the life time of the sweepers. It has been
estimated that the need for transmitter tubes for the remainder of the life time
is exponentially distributed with average 50, i.e. the density is
f (ξ) = (1/50) e^(−ξ/50)  for ξ ≥ 0.

If demand turns out to be higher than the ordered quantity one must order
extra tubes at a cost of 300 kkr/unit.
Those tubes that are left when the minesweepers are retired from service are
worthless (i.e. the revenue from selling the parts is the same as the cost of
dismounting the tubes).

(a) Give the expected total cost as a function of the number of tubes ordered.
(b) Deduce conditions for the optimal number of tubes that should be or-
dered.
(c) Calculate the optimal number of tubes if K = 500, c = 25 kkr and r = 10.
(d) Assume now that K = 0, c = 50 kkr and r = 10, but that a discount is
given if more than 20 tubes are ordered. In this case the full price is paid
for the first 20 tubes, but the extra tubes cost 25 kkr. How many tubes
should be ordered?

3.9 Consider the following inventory model for a single period, for which demand,
D, is exponentially distributed with expected value µ. For every sold unit one
gets revenue q kr. Every purchased item costs c kr. If shortages occur the cost
is p kr (no matter how large the shortage is).

(a) Assume that there already are x units available in stock and that one
orders up to inventory level y. Give an expression for the expected profit
and state the optimal policy.
(b) Assume now that there is also a setup cost of K kr for every order. Give
the optimal policy.

3.10 Assume that we have a stochastic inventory model for a single period with a
setup cost and with integral demand and inventory level.
At present, the inventory level is x units (where x is an integer), the setup cost
is K kr per order and the production cost is c kr per unit, the inventory cost is
h kr per unit that is in the inventory at the end of the period and the shortage
cost is p kr per unit of shortage. For the demand, D, we know the probability
that the demand is d units, for every non-negative integer d. This probability is
denoted PD (d). Thus we also know the distribution function FD according to
FD (b) = Σ_{d=0}^{b} PD (d).

Deduce the optimal ordering policy in this case.


Hint: Write C(y) as C(y) = cy + L(y) where

L(y) = p Σ_{d=y}^{∞} (d − y) PD (d) + h Σ_{d=0}^{y} (y − d) PD (d).

Consider the difference C(i + 1) − C(i) where i is a non-negative integer. Show
from this that if y⁰ is the minimizing integer for C(y) then y⁰ is the smallest
integer that satisfies the following inequality:

FD (y⁰) ≥ (p − c)/(p + h).
Then, use the same kind of reasoning in the continuous case.
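A small numerical companion to the hint, with invented data: once the inequality is established, the optimal order-up-to level y⁰ is found by scanning the distribution function for the critical ratio.

```python
def order_up_to_level(PD, p, c, h):
    """Smallest integer y with F_D(y) >= (p - c)/(p + h),
    i.e. the inequality derived in the hint.
    PD: dict mapping demand value d -> probability P_D(d)."""
    ratio = (p - c) / (p + h)
    F, y = 0.0, 0
    while True:
        F += PD.get(y, 0.0)    # F now equals F_D(y)
        if F >= ratio:
            return y
        y += 1

# Invented example: demand uniform on {0, 1, 2, 3, 4}
PD = {d: 0.2 for d in range(5)}
y0 = order_up_to_level(PD, p=3.0, c=1.0, h=1.0)   # ratio 0.5 -> y0 = 2
```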

3.11 Consider the following single period problem, where the demand D is expo-
nentially distributed with expected value 1. The production cost is c kr/unit.
What distinguishes this model from the standard model is that the shortage
cost is a fixed cost p kr if D > y while the shortage cost is zero if D ≤ y, where
y as usual is the inventory level after order. There is no inventory cost in this
model. Note that the shortage cost is not a cost per unit in shortage.

(a) Assume that the inventory level is x units at the start of the period.
Deduce and describe the optimal ordering policy, in as much detail as
possible.
(b) Assume that there is a setup cost of K kr. Deduce the optimal policy in
this case.

4. Marginal Allocation
4.1 Let fk (xk ) = 10k − 3xk and gk (xk ) = k Σ_{ℓ=1}^{xk} ℓ² for k = 1, 2, 3 and
xk = 1, 2, 3, · · · .
Define f (x) = f1 (x1 ) + f2 (x2 ) + f3 (x3 ) and g(x) = g1 (x1 ) + g2 (x2 ) + g3 (x3 ),
where x = (x1 , x2 , x3 ).

(a) Do f and g satisfy the conditions necessary to use marginal allocation?
(b) Determine all efficient points x for the simultaneous minimization of f
and g such that x1 + x2 + x3 ≤ 8.
What is the optimal solution to

minimize f (x)
s.t. g(x) ≤ 27
xk ∈ {1, 2, 3, · · · }, k = 1, 2, 3.

4.2 A consulting company ABC should do four different jobs for an important
customer. ABC has the capacity to allocate in total 11 consultants to these
four jobs. The jobs should be carried out on four different places, remote from
each other, so no consultant can be assigned to more than one job. Further, at
least one consultant must be assigned to each job, which leaves 7 consultants
to be allocated. If sj of these 7 consultants are assigned to job j (in addition to
the compulsory consultant already assigned) then the time tj it takes to carry
out job j can be approximated by tj = cj /(sj + 1), where cj is the time it takes
for one single consultant to do the job. The measure used by this particular
customer to evaluate the work done by ABC is T = t1 +t2 +t3 +t4 . The smaller
T , the more satisfied the customer. Data: c1 = 18, c2 = 30, c3 = 48, c4 = 66
days. Help ABC to allocate the 7 additional consultants in such a way that T
becomes as small as possible. Motivate carefully how you can guarantee that
your solution is optimal!
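Since each tj = cj /(sj + 1) is decreasing and integer-convex in sj , marginal allocation (assigning one consultant at a time to the job with the largest decrease in T) is optimal here. The sketch below reproduces that procedure; use it only to check your own hand computation.

```python
c = [18, 30, 48, 66]   # c_j from the exercise (jobs indexed 0..3 here)
s = [0, 0, 0, 0]       # extra consultants beyond the compulsory one per job

for _ in range(7):     # allocate the 7 remaining consultants one by one
    # decrease in t_j if job j receives one more consultant
    deltas = [cj / (sj + 1) - cj / (sj + 2) for cj, sj in zip(c, s)]
    j = max(range(4), key=lambda i: deltas[i])
    s[j] += 1

T = sum(cj / (sj + 1) for cj, sj in zip(c, s))
```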

4.3 Frasse has been promoted to chief of a small research group. The group will
perform three different experiments and depending on how the recruitment
goes he will have between four and eight (equally talented and equally paid)
researchers available for performing the experiments.
Frasse has estimated the probability (in per cent) that experiment i will fail if
n researchers are assigned to the experiment to be given by the function pi (n)
tabulated below.

n p1 (n) p2 (n) p3 (n)


1 20 30 40
2 15 20 25
3 13 16 20
4 11 12 17
5 10 10 15

Assume that there is at least one researcher on every experiment.



(a) Is p(n1 , n2 , n3 ) = p1 (n1 ) + p2 (n2 ) + p3 (n3 ) a separable integer convex


function?
(b) Determine how many researchers should be on each project if Frasse aims
at minimizing the sum of the risks of failure for the individual experiments
for the five cases of 4, 5, 6, 7, or 8 total researchers available.
If there are more than one optimal solution for some fixed number of total
researchers, determine all such optimal solutions.

4.4 In a satellite, planned to be launched into space for a one year mission, a
certain measurement system relies on n different types of instruments which
all must work in order for the whole system to work. If an instrument fails,
it can not be repaired during the mission. In order to increase the reliability
of the system, the instruments can be duplicated, but then the weight of the
satellite increases. Assume that an instrument of type k fails with probability
pk during the mission, regardless of whether it is in use or in stand-by, and that
failures of instruments happen independently of each other. If xk + 1 instruments
of type k are brought to the mission, where xk ∈ {0, 1, 2, ...}, then the probability that
at least one of the instruments of type k will be working during the whole
mission is (1 − pk^(xk+1)), and then the probability that the measurement system
will be working during the whole mission is given by the product

Π_{k=1}^{n} (1 − pk^(xk+1)).
To get a separable function, we take the logarithm of this product, and we also
make a sign change (due to our habit of prefering minimization to maximiza-
tion) and consider the function
f (x) = − Σ_{k=1}^{n} log(1 − pk^(xk+1)).
The total weight of the instruments brought to the satellite is given by the
function
g(x) = Σ_{k=1}^{n} wk (xk + 1),
where wk is the given weight of each instrument of type k. One would like both
of these functions f (x) and g(x) to be “small”, but there is an obvious conflict
here. Therefore, one should choose one of the efficient solutions corresponding
to the pair (f, g). Your task is to calculate all efficient solutions with xk ≥ 0
for all k and with a total weight ≤ W max , for the following data:
n = 3, W max = 15, (w1 , w2 , w3 ) = (3, 1, 2), (p1 , p2 , p3 ) = (0.3, 0.1, 0.2).
Since you are not allowed to use any calculator, you should use the approxi-
mation log(1 − z) ≈ −z for “small” z (obtained e.g. by a Taylor expansion)
and thus consider the function
f (x) = Σ_{k=1}^{n} pk^(xk+1),

instead of the function f (x) = − Σ_{k=1}^{n} log(1 − pk^(xk+1)).
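The efficient solutions can be generated by marginal allocation on the approximated f: at each step, increase the xk that gives the largest decrease in f per unit of added weight, stopping when the next point would exceed W max. A sketch, intended only as a check of the hand computation the exercise asks for:

```python
w = [3, 1, 2]          # instrument weights w_k
p = [0.3, 0.1, 0.2]    # failure probabilities p_k
W_MAX = 15

x = [0, 0, 0]
efficient = [tuple(x)]
while True:
    # decrease of f(x) = sum p_k^(x_k + 1) per unit of added weight:
    # p_k^(x_k + 1) - p_k^(x_k + 2) = p_k^(x_k + 1) * (1 - p_k)
    ratios = [p[k] ** (x[k] + 1) * (1 - p[k]) / w[k] for k in range(3)]
    k = max(range(3), key=lambda i: ratios[i])
    if sum(w[i] * (x[i] + 1) for i in range(3)) + w[k] > W_MAX:
        break          # next efficient point would be too heavy
    x[k] += 1
    efficient.append(tuple(x))
```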

5. Dynamic programming

5.1. Deterministic Dynamic Programming


5.1 A certain controlled system is described by the following discrete time model:

xk+1 = 2xk − uk ,

where xk is the state of the system at time k, while uk is the control at time
k. (k is always an integer, while xk and uk are real numbers.)
The system is in a certain given state at time 0, that is x0 = 85, and we want
to control the system so that x4 = 0. We want to do this in the least expensive
way, where the total cost is given by u0² + u1² + u2² + u3².
The problem should be solved using dynamic programming, and it is advised
to define the following function:
Vk (xk ) = the minimal remaining cost from time k, given that the system then
is in state xk .

(a) Determine V3 (x3 ) and a recursive equation for Vk (xk ).


(b) It is reasonable to make the assumption that Vk (xk ) = ck xk², where ck is
a constant for each time k. Verify that this assumption holds using the
recursive equation determined above and derive a recursion formula for
the constants ck . Finally, determine the optimal controls uk , the resulting
states xk , and the minimal total cost.

5.2 A company wants to optimize their production and inventory for a 6-weeks
period.
The following things must hold:
At the end of each week (Friday afternoon) the company delivers 1 unit of a
certain product.
During each week, the company can produce either 0 units to the cost of 0
SEK, or 1 unit to the cost of c1 SEK, or 2 units to the cost c2 SEK (for the
two units together).
The company can store at most 1 unit from one week to the next. The cost
for this is q SEK per unit and week. This cost has to be paid each Friday
evening when there is a unit in the inventory (after the delivery of the week).
At the start of week 1 the storage room is empty and it should be empty also
after the delivery at the end of week 6.
The question is now how the company should act in order to minimize its total
costs for manufacturing and inventory, given that the delivery of 1 unit each
Friday is carried out.
This problem can be solved using dynamic programming.

(a) Let vk (sk ) be the minimal remaining cost when k weeks remain of the
period and the inventory on Monday morning is sk units.
Determine v1 (s1 ) and a recursive equation for vk (sk ).

(b) Use the recursive equation to solve the problem when c2 + q < 2c1 .
(c) Use the recursive equation to solve the problem when c2 + q > 2c1 .

5.3 Goran has just won 1 million SEK at Bingo-Lotto and taken a leave of absence
for the next year from his secure but boring job. During this year Goran has
decided to spend his winnings in an optimal way.
Goran knows that if he consumes u SEK during one month, this will grant
him a satisfaction of c√u units of pleasure, where c is a positive constant. (The
utility function of Goran has a decreasing derivative since a certain grade of
saturation occurs at large consumption).
The money Goran has won are placed in a party account that gives interest as
follows: At the end of each month Goran receives an interest that is calculated
as 1% of the smallest amount located in the account during the month (i.e.
1% of the money in the account at the end of the month in our case). This
interest is placed in the account at the turn of the month.
Goran wants to maximize his total utility during the year, i.e. Σ_{k=1}^{12} c√uk ,
but he does not know at what rate he should spend the money. However, he is
sure that he does not want to have any money left when the year is over (when
he will return to his ordinary dull life).
Your assignment is to use dynamic programming to determine an optimal
consumption strategy for Goran.
Let Vk (xk ) = the maximum remaining pleasure for Goran given that k months
of the year remain and that there are xk SEK left in the party account
(including any interest gained).

(a) Determine V1 (x1 ) and a recursive equation for Vk (xk ).


(b) Determine, using the recursive equation, how much money Goran should
use for consumption each month of the considered year.

5.4 Frasse the bum lives a dubious life going in and out of prison.
When he is in prison he has two options: He can refrain from criminal activity
and with the probability 0.8 he will be let out for his good behavior the next
month. The downside is that he does not get any money that month. The
other option is to bootleg lightbeer and thus earn 2 kSEK, but also decreasing
the chance of getting let out to a probability of 0.5.
When Frasse is free he resorts to criminal activities again. Either he blows
up safes or he swipes handbags. The profit for one month of these activities
are 12 kSEK and 4 kSEK respectively. When Frasse is blowing up safes the
probability is 0.8 that he gets caught and is imprisoned and when he swipes
handbags the probability is 0.2.
Help Frasse to maximize the expected earnings during the next three months
assuming that he starts in prison and that he does not care if he is in prison
or not after the three months have passed.
Note: All state changes occurs at the turn of the month and the choice of
activity is fixed for the whole month.

5.5 This exercise deals with the knapsack problem.


Given n different objects, numbered 1, 2, . . . , n. Object number j has the given
value cj SEK and the given weight aj kg. The objective is to fill a knapsack
with a subset of the given objects in such a way that the value of the contents in
the knapsack is maximized, subject to the constraint that the weight of the
contents does not exceed b kg.
a1 , . . . , an , c1 , . . . , cn and b are given positive integers. (If b ≥ Σ aj the problem
is trivial, since all elements can be put in the knapsack, but if b < Σ aj and
n is large, the problem can be hard.)
This problem can be solved using dynamic programming. Define the optimal-
value-function vk (s), for s = 0, 1, . . . , b and k = 1, 2, . . . , n , as the optimal
value of a reduced knapsack problem where it is only possible to choose among
the k first of the given n objects (i.e. the objects with numbers 1, · · · , k) and
where the contents of the knapsack can not exceed s kg. The optimal value of
our original problem is therefore = vn (b).

(a) Determine v1 (s) and a recursive equation for vk (s).


(b) Illustrate the use of the recursive equation by solving a small problem
with the following data:
n = 4, c1 = 4, c2 = 7, c3 = 6, c4 = 8, a1 = 1, a2 = 2, a3 = 2, a4 = 3 and
b = 4.
In particular, determine which objects should be put in the knapsack.
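One natural recursion for part (a) is vk (s) = max(vk−1 (s), ck + vk−1 (s − ak )) for s ≥ ak , with v0 (s) = 0. The sketch below implements it; running it on the data of (b) will of course reveal the answer, so treat it as a check of your own computation.

```python
def knapsack(c, a, b):
    """DP over v_k(s): optimal value using only objects 1..k, weight limit s.
    Returns the optimal value and the (1-indexed) chosen objects."""
    n = len(c)
    v = [[0] * (b + 1) for _ in range(n + 1)]   # v[k][s]
    for k in range(1, n + 1):
        for s in range(b + 1):
            v[k][s] = v[k - 1][s]
            if s >= a[k - 1]:
                v[k][s] = max(v[k][s], c[k - 1] + v[k - 1][s - a[k - 1]])
    chosen, s = [], b                           # backtrack the choices
    for k in range(n, 0, -1):
        if v[k][s] != v[k - 1][s]:
            chosen.append(k)
            s -= a[k - 1]
    return v[n][b], sorted(chosen)

value, objects = knapsack(c=[4, 7, 6, 8], a=[1, 2, 2, 3], b=4)
```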

5.2. Stochastic Dynamic Programming

5.6 Honest Harry’s Metal & Junk have received a giant order - a new reactor
tank (called Spare part by the Swedish Nuclear Power Inspection) to Osquar
1, called O1. The strict quality requirements make it necessary for Harry to
manufacture more than one tank in order to get a specimen that is accepted.
If Harry decides, in a given production run, to manufacture L reactor tanks
the number of acceptable tanks will, since the probability for success for any
given casting is 1/2, be given by a binomial distribution with parameter 1/2;
in particular, the probability that no tank is acceptable is (1/2)^L .
The marginal production cost for reactor tanks is 100 million kr per tank (also
for those that are being rejected) and excess tanks are worthless. In addition
there is a setup cost of 300 million for each production run. Harry does not
have time for more than three production runs. If he has not been able to
produce an acceptable tank by the end of the third run he will have to pay
a penalty of 1600 million kronor (somebody has to pay the cost of nuclear
power).
Determine the policy that minimizes Harry’s total expected cost (i.e. determine
Ln for n = 1, 2, 3).
Does Harry have reason to regret accepting the offer of 700 million from the
Osquar Group?

5.7 A controlled random walk is performed on a grid of points, a part of which can
be seen in the figure. Only integer points with t ≥ 0, n ≥ 0 are of relevance.

[Figure: the (t, n) grid. The goal is the origin; a walk that reaches a positive
integer point on the t-axis or the n-axis gets stuck there.]
A successful random walk: A walk that reaches the origin.


The end of the walk: A random walk that reaches a positive integer point on
the t- or the n-axis, stays there.
Rules for movement: One can only move to the right in the figure (t decreases
by one, n is unchanged) or diagonally downwards to the right (both t and n
decrease by one).
The probability of horizontal movement may at each point be varied between
a and 1 − a (0.5 < a < 1). The probability of diagonal movement is one minus
that of the horizontal movement.
Let the probability for movement in the horizontal direction at the point (t, n)
be given by pt,n . It thus holds that 1 − a ≤ pt,n ≤ a.
An interesting but for the exam somewhat intricate problem is to decide how
to choose pt,n , for all t and n, so that the probability of a successful random
walk, regardless of starting point, is maximized.
It is, however, desired that the following simple cases be treated:

(a) Derive a recursive equation for Vt,n , i.e. the probability of a successful
random walk starting at the point (t, n) when the optimal p's are chosen.
(b) Calculate Vt,n and the optimal pt,n for the following integer points.
i. The origin.
ii. On the positive n-axis.
iii. On the positive t-axis.
iv. In the triangle above the 45◦ line through the origin (n > t > 0).
v. On the 45◦ line through the origin (n = t > 0).
(c) Solve the recursion equation for the points t = n + 1 (n ≥ 1, for at least
n = 1, . . . , 4, preferably for all n) and give the corresponding optimal pt,n .

(d) Using the answer to question (c), decide whether the following statement
is true or not: It is optimal to maximize the probability for horizontal
movement, except for the points on the 45◦ line through the origin.

5.8 Assume that you are given N independent equally distributed random
variables X1 , . . . , XN , all with the density function f , distribution function F
and expected value m. Consider now the following game.

(i) The game is played by one person (the gambler).


(ii) The gambler has an a priori knowledge of f , F (and thus also m) and the
number of variables N .
(iii) The gambler gets to observe the outcome of the random variables in con-
secutive order, i.e. he observes the outcome of X1 first, then the outcome
of X2 etc.
(iv) The gambler has access to a red button which he may press exactly once
during the game.
(v) If the gambler presses the button after he has observed that Xn = x, but
before he has observed Xn+1 the game is stopped and the gambler wins
G(n, x) kronor.
(vi) He has to stop no later than after the outcome of XN (where, naturally,
G(N, XN ) is won).
(vii) The moment when one wishes to press the button need not be decided
before the start of the game; instead the possible decision of stopping the
game after the observation of Xn can be based on the outcome of Xn .
(viii) The objective for the gambler is to maximize his expected winnings.

Your task is now the following.

(a) Let V (n, x) be the optimal expected winning for a game that is not yet
stopped and where you have just observed that Xn = x. Deduce a DynP-
equation (including boundary conditions) for the functions V (n, x), n =
1, 2, . . ..
(b) State the structure of the optimal stopping rule, in terms of the functions
G, f and V .
(c) Define cn ≡ E[V (n, Xn )]. Use the DynP equation in order to derive a
recursion formula for the constants cn , n = N, N − 1, . . . , 1.
(d) Assume that the variables are uniformly distributed on [0, 1], that G(n, x) =
x/n and that N = 4. For what outcomes of X2 is it optimal to stop?

5.9 Consider the following game.

(i) First, you are given x kronor.


(ii) In each round of the game you may choose how large your bet, u, should
be. The bets are limited to integer values and the minimum bet is 1 krona
for each round of the game. You are not allowed to be in debt, i.e. if you
have xn kronor at the beginning of round n and the bet in this round is
un it holds that un ∈ {1, . . . , xn }

(iii) If, at the beginning of round n, you have xn kronor and you choose to
bet un kronor, you will at the beginning of the next round have a random
number of kronor Xn+1 where
Xn+1 = xn + un Zn ,
and where Zn is a random variable with P (Zn = 1) = p and P (Zn =
−1) = 1 − p = q. In other words, the game is a game of “double or
nothing”. The game is said to be favourable if p > q and unfavourable if
p < q.
(iv) The game is ended when you have acquired a predetermined amount of
money N or if you have lost all your money.
(v) The objective of the game is to become a wealthy person, i.e. that your
wealth should be at least N kronor when the game ends. The objective
is thus to maximize P (XT ≥ N ), where T = inf{n|xn = 0 or xn = N }.
The task is now to prove the following theorem.
Theorem 5.1. Assume that the game is favourable, i.e. p > q. Then it
is optimal to play “cautiously”, i.e. ûn = 1.

5.10 Frasse the bum has through diligent work of dubious nature come into pos-
session of A grams of goldsand. Frasse has a well-founded feeling that within
N days he has to convert the gold to money. A problem with this is that the
price of gold sand is a stochastic process with the dynamics pn+1 = pn ωn ,
where ω0 , . . . , ωN −1 are independent, equally distributed positive random vari-
ables. Further, they are independent of p0 . In this case pn is interpreted as the
price (in kr/kg) of gold sand at day n. The decision strategy is the following.

(i) In the morning of day n Frasse may observe how much gold he has left
and also the price of gold on the present day (i.e. the outcome of pn ).
(ii) On the basis of this information Frasse decides how much gold sand (de-
noted by cn ) he wants to sell this day.
(iii) The funds acquired in this fashion are in the evening deposited in a savings
and loans bank located near Frasse’s home. The daily interest factor at the
bank is β, where β > 1. This means that for every krona deposited in
the evening, Frasse will have β kr the next morning.

Help Frasse to find the policy for selling gold that maximizes his expected
wealth on day N by solving the following scheme.

(a) Let xn be the remaining amount of gold sand at the beginning of day n.
Give an expression for the x-process, an explicit expression for Frasse’s
objective function, define a sound optimal value function Vn (x) and derive
the DynP equation for the problem.
(b) Solve the problem.

Remark: Frasse knows β and the distribution of ωn . In particular he knows


m = E[ωn ]. The optimal policy depends crucially on the relation between β
and m. Your job is to compute the optimal policy for every possible combina-
tion of β and m.

5.11 A stochastic system can be in any of the states k = 0, 1, 2, . . .. If at time n


the system is in state k (which will be denoted xn = k), you may influence
the state of the system at time n + 1 by choosing a real number un , subject to
the condition that un has to be positive. Given un = u the dynamics of the
system is given by

xn+1 = xn + Zn ,
where Zn is a random variable with the distribution

P (Zn = m) = e^(−u) u^m / m! ,   m = 0, 1, 2, . . .
(Zn is thus given by a Poisson distribution with the expected value u). The
choice of un is based on the current state and the problem is to choose a
sequence of controls u0 , u1 , . . . , uT −1 that maximizes
" T −1
#
1X 2
E xT − u ,
2 n=0 n

where T is the predetermined time horizon.

(a) Define a reasonable optimal value function and derive the DynP equation
(including boundary conditions) for the given problem.
(b) Solve the problem, i.e. compute the optimal value function and the opti-
mal control sequence explicitly.

5.12 The famous sea captain Frans Ali Baba, a.k.a. Frasse, has for a long period
been assisting the sultan in a number of important matters. Among other
things he has enriched the state’s treasury by impounding sea vessels belonging
to unfriendly nations that have happened to find their way to Frasse’s waters.
As a reward Frasse may choose a horse from the sultan’s stables. Frasse has,
of course, never seen these horses. Being greedy, however, he would very much
like the most beautiful.
The problem is that the sultan lets each horse parade past Frasse one by one
and Frasse needs to decide at once (for each horse Frasse must say whether he
wants the horse or not). Frasse knows that there are N horses in the stable
and he may assume that the parade order is completely random.
Help Frasse decide which strategy will maximize his chances of choosing the
most beautiful horse.

6. Markov Decision Processes

6.1 Let X be a continuous time Markov chain, where X is in the state space
{1, 2, 3, . . . , M }. We assume that we can influence the dynamics of X by
(scalar) control actions. This is modelled as follows: if, at time t, we are using
the control ut = u, then X has the intensity matrix H(u), where the com-
ponents of H(u) are called λi,j (u) and are, as usual, interpreted as transition
intensities, i.e.

P (Xt+∆t = j | Xt = i) = λi,j (u)∆t + o(∆t),   i ≠ j.

Thus, we may by different choices of the control influence the transition inten-
sities of X. The choice of ut may only depend on the time t and on the state
Xt at time t. Further, it is required that ut ∈ B for each t, where B is a fixed
subset of the real axis. Consider now the problem of minimizing the expected
cost

E[ ∫_0^T g(Xt , ut ) dt + G(XT ) ],

for a given initial state X0 = m.

(a) Define a sensible optimal value function V (t, n) for this problem.
(b) Heuristically deduce the DynP equation for V . Do this by comparing the
following alternatives for a fixed t:
i. Choose a random control u in the interval [t, t + ∆t] and use an
optimal control in the future.
ii. Use an optimal control on the interval [t, T ].
To keep the computations reasonably simple you may neglect all sufficiently
small terms.

6.2 Much to the dislike of her beloved Fluke, Luckybet plays poker every Saturday
night. If Luckybet takes her husband out to dinner on Saturday night (at an
expected cost of 400 kr) before she goes out to play poker, Fluke will be in
a good mood on the following Saturday with probability 7/8 and in a bad
mood with probability 1/8. If Fluke is in a bad mood on Saturday night and
Luckybet did not buy him dinner the previous weekend, Fluke buys a suit for
2000 kr. Use the policy improvement algorithm to decide an optimal policy
for Luckybet if she wishes to minimize her expected average cost per week.
The initial policy is to always buy him dinner.

6.3 Consider a factory in which one uses a certain type of machine. The machine
can, at the start of a given week, be in different states. We represent these
states with one of the numbers 1 (excellent), 2 (good), 3 (decent) or 4 (bad).
The revenue for a given week is dependent on the state of the machine accord-
ing to the following scheme: Excellent, 100 kr; good, 80 kr; decent, 50 kr; bad,
10 kr.
At the start of each week we can observe the state of the machine and may
thereafter decide to replace it (u = 1) or let it be (u = 0). To replace the

machine costs 200 kr independently of the state of the machine. The result is
that one immediately obtains a machine that is in the excellent state.
As time goes by the machines are worn and get worse. In particular, the
following transition probabilities hold for a machine that is not replaced, i.e.
given u = 0.
An excellent machine will, with probability 0.7, be excellent also next week,
or, with probability 0.3, be a good machine. A good machine will, with prob-
ability 0.7, be good also next week, and it will, with probability 0.3, be a
decent machine. A decent machine continues to be decent the next week with
probability 0.6 and will, with probability 0.4, be bad the next week. A bad
machine stays bad for all eternity (unless it is replaced).
The question is which strategy should be chosen in order to maximize the
expected average income per unit time. Your tasks are as follows.

(a) Give the transition matrices P (0) and P (1).


(b) Give the structure of the cost, i.e. determine the constants cik . In this
case it may be simpler to treat the problem as a maximum profit problem
rather than a minimum cost problem. In that case cik naturally stands
for immediate revenue rather than immediate cost. In your solution you
should clearly state if you are using costs or revenue.
(c) Use the policy improvement algorithm to determine the policy for repair
that maximizes the expected average revenue per unit time. Start with
the initial policy u(1) = 0, u(2) = 0, u(3) = 1, u(4) = 1. Further, let
v2 = 0 in the first value determination step. The optimal average revenue
per unit time should also be given in your answer.

A comforting fact: The algorithm will converge very fast in this case.
6.4 The professional tennis player Martin Heidegger runs his business on a yearly
basis. A given season can be classified as either “good” or “bad”. The decision
Martin is faced with before the start of each season is if he should practice
hard or practice moderately. If M practices hard the probability that he is in
good shape the following season increases, but on the other hand he will not
have time to play some well paid celebrity matches.
If the previous season was good and M practices hard, the next season will
also be good with probability 0.8. If he practices moderately the next season
will be bad with probability 0.5. If the previous season was bad and Martin
practices moderately, the next season will be bad with probability 0.6. If he
practices hard, the next season will be good with probability 0.7. Let a good
(bad) season be denoted by 1 (2). Let further the decision of moderate (hard)
training be denoted 1 (2). The income that Martin earns is given in the table
below. Ri,j (k) denotes the immediate revenue for a transition from state i to
state j for the decision k. The income is given in the not so well known unit
Pzt, which is the currency in the relatively unknown state in central Europe
in which Heidegger was raised.
   
    R(1) = ( 9   3 ) ,   R(2) = ( 4    4 )
           ( 3  −7 )            ( 1  −19 )

Easy calculations show that in the short term the decision k = 2 is dominated
by k = 1. Martin, who is a bit speculative, wonders if this implies that it
is optimal also in the long run to always make the decision k = 1. Your
mission is to help young Mr Heidegger with this problem by determining the
training policy (with infinite time horizon) that maximizes Martin’s expected
revenue per year. Note that it is a maximization problem. You may either
modify the standard algorithm to work for maximizing or restate the problem
as a minimization problem. Solve the problem with the policy improvement
algorithm starting with the initial policy (1, 1).

6.5 A certain system, used on a daily basis, can be in two different states: “Broken”
or “Fine”.
Every night a repairman checks the system and (if necessary) repairs it so that
it is working the next day.
Two different repairmen can be hired, Anderson and Bengtsson, and both
charge a given fee for their services. Anderson charges 2000 kr if the system
is broken and 500 kr if it is fine. The corresponding fees for Bengtsson, who
is more expensive, is 3500 kr (for a broken system) and 1600 kr (if the system
is fine).
If one hires Anderson the probability that the system will break down during
the next day is 0.3 (regardless of whether the system was broken or fine at
the time of check-up), while the corresponding number is 0.2 for Bengtsson.
If the system is fine in the evening one may completely ignore the check-up
(i.e. employ neither Anderson nor Bengtsson), but then the probability that
the system will break down the next day is 0.4.
It is bad if the system breaks down; in fact, this corresponds to a cost, due to
loss of production, of 10 000 kr (on average).
Presently, one uses the following strategy: Bengtsson is hired if the system is
broken, while Anderson is hired if the system is fine. Use the policy improve-
ment algorithm to determine the strategy that minimizes the expected average
cost.
Use the given strategy as the initial policy.

6.6 A dam is partly used to generate electricity and partly for irrigation. The
dam’s capacity is 3 units of water. The distribution of the amount of water,
Wt , that flows into the dam during month t, t = 1, 2, . . . is given by PW (m),
where

PW (0) = P (W = 0) = 1/6,
PW (1) = P (W = 1) = 1/3,
PW (2) = P (W = 2) = 1/3,
PW (3) = P (W = 3) = 1/6.

In order to generate the contracted amount of electricity one unit of water is
required. At the start of the month the decision about how much water should
be released this month is made. The first unit is used to generate electricity

whereas the remaining units are used for irrigation. The latter is worth 100
kkr per unit of water and month. If the dam contains less than one unit of
water at the start of the month extra power must be purchased at a cost of
300 kkr. If the dam at some point in time contains more than 3 units of water
the excess water must be released without any cost or revenue.
Formulate an LP-problem for finding the optimal water release policy. Define
the variables you use and state how the optimal policy is obtained from the
solution to your LP-problem.

6.7 This exercise deals with the discrete time policy improvement algorithm. Suppose
we have a system that can be in the states i = 1, . . . , N and that we can in
every state make any of the decisions k = 1, . . . , K. Assume further that the
transition probabilities are given by pij (k), where Σ_{j=1}^N pij (k) = 1 for
k = 1, . . . , K and i = 1, . . . , N . Moreover, assume that the expected cost in a
time step for being in state i and making the decision k is Cik kr.
Assume that we have a policy A = (A1 , A2 , . . . , Ai , . . . , AN ), where Ai denotes
the decision in state i, and that we have evaluated this policy and found the
average cost per period gA and so called relative values viA , i = 1, . . . , N .
Assume further that the policy improvement iteration gives a policy B (when
we use viA ). Let gB be the average cost per period for this new policy. Show
that gB ≤ gA .
Hint: First, write the equations that give gA and viA and the equations that
give gB and viB . Show that ∆g = gB − gA and ∆i = viB − viA satisfy a system
of equations of the same form as that which gives gB and viB . Then motivate
why ∆g ≤ 0.

6.8 A dam is partly used to generate electricity and partly for irrigation. The
dam’s capacity is 3 units of water. The distribution of the amount of water,
Wt , that flows into the dam during month t, t = 1, 2, . . . is given by PW (m),
where

PW (0) = P (W = 0) = 1/6,
PW (1) = P (W = 1) = 1/3,
PW (2) = P (W = 2) = 1/3,
PW (3) = P (W = 3) = 1/6.

In order to generate the contracted amount of electricity one unit of water is
required. At the start of the month the decision about how much water should
be released this month is made. The first unit is used to generate electricity
whereas the remaining units are used for irrigation. The latter is worth 100
kkr per unit of water and month. If the dam contains less than one unit of
water at the start of the month extra power must be purchased at a cost of
300 kkr. If the dam at some point in time contains more than 3 units of water
the excess water must be released without any cost or revenue.
Find the optimal water release policy by using the policy improvement al-
gorithm to minimize the discounted costs (discount factor α = 6/7). Start

with the policy only to deliver electricity according to the contract, i.e. not
delivering water to the irrigation system.
Hint: The following information may be helpful:

    (  6  −2  −2  −1 )       ( 21 )            ( 90/23 )
    ( −1   5  −2  −1 )  x  = (  0 )   =⇒   x = ( 21/23 )
    (  0  −1   5  −3 )       (  0 )            (  6/23 )
    (  0   0  −1   2 )       (  0 )            (  3/23 )

    (  6  −2  −2  −1 )       (  21 )           (  2 )
    ( −1   5  −2  −1 )  x  = (   0 )   =⇒  x = ( −1 )
    ( −1  −2   5  −1 )       (  −7 )           ( −2 )
    ( −1  −2  −2   6 )       ( −14 )           ( −3 )

6.9 The car manufacturer Volneault is facing a strategy decision concerning when
and how to invest in advertising and developing a model. The consulting
company the Markovians has been hired to construct a model as a basis for
the decision. The following very crude simplifications have been made.
A model is either a success or a failure. A success generates a profit of 100
Mkr while a failure results in a loss of 20 Mkr. Three different actions can
be taken depending on whether the model is a success or a failure. One can
invest 40 Mkr in advertising or 60 Mkr in development, but one may not (due
to budget restrictions) take both these actions at the same time. The third
possibility Volneault has is to refrain from all efforts to increase the popularity
of the car.
The consultants have, together with Volneault, made the following estimates
of probabilities. If nothing is done for a year and the model is a failure, the
model will be a failure next year also with probability 0.9, while if it is a
success it will be a success with probability 0.6. If advertising is done, and
the model is a failure it will be a failure the next year with probability 0.8,
while if it is a success it will be a success the next year with probability 0.8. If
money is invested in development, and the car is a failure it will be a failure
the next year with probability 0.5, while if it is a success it will be a success
with probability 0.6.

(a) The company wishes to maximize the annual expected revenue. Derive
an optimal policy by using the policy improvement algorithm and start
with the policy of never investing in anything. How large will the annual
expected revenue of the model be if the optimal policy is used?
(b) Formulate the problem from (a) as an LP-problem. In (a) you derived the
optimal policy. Use this information to give the optimal solution to your
LP-problem without solving it with the simplex-method or something
similar.
(c) The company now wishes to maximize the total expected discounted rev-
enue. Use the policy improvement algorithm to find an optimal policy if
the company uses the discount factor 0.8. Start with the policy of never
investing in anything. How large will the expected discounted revenue be
if the optimal policy is used and the model is a success the first year?

7. Solutions to the exercises

1.1 a) Since the row sums in a transition matrix are 1, we have that

        ( 1/3  1/3  1/3 )
    P = ( 1/2   0   1/2 )
        ( 1/4  3/4   0  )

b) We have that

                     ( 1/3  1/3  1/3 )^2    ( 13/36  13/36   5/18 )
    P(2) = P^2 =     ( 1/2   0   1/2 )   =  (  7/24  13/24   1/6  )
                     ( 1/4  3/4   0  )      ( 11/24  1/12   11/24 )

c) It holds that

                                     ( 1/3  1/3  1/3 )^2
    p(2) = p(0) P(2) = (1/2, 1/2, 0) ( 1/2   0   1/2 )   = (47/144, 65/144, 2/9)
                                     ( 1/4  3/4   0  )
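As a cross-check, the two matrix products above can be reproduced with exact rational arithmetic. The following Python sketch (not part of the original solution) recomputes P(2) and p(2):

```python
# Check of Exercise 1.1 with exact arithmetic (a sketch, not part of the solution).
from fractions import Fraction as F

P = [[F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), F(0),    F(1, 2)],
     [F(1, 4), F(3, 4), F(0)]]

def matmul(A, B):
    # plain triple-loop matrix product over lists of lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P2 = matmul(P, P)                               # the 2-step transition matrix P(2)
p2 = matmul([[F(1, 2), F(1, 2), F(0)]], P2)[0]  # absolute probabilities p(2)
print(P2[0])
print(p2)
```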

1.2 a)

P (X5 = b, X6 = r, X7 = b, X8 = b | X4 = w)
= P (X5 = b | X4 = w) · P (X6 = r | X4 = w, X5 = b)
· P (X7 = b | X4 = w, X5 = b, X6 = r) · P (X8 = b | X4 = w, X5 = b, X6 = r, X7 = b)
= (Markov property)
= P (X5 = b | X4 = w)P (X6 = r | X5 = b)P (X7 = b | X6 = r)P (X8 = b | X7 = b)
= (from the matrix) = 0.6 · 0.8 · 1 · 0.2 = 0.096.

b) We get the probabilities for the different paths at the times 4, 5 and 6 from
the table.

    Path (times 4, 5, 6)    Probability
    y → r → b               0.2 · 1 = 0.2
    y → w → w               0.3 · 0.4 = 0.12
    y → w → b               0.3 · 0.6 = 0.18
    y → y → r               0.5 · 0.2 = 0.1
    y → y → w               0.5 · 0.3 = 0.15
    y → y → y               0.5 · 0.5 = 0.25

which gives

E(f (X5 )f (X6 ) | X4 = y) = f (r)f (b) · 0.2 + f (w)f (w) · 0.12 + f (w)f (b) · 0.18
    + f (y)f (r) · 0.1 + f (y)f (w) · 0.15 + f (y)f (y) · 0.25 = 14.41

1.3 a) P (X5 = 3 | X3 = 1, X2 = 1) = P (X5 = 3 | X3 = 1) = p13^(2) from the
Markov property. p13^(2) is determined from the corresponding element in
P(2) = P^2 . We get

          ( 0.31  0.42  0.27 )
    P^2 = ( 0.28  0.40  0.32 )
          ( 0.33  0.27  0.40 )

The asked for probability is thus 0.27.

b) From the Markov property we get

P (X8 = 3, X7 = 1, X5 = 2 | X3 = 2, X2 = 1) = P (X8 = 3, X7 = 1, X5 = 2 | X3 = 2)

But

P (X8 = 3, X7 = 1, X5 = 2 | X3 = 2)
    = P (X5 = 2 | X3 = 2)P (X7 = 1 | X5 = 2)P (X8 = 3 | X7 = 1)
    = p22^(2) p21^(2) p13 = 0.40 · 0.28 · 0.5 = 0.056

1.4 a) The stationary distribution P is determined by the equation system
P = PP, P1 + P2 + P3 = 1.
We get

P1 = 0.4P1 + 0.25P3
P2 = 0.5P2 + 0.75P3
P3 = 0.6P1 + 0.5P2
P1 + P2 + P3 = 1

which has the solution P1 = 1/7, P2 = 18/35, P3 = 12/35.

b) In this case we obtain

P1 = 0.5P1 + 0.25P2
P2 = 0.5P1 + 0.75P2
P3 = P3
P1 + P2 + P3 = 1

This system has infinitely many solutions, P2 = 2P1 , P3 = 1 − 3P1 . Since all
probabilities must be between 0 and 1 it holds that 0 ≤ P1 ≤ 1/3.
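Both answers can be verified with exact arithmetic. The sketch below (not part of the original solution) checks the unique solution in (a) and one member of the one-parameter family in (b); the sample value P1 = 1/5 is an arbitrary choice within [0, 1/3]:

```python
# Verification sketch for Exercise 1.4 (not part of the original solution).
from fractions import Fraction as F

def is_stationary(pi, P):
    # pi is stationary iff pi P = pi and the components sum to 1
    n = len(pi)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j]
               for j in range(n)) and sum(pi) == 1

Pa = [[F(2, 5), F(0),    F(3, 5)],
      [F(0),    F(1, 2), F(1, 2)],
      [F(1, 4), F(3, 4), F(0)]]
assert is_stationary([F(1, 7), F(18, 35), F(12, 35)], Pa)

Pb = [[F(1, 2), F(1, 2), F(0)],
      [F(1, 4), F(3, 4), F(0)],
      [F(0),    F(0),    F(1)]]
# in (b) every P1 in [0, 1/3] gives a stationary distribution: P2 = 2P1, P3 = 1 - 3P1
p1 = F(1, 5)   # arbitrary sample value
assert is_stationary([p1, 2 * p1, 1 - 3 * p1], Pb)
```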

1.5 a) The path 1 → 2 → 4 → 3 → 1 has a positive probability, hence all the
states communicate, and the chain is irreducible. The period is 2.

b) The chain has two irreducible subchains, {1,3} and {2,4}. The states are
aperiodic since you can go from one state and back to the same in one time
step.

1.6 a) Feasible values for Xn are 0, 1, 2. We want to show that {Xn ; n ≥ 0}
is a Markov chain. Therefore, we study P (Xn+1 = j | Xn = i, Xn−1 =
in−1 , Xn−2 = in−2 , . . . , X1 = i1 , X0 = 2). If we know that Xn = i it means
that 2 − i balls in the other urn are red (and i of them are green). In the other
urn there are then 2 − i green ones and 3 − (2 − i) = i + 1 red. Hence, we
know the distribution of balls before the n:th draw. We do not get any more
useful information from the values of Xn−1 , Xn−2 , . . . , X0 , i.e., P (Xn+1 = j |
Xn = i, Xn−1 = in−1 , . . . , X0 = 2) = P (Xn+1 = j | Xn = i). This means that
{Xn ; n ≥ 0} is a Markov chain.

b) If Xn is 0 there are 2 green and one red in the urn with three balls. After
the n + 1:th draw this urn holds 2 green with probability 1/3 and one red and
one green with probability 2/3. If Xn is 1 there is 1 green and 2 red balls in
the urn with 3 balls and after the n + 1:th draw the probability is 1/3 that it
has 2 red balls and 2/3 that it has 1 red and 1 green ball. Finally, if Xn is 2
there are 3 red balls in the urn with 3 balls and after the n + 1:th draw the
probability is 1 that it has 0 green balls. The transition matrix is therefore
given by

        (  0   2/3  1/3 )
    P = ( 1/3  2/3   0  )
        (  1    0    0  )
Initial probability p(0) = (0, 0, 1).

c) {Yn ; n ≥ 0} is not a Markov chain. For example, we have P (Y4 = 2 |
Y3 = 1, Y2 = 1, Y1 = 1) = 2/3, since if the event {Y3 = 1, Y2 = 1, Y1 = 1}
occurred, there are 1 red and 2 green balls in the urn holding three balls.
The probability that you draw a green ball is thus 2/3. On the other hand
P (Y4 = 2 | Y3 = 1, Y2 = 2, Y1 = 1) = 1/3 since after the third draw there are
2 red and 1 green ball in the urn with 3 balls.

d) The probabilities for Xn are given by

                                            (  0   2/3  1/3 )^n
    p(n) = p(0) P(n) = p(0) P^n = (0, 0, 1) ( 1/3  2/3   0  )
                                            (  1    0    0  )

You get p(1) = (1, 0, 0), p(2) = (0, 2/3, 1/3) and p(3) = (5/9, 4/9, 0).

e) The stationary distribution solves the equation system

    P0 = (1/3)P1 + P2
    P1 = (2/3)P0 + (2/3)P1
    P2 = (1/3)P0
    P0 + P1 + P2 = 1
which has the solution P0 = 0.3, P1 = 0.6, P2 = 0.1.
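The solution (0.3, 0.6, 0.1) can be checked directly against the equation πP = π; the sketch below (not part of the original solution) uses exact fractions:

```python
# Check of the stationary distribution in 1.6 e) (sketch, exact arithmetic).
from fractions import Fraction as F

P = [[F(0),    F(2, 3), F(1, 3)],
     [F(1, 3), F(2, 3), F(0)],
     [F(1),    F(0),    F(0)]]
pi = [F(3, 10), F(6, 10), F(1, 10)]   # the claimed solution (0.3, 0.6, 0.1)

piP = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(piP == pi and sum(pi) == 1)   # → True
```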



1.7 a) We call the different states 1, 2 and 3. The Markov chain {Xn ; n ≥ 1} is
finite and irreducible since all states are accessible from each other state. The
stationary distribution is determined by π = πP, π1 + π2 + π3 = 1, which
gives the equation system

    π1 = 0.004π1 + 0.483π2 + 0.238π3
    π2 = 0.787π1 + 0.271π2 + 0.762π3
    π3 = 0.209π1 + 0.246π2
    π1 + π2 + π3 = 1

with the solution π1 = 0.2953, π2 = 0.5160, π3 = 0.1887.

b) Given that a sign is a word separator (state 3) we will determine the prob-
ability that the next sign is a consonant (state 2). But this probability can be
written P (Xn+1 = 2 | Xn = 3) = p32 = 0.762.

c) Given that a sign is a word separator (state 3) we will determine the prob-
ability that the preceding sign is a vowel. But this probability can be written
P (Xn = 1 | Xn+1 = 3). We can not get it directly from the transition matrix,
but it can be determined as

    P (Xn = 1 | Xn+1 = 3) = P ({Xn = 1} ∩ {Xn+1 = 3}) / P (Xn+1 = 3)
        = P (Xn = 1)P (Xn+1 = 3 | Xn = 1) / P (Xn+1 = 3)
        = π1 p13 / π3 = 0.3272

since both Xn and Xn+1 have the distribution π. It follows from that the
initial distribution is the stationary distribution.
Note that the transition probability p13 is not the probability that a word
finishes with a vowel, but the probability that a vowel is the last letter in a
word, i.e., the proportion of the vowels that are the last letter in a word. This
difference is most obvious if p13 were 1. Then all vowels are the last letter
in a word, but not all words need to end with a vowel.

d) 1/π3 = 5.30 is the expected number of steps between two word separators,
one of these included. The expected word length is thus 5.30 − 1 = 4.30.

e) The expected number of times the Markov chain is in state 1 between two
visits in state 3 is π1 /π3 = 1.565. But this number is the expected number
of vowels in a word. In the same way, the expected number of consonants is
π2 /π3 = 2.735. Note that the sum of these two is the expected word length in
(d).
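The stationary distribution can also be found numerically by power iteration on the transition matrix; the sketch below (not part of the original solution) recovers π together with the word statistics from parts (d) and (e):

```python
# Power-iteration sketch for Exercise 1.7 (not part of the original solution).
P = [[0.004, 0.787, 0.209],
     [0.483, 0.271, 0.246],
     [0.238, 0.762, 0.0]]

pi = [1/3, 1/3, 1/3]
for _ in range(500):   # repeated multiplication pi <- pi P converges for this ergodic chain
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

word_length = 1 / pi[2] - 1        # part (d): expected word length
vowels_per_word = pi[0] / pi[2]    # part (e): expected number of vowels per word
print(round(pi[0], 4), round(word_length, 2), round(vowels_per_word, 3))
```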

1.8 a) Given Y2 , Y3 , . . . , Yn−1 , Yn is determined by the last signal and the new
one, hence it is determined by Yn−1 and the new signal. The chain is therefore
Markov. From S1 (the two last signals are zeros) the chain can go to the states
S1 (00) or S2 (01) with the probabilities q and p respectively. In the same way

the other transition probabilities are determined to form the transition matrix

        ( q  p  0  0 )
        ( 0  0  q  p )
    P = ( q  p  0  0 )
        ( 0  0  q  p )

b) Now let S4 be absorbing state. Let ti be the expected number of steps until
the Markov process ends up in state S4 when starting in Si . We can consider
the chain as if it starts in S1 . We get the equation system
t1 = 1 + qt1 + pt2
t2 = 1 + qt3
t3 = 1 + qt1 + pt2
We see that t1 = t3 and thus t1 = 1 + qt1 + p(1 + qt1 ) which, using q = 1 − p,
leads to t1 = 1/p + 1/p^2 and t2 = 1/p^2 . Counting the first 0-signal, the
expected number of transmitted signals until two ones in a row are obtained is
equal to 1 + t1 = 1 + 1/p + 1/p^2 .
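The closed form can be checked by solving the hitting-time equations for a few sample values of p (the chosen values are arbitrary examples, not from the exercise):

```python
# Sketch: the hitting-time equations of 1.8 b) confirm t1 = 1/p + 1/p^2.
from fractions import Fraction as F

def expected_signals(p):
    q = 1 - p
    # t2 = 1 + q*t1 (since t3 = t1), so t1 = 1 + q*t1 + p*(1 + q*t1)
    t1 = (1 + p) / (1 - q - p * q)
    return 1 + t1           # counting the first 0-signal

for p in [F(1, 2), F(1, 4), F(9, 10)]:      # arbitrary sample values
    assert expected_signals(p) == 1 + 1 / p + 1 / p ** 2
```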

1.9 a) Make 3 in to an absorbing state and let ti be the expected number of steps
to absorption when starting in state i, i = 1, 2. We get
t1 = 1 + 0.2t1 + 0.5t2
t2 = 1 + 0.4t1 + 0.3t2
with the solution t1 = t2 = 10/3 = 3.333.
b) Consider the chain that is obtained when state 3 is made into an absorbing
state. The asked for probability is the probability that this chain is not in
state 3 after four time steps, since if it goes there before that time it would
remain there also at time 4. We have that

                     ( 0.2  0.5  0.3 )^4    ( 0.1076  0.1325  0.7599 )
    P(4) = P^4 =     ( 0.4  0.3  0.3 )   =  ( 0.1060  0.1341  0.7599 )
                     (  0    0    1  )      (   0       0       1    )

The probability of reaching state 3 after 4 time steps when starting in state
1 is therefore p13^(4) = 0.7599 and the probability of not being in state 3 is
1 − 0.7599 = 0.2401.
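The 4-step matrix of the absorbing chain can be recomputed directly; the sketch below (not part of the original solution) squares the matrix twice:

```python
# Sketch: the 4-step matrix of the absorbing chain in 1.9 b).
P = [[0.2, 0.5, 0.3],
     [0.4, 0.3, 0.3],
     [0.0, 0.0, 1.0]]   # state 3 made absorbing

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P2 = matmul(P, P)
P4 = matmul(P2, P2)
print(round(P4[0][2], 4), round(1 - P4[0][2], 4))   # 0.7599 0.2401
```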
c) The chain is finite, irreducible (every state is accessible from all others)
and aperiodic (diagonal elements > 0). Therefore the chain is ergodic. The
asymptotic distribution is given by the stationary one π = πP, and is given
by the equation system
π1 = 0.2π1 + 0.4π2 + 0.3π3
π2 = 0.5π1 + 0.3π2 + 0.2π3
π3 = 0.3π1 + 0.3π2 + 0.5π3
π1 + π2 + π3 = 1

with the solution π1 = 0.3021, π2 = 0.3229 and π3 = 0.3750.

1.10 a) Let k be the state that k different items have been obtained, k = 2, 3, 4. If
2 different items have been obtained, the probability that you still only have
2 different items after buying a new product is the probability that the last
product comes with the two items you already have. Since there are (4 choose 2) = 6
ways to choose 2 items out of 4, this probability is 1/6. In the same way,
the probability of having all 4 items after the new product is 1/6, hence the
probability of having 3 different items is 1 − 1/6 − 1/6 = 2/3.
If you have 3 different items, the probability of still having 3 different items
after the new product is 1/2, since the two new items should be taken from
the three items already obtained, and the number of ways of selecting 2 out
of 3 items is (3 choose 2) = 3. The probability of having all four items after the
last bought product is also 3/6 = 1/2. The state 4 is of course absorbing. We
have the transition matrix

        ( 1/6  2/3  1/6 )
    P = (  0   1/2  1/2 )
        (  0    0    1  )

X1 = 2, i.e. the initial distribution is (1, 0, 0).
X1 = 2, i.e. the initial distribution is (1, 0, 0).

b) The number of buys, after the first product, until a full collection is obtained
is the number of steps the chain makes before absorption. With the usual
notation we get

    t2 = 1 + (1/6)t2 + (2/3)t3
    t3 = 1 + (1/2)t3

which has the solution t2 = 2.8, t3 = 2. Together with the first package the
expected cost is 3.8 · 15 = 57 SEK.
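The absorption-time equations above can be solved with exact arithmetic; a sketch (not part of the original solution):

```python
# Sketch: absorption-time equations of 1.10 b) solved exactly.
from fractions import Fraction as F

t3 = F(1) / (1 - F(1, 2))                  # from t3 = 1 + (1/2) t3
t2 = (1 + F(2, 3) * t3) / (1 - F(1, 6))    # from t2 = 1 + (1/6) t2 + (2/3) t3
expected_cost = (1 + t2) * 15              # first package included, 15 SEK each
print(t2, expected_cost)   # 14/5 57
```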

1.11 a) Since the row sums in an intensity matrix are 0, the matrix must be

        ( −8   4   4 )
        (  3  −5   2 )
        (  0   2  −2 )

b) The time, T2 , that the process remains in state 2 is Exp(q2 )=Exp(2). There-
fore, P (T2 ≥ 1) = e−2·1 = e−2 = 0.1353.

c) The first jump of the process is from state 2 to state 1, since the intensity
to jump to state 0 is zero. The probability that the next jump is to state 0 is
q10 /q1 = 3/5.

d) If p(t) = (p0 (t), p1 (t), p2 (t)) the equation system is given by p′ (t) = p(t)Q
or written out

p′0 (t) = −8p0 (t) + 3p1 (t)


p′1 (t) = 4p0 (t) − 5p1 (t) + 2p2 (t)
p′2 (t) = 4p0 (t) + 2p1 (t) − 2p2 (t)

It can be illustrative to derive this equation system using a Markov reasoning.


In a time interval of length h, the probability that the process moves from i
to j is qij h + o(h). The probability of remaining in state i is 1 − qi h + o(h).
The probability of more than one jump is o(h). We get

p0 (t + h) = P (X(t + h) = 0)
    = P (X(t) = 0)(1 − q0 h) + P (X(t) = 1)q10 h + P (X(t) = 2)q20 h + o(h)
    = p0 (t)(1 − q0 h) + p1 (t)q10 h + p2 (t)q20 h + o(h)

which results in

    (p0 (t + h) − p0 (t))/h = −p0 (t)q0 + p1 (t)q10 + p2 (t)q20 + o(h)/h

If we let h go to 0 we get

    p′0 (t) = −p0 (t)q0 + p1 (t)q10 + p2 (t)q20 = −8p0 (t) + 3p1 (t)

In the same way the other equations are obtained.

e) Make state 0 into an absorbing state. Let ti be the expected time until the
process ends up in state 0, given that it starts in state i, i = 1, 2. We get the
equation system

    t1 = 1/5 + 0.4 t2
    t2 = 1/2 + t1

which has the solution t1 = 2/3 and t2 = 7/6. The requested time is 7/6.

f) The chain is finite and irreducible, all states are accessible from each of the
states in the chain. Therefore, the Markov process is ergodic and the asymp-
totic distribution is given by the stationary, which is given by the equation
system:
πQ = 0, π0 + π1 + π2 = 1 or explicitly

−8π0 + 3π1 = 0
4π0 − 5π1 + 2π2 = 0
4π0 + 2π1 − 2π2 = 0
π0 + π1 + π2 = 1

Solving this equation system you get π0 = 3/25, π1 = 8/25 and π2 = 14/25.
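The asymptotic distribution can be checked directly against πQ = 0; a sketch with exact fractions (not part of the original solution):

```python
# Sketch: checking that the asymptotic distribution in 1.11 f) solves πQ = 0.
from fractions import Fraction as F

Q = [[-8, 4, 4],
     [3, -5, 2],
     [0, 2, -2]]
pi = [F(3, 25), F(8, 25), F(14, 25)]

piQ = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]
print(all(x == 0 for x in piQ), sum(pi) == 1)   # → True True
```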

1.12 a)
         (  0   1   0  )
    P̃ =  ( 1/3  0  2/3 )
         (  0   1   0  )

b) The Markov process has a finite state space and is irreducible. It is therefore
ergodic and the asymptotic distribution is given by the stationary, PQ =
0, P1 + P2 + P3 = 1, i.e.,

−P1 + P2 = 0
P1 − 3P2 + 2P3 = 0
2P2 − 2P3 = 0
P1 + P2 + P3 = 1

with the solution P1 = P2 = P3 = 1/3.

1.13 If we let aij be equal to the probability to get absorbed in state j given a start
in state i, i = 1, 2 j = 3, 4, we get
    a13 = 2/8 + (2/8) a23
    a23 = 2/6 + (3/6) a13
Solving this equation system gives a13 = 8/21 and a23 = 11/21. The probabil-
ity of absorbing in state 4 when starting in state 1 is a14 = 1 − a13 = 13/21.

b) If we let ti be the expected time to absorption when starting in state i, we
get

    t1 = 1/8 + (1/4) t2
    t2 = 1/6 + (1/2) t1

Putting the second equation into the first, we get t1 = 4/21.
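Both small systems can be solved with exact arithmetic; a sketch (not part of the original solution):

```python
# Sketch: the absorption equations of 1.13, solved exactly.
from fractions import Fraction as F

# a) a13 = 2/8 + (2/8) a23,  a23 = 2/6 + (3/6) a13
a13 = (F(2, 8) + F(2, 8) * F(2, 6)) / (1 - F(2, 8) * F(3, 6))
a23 = F(2, 6) + F(3, 6) * a13
a14 = 1 - a13

# b) t1 = 1/8 + (1/4) t2,  t2 = 1/6 + (1/2) t1
t1 = (F(1, 8) + F(1, 4) * F(1, 6)) / (1 - F(1, 4) * F(1, 2))
print(a13, a23, a14, t1)
```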

1.14 We determine first the stationary distribution. It is given by the equation
system πQ = 0,

−3π0 + 4π1 + π2 = 0
π0 − 10π1 + 4π2 = 0
2π0 + 6π1 − 5π2 = 0
π0 + π1 + π2 = 1

The equation system has the solution π0 = π2 = 0.4 and π1 = 0.2. The
expected time between two entries to state 0 is 1/(q0 π0 ) = 1/(3 · 0.4). This
means that the expected time until the process visits state 0 for the third
time is 3/(3 · 0.4) = 2.5.

b) The expected time that the process is in state 1 between two entries in state
0 is π1 /(q0 π0 ) = 0.2/(3 · 0.4). The expected time that the process is in state 1
during the time until the chain returns to state 0 for the third time is therefore
3 times that value, i.e. 0.5.

1.15 Introduce the states

S0 = both components are working


S1 = one component works, one component is broken
S2 = both components are broken

Let X(t) be the state of the process at time t. {X(t); t ≥ 0} is a Markov


process with intensity matrix

        ( −2λ     2λ      0  )
    Q = (   µ   −λ − µ    λ  )
        (   0     2µ    −2µ  )

with λ = 1/400 and µ = 1/20. If, for example, the process is in state S0 ,
two events can take place: component 1 can break and component 2 can
break. Both occur with intensity λ and the process jumps to the state S1 .
The intensity for this jump is then λ + λ = 2λ. The asymptotic availability
is the probability that the process is in state S0 . Since the process is finite
and irreducible, the asymptotic probability is the same as the stationary and
is determined by the equation system πQ = 0,

−2λπ0 + µπ1 = 0
2λπ0 − (λ + µ)π1 + 2µπ2 = 0
λπ1 − 2µπ2 = 0
π0 + π1 + π2 = 1

which has the solution π0 = µ^2/(λ + µ)^2 , π1 = 2λµ/(λ + µ)^2 and π2 =
λ^2/(λ + µ)^2 . With the numbers inserted π0 = 400/441, π1 = 40/441 and
π2 = 1/441.
The asymptotic availability is then 400/441 ≈ 90.7%.
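The closed-form availability can be checked against πQ = 0 with exact fractions; a sketch (not part of the original solution):

```python
# Sketch: availability formulas of 1.15 checked against πQ = 0.
from fractions import Fraction as F

lam, mu = F(1, 400), F(1, 20)
Q = [[-2 * lam, 2 * lam, 0],
     [mu, -lam - mu, lam],
     [0, 2 * mu, -2 * mu]]
pi = [mu**2 / (lam + mu)**2,
      2 * lam * mu / (lam + mu)**2,
      lam**2 / (lam + mu)**2]

assert all(sum(pi[i] * Q[i][j] for i in range(3)) == 0 for j in range(3))
print(pi[0])   # 400/441, the asymptotic availability
```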

1.16 a) Let X(t) be the state of the machine at time t and let ti be the expected
time until the process reaches state 3 given that it starts in state i, i = 1, 2.
We get
1 7
t1 = + t2
8 8
1 32
t2 = + t1
36 36
which has the solution t1 = 43/64 and t2 = 5/8. A machines life expectancy
is hence 43/64 years.

b) Let now ui , i = 1, 2, be the expected time spent in state i during the
life expectancy of the machine. It can be determined from the stationary
distribution as Pi /(100P3 ) (expected time in state i between two entries in
state 3, and change of machine), but also from the following reasoning. From
the start the machine is in state 1 in the expected time 1/8. Then, either it
goes to state 3 without coming back to state 1, or otherwise it goes to state 2
and then back to state 1 and then the expected time u1 before the machine
ends up in state 3. We get

    u1 = 1/8 + (7/8)(8/9) u1

giving u1 = 9/16. Hereby it follows that u2 = t1 − u1 = 7/64. The expected
income is therefore 100000 · (9/16) + 40000 · (7/64) = 60625 SEK.

1.17 Let pi (t) = P (X(t) = i). We obtain the following differential equation system
p′1 (t) = −4p1 (t) + 3p2 (t) + 3p3 (t)
p′2 (t) = p1 (t) − 7p2 (t)
p′3 (t) = 3p1 (t) + 4p2 (t) − 3p3 (t)
Form the Laplace transform of the left and right hand sides, we get
sp∗1 (s) = −4p∗1 (s) + 3p∗2 (s) + 3p∗3 (s)
sp∗2 (s) = p∗1 (s) − 7p∗2 (s)
sp∗3 (s) − 1 = 3p∗1 (s) + 4p∗2 (s) − 3p∗3 (s)
since the Laplace transform of p′i (t) is sp∗i (s) − pi (0). The equation system has
the solution

    p∗1 (s) = 3/(s(s + 7)) = (3/7)/s − (3/7)/(s + 7)
    p∗2 (s) = 3/(s(s + 7)^2 ) = (3/49)/s − (3/7)/(s + 7)^2 − (3/49)/(s + 7)
    p∗3 (s) = (s^2 + 11s + 25)/(s(s + 7)^2 ) = (25/49)/s + (3/7)/(s + 7)^2 + (24/49)/(s + 7)

Inverting the Laplace transform gives

    p1 (t) = 3/7 − (3/7)e^{−7t}
    p2 (t) = 3/49 − (3/7)te^{−7t} − (3/49)e^{−7t}
    p3 (t) = 25/49 + (3/7)te^{−7t} + (24/49)e^{−7t}

1.18 Since the tunnel is 1 km long and the cars drive 60 km/h, the cars that
are inside the tunnel at time t must have arrived at the tunnel during the time
interval [t − 1, t], since it takes one minute to drive through the tunnel.
The number of cars, X, arriving during a time interval of length 1 is Po(2 · 1).
We get P (X ≤ 3) = 0.8571, using a table or the probability function.

1.19 The number of clients arriving during a time interval of length t is Po(λt).
Therefore,

    P (X even) = Σ_{j even} e^{−λt} (λt)^j /j! = e^{−λt} Σ_{k=0}^∞ (λt)^{2k} /(2k)!

But Σ_{k=0}^∞ (λt)^{2k} /(2k)! = (1/2) Σ_{k=0}^∞ ((λt)^k /k! + (−λt)^k /k!) =
(1/2)(e^{λt} + e^{−λt} ), since the terms in the sum with odd k cancel each other
out. Inserting this yields P (X even) = (1/2)(1 + e^{−2λt} ). The other equality
follows from P (X odd) = 1 − P (X even).
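The parity formula can be checked against a direct Poisson sum; in the sketch below (not part of the original solution) λt = 2 is an arbitrary example value:

```python
# Sketch: the parity formula of 1.19 checked against a direct Poisson sum.
import math

lam_t = 2.0   # arbitrary example value for λt
p_even = sum(math.exp(-lam_t) * lam_t**j / math.factorial(j)
             for j in range(0, 60, 2))    # tail beyond j = 58 is negligible here
formula = 0.5 * (1 + math.exp(-2 * lam_t))
assert abs(p_even - formula) < 1e-12
print(round(p_even, 6))
```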

1.20 Clearly, {X(t); t ≥ 0} is a birth-death process with a finite state space.
Therefore, it is ergodic. The birth and death intensities are given by

    λi = (N − i)λ,   i = 0, 1, 2, . . . , N − 1
    µi = iλ,         i = 1, 2, . . . , N

We see this because in state i each of the particles is transferred after an
exponentially distributed time, after which the state either increases or
decreases by 1. If we let

    ρi = (λ0 λ1 · · · λi−1 )/(µ1 µ2 · · · µi ) = N (N − 1) · · · (N − i + 1)/i! = (N choose i),   ρ0 = 1

the asymptotic distribution is given by Pi = ρi / Σ_{j=0}^N ρj . But according to
the binomial theorem Σ_{j=0}^N ρj = Σ_{j=0}^N (N choose j) = (1 + 1)^N = 2^N .
Thus Pi = (N choose i)(1/2)^N , i.e., the asymptotic distribution is Bin(N, 1/2).
If urn A does not contain any particle, the expected time until it is empty again
is the same as the expected time that the process lies in the states 1, 2, . . . , N
between two entries in the empty state. This expected time is

    (Σ_{i=1}^N Pi )/(q0 P0 ) = (1 − P0 )/(N λP0 ) = (2^N − 1)/(N λ).

1.21 The described system is a Jackson network with parameters

m=2 c1 = c2 = 1

λ1 = λ λ2 = 0
p12 = 1 p2 = p p21 = 1 − p.
We have

    Λ1 = λ + (1 − p)Λ2   and   Λ2 = Λ1

which gives

    Λ1 = Λ2 = λ/p
From the theory it follows that

pk,n = pn = ρk1 (1 − ρ1 ) · ρn2 (1 − ρ2 ), for k, n = 0, 1, . . . ,

where ρ1 = λ/(pµ̃1 ) and ρ2 = λ/(pµ̃2 ).



2.1 The system is illustrated by a network diagram (not reproduced here): children
arrive to the carousel (service rate 24) with intensity 4; after a ride a child goes
again to the carousel with probability 0.4, to the ghost train (service rate 13)
with probability 0.4, to the portrait drawing (service rate 4) with probability
0.1, and otherwise leaves; from the ghost train everyone returns to the carousel.
Equilibrium at the carousel =⇒

4 + 0.4λ + 0.4λ = λ

This gives that


λ = 20, 0.4λ = 8, and 0.1λ = 2.

Let the indices c, g and p denote the carousel, ghost train and portrait drawing
respectively and define
Vc = expected waiting time at the carousel
Wc = expected remaining time in the system if one has just
arrived to the carousel
Vg , Vp and Wg , Wp are defined analogously.
We then get

Vc = 1/(µc − λc) = 1/(24 − 20) = 1/4, Vg = 1/5, Vp = 1/2.
For Wc and Wg the following system of equations is obtained
Wc = 1/4 + 0.4Wc + 0.4Wg + 0.1 · (1/2)
Wg = 1/5 + Wc

with the solution

Wc = 1.9 Wg = 2.1.

The average time a child spends at the carnival is therefore 1.9 hours (if all
children first go to the carousel).
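The small linear system above is easy to verify numerically. The following sketch (added here, not part of the original solution) solves it by substitution:

```python
# Solve Wc = 1/4 + 0.4*Wc + 0.4*Wg + 0.1*(1/2) and Wg = 1/5 + Wc.
def carnival_times():
    # Substitute Wg = 0.2 + Wc into the first equation:
    # Wc = 0.25 + 0.4*Wc + 0.4*(0.2 + Wc) + 0.05  =>  0.2*Wc = 0.38
    Wc = (0.25 + 0.4 * 0.2 + 0.05) / (1 - 0.8)
    Wg = 0.2 + Wc
    return Wc, Wg

Wc, Wg = carnival_times()
print(round(Wc, 6), round(Wg, 6))  # 1.9 2.1
```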

2.2 The following figure illustrates the system

(Figure: customers arrive at rate 20 to the roulette R (service rate 36); after the roulette, half go to the bar B (service rate 18) and half leave; from the bar, 1/3 return to the roulette and 2/3 go on to the kiosk K.)

(a) Equilibrium at the kiosk and the bar =⇒

λB = (1/2)λR, λK = (2/3)λB.
Equilibrium at the roulette =⇒
20 + (1/3) · (1/2) · λR = λR
The equilibrium equations give that λR = 24, λB = 12 and λK = 8. Thus
it holds that

ρR = λR/µR = 2/3, ρB = λB/µB = 2/3.
The equilibrium distribution for the number of people at the roulette is
therefore the same as the equilibrium distribution of people at the bar,
i.e.

p_n^R = p_n^B = (1 − ρR)ρR^n = 2 · 3^{−(n+1)}.
(b) Let
VR = the average waiting time at the roulette
WR = expected remaining time in the system if one
has just arrived to the roulette
VB and WB are defined analogously.
It then follows that

VR = 1/(µR − λR) = 1/12, VB = 1/(µB − λB) = 1/6.
We can now state the following system of equations for WR and WB :
WR = 1/12 + (1/2)WB
WB = 1/6 + (1/3)WR
=⇒ WR = 1/5, WB = 7/30

The average time a randomly selected customer spends in the gambling


establishment is thus 1/5 hour = 12 minutes. (Since everyone starts out
at the roulette.)

(c) The revenue per hour is = 10 · λK = 80 kr.

2.3 The following figure illustrates the system

(Figure: customers arrive at rate 6 to station I (service rate 12); after service at I, 1/3 loop back to I and 2/3 continue to station II (service rate 18), where also B-customers arrive externally at rate 9.)

(a) Equilibrium at I =⇒

6 + (1/3)λ = λ =⇒ λ = 9
Let
VI = average waiting time the first time one is being served at I
WI = average waiting time at I (including loops)
We get

VI = 1/(µ − λ) = 1/(12 − 9) = 1/3
and

WI = VI + (1/3)WI =⇒ WI = 1/2
The arrival intensity to II is λII = 9 + (2/3)λ = 9 + 6 = 15.
Jackson =⇒ Equilibrium distribution at II is given by

pII(n) = ρ^n(1 − ρ), where ρ = λII/µII = 5/6
(b) Let
W_II^A = average waiting time at station II for A-customers
W_II^B = average waiting time at station II for B-customers
W_II^{A,B} = average waiting time at station II for a randomly selected customer
Ordinary M/M/1-theory gives that

W_II^{A,B} = 1/(µII − λII) = 1/(18 − 15) = 1/3
W_II^B = 1/(µII − λB) = 1/(18 − 9) = 1/9

The probability that a random customer is a i-customer is

λi/(λA + λB), i = A, B,
and we can thus state the following equation:

W_II^{A,B} = (9/15)W_II^B + (6/15)W_II^A =⇒ W_II^A = 2/3.
The average time a randomly selected A-customer spends in the system
is thus
W_tot^A = WI + W_II^A = 1/2 + 2/3 = 7/6 hours = 70 minutes
2 3 6
2.4 (a) The following figure illustrates the system

(Figure: station I has external arrivals at rate 10 and service rate 22; station II has external arrivals at rate 12 and service rate 22; from I, 1/3 go to II and 2/3 leave for station III; from II, 1/2 go to I and 1/2 leave for station III.)

Equilibrium at station I and II =⇒


10 + (1/2)λ2 = λ1
12 + (1/3)λ1 = λ2
=⇒ λ1 = 96/5 = 19.2, λ2 = 92/5 = 18.4
We can now compute

V1 = 1/(µ1 − λ1) = 1/(22 − 19.2) = 0.36, V2 = 0.28
The expected remaining time in the system after having arrived to station
I, W1 , is now given by

W1 = 0.36 + (1/3)W2
W2 = 0.28 + (1/2)W1
=⇒ W1 = 0.54
(b) At station III the situation is the following:

The two streams λI = (2/3)λ1 = 12.8 and λII = (1/2)λ2 = 9.2 enter station III, which has service rate 24.



Let
WI = average waiting time for customers from station I
WII = average waiting time for customers from station II
WI,II = average waiting time for a randomly selected customer, i.e. either from station I or station II.
Ordinary M/M/1-theory gives that

WI,II = 1/(µIII − (λI + λII)) = 1/(24 − (12.8 + 9.2)) = 1/2
WI = 1/(µIII − λI) = 1/(24 − 12.8) = 0.089
The probability that a randomly selected customer comes from station i
is

λi/(λI + λII), i = I, II,
and we can therefore state the following equation:

0.5 = (12.8/22) · 0.089 + (9.2/22) · WII.
22 22
The average waiting time for a customer that has been thrown out of
station II is thus WII ≈ 1.07.

2.5 (a) Since the machines operate and are repaired independently one can consider one machine at a time. We get the following rate diagram (0 = functioning machine, 1 = broken machine)

(Rate diagram: states 0 and 1, with rate λ from 0 to 1 and rate µ from 1 to 0.)

The equilibrium equation becomes

λp0 = µp1 .
Since λ = 1/120 and µ = 1/12, so that λ/µ = 1/10, the equilibrium equation together with the requirement p0 + p1 = 1 yields

p0 = 10/11, p1 = 1/11.
Let N denote the number of broken machines. Then N is a random
variable and N ∈ Bin(4, p1 ). Thus it holds that
     
P(N ≥ 2) = (4 choose 2) p0^2 p1^2 + (4 choose 3) p0 p1^3 + (4 choose 4) p1^4 ≈ 0.044 < 5%

Since the mean service time for a machine is 12 hours we see that both
requirements are fulfilled if there is one computer technician per depart-
ment.
(b) In this case all four machines must be considered at the same time. In
the rate diagram below the number in each node indicates the number of
broken machines.

(Rate diagram: states 0–4; birth rates 4λ, 3λ, 2λ, λ; death rates µ, 2µ, 2µ, 2µ.)

The equilibrium equations give that

p1 = (4λ/µ)p0, p2 = (3λ/2µ)p1, p3 = (λ/µ)p2, p4 = (λ/2µ)p3,
which implies that

p1 = 4ρp0, p2 = 6ρ^2 p0, p3 = 6ρ^3 p0, p4 = 3ρ^4 p0,


where ρ = λ/µ. Since Σ_i pi = 1 it follows that

p0 ≈ 0.68, p1 ≈ 0.27 p2 ≈ 0.04 p3 ≈ 0.004 p4 ≈ 0.0002.


We may now compute

L = Σ_n n pn ≈ 0.3628
λ̄ = Σ_n λn pn = 4λp0 + 3λp1 + 2λp2 + λp3 ≈ 0.0301

Little’s formula now gives that

W = L/λ̄ ≈ 12.05 < 13.
Since P (N ≥ 2) = p2 + p3 + p4 ≈ 0.045 < 5% it follows that both
requirements are fulfilled also with the new system.
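The stationary distribution, L, λ̄ and W above can be checked numerically. The sketch below (added here, not in the original solution; small rounding differences against the hand calculation are expected) sets up the birth-death weights directly:

```python
# Birth-death process for four machines sharing two technicians:
# birth rates 4*lam, 3*lam, 2*lam, lam and death rates mu, 2*mu, 2*mu, 2*mu.
lam, mu = 1 / 120, 1 / 12

births = [4 * lam, 3 * lam, 2 * lam, lam]
deaths = [mu, 2 * mu, 2 * mu, 2 * mu]

w = [1.0]
for b, d in zip(births, deaths):
    w.append(w[-1] * b / d)
total = sum(w)
p = [x / total for x in w]

L = sum(n * pn for n, pn in enumerate(p))                # expected number broken
lam_bar = sum(pn * b for pn, b in zip(p, births + [0]))  # effective arrival rate
W = L / lam_bar                                          # Little's formula
print(round(W, 1), round(p[2] + p[3] + p[4], 3))
```

Both requirements (W < 13 hours and P(N ≥ 2) < 5%) come out fulfilled.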
(c) We seek E[wq |wq > 0].

E[wq ] = E[wq |wq = 0] · P (wq = 0) + E[wq |wq > 0] · P (wq > 0)


But E[wq] = Wq = W − 1/µ = 0.05 and P(wq > 0) = P(N ≥ 3) = p3 + p4 ≈ 0.0042. Thus it follows that

E[wq | wq > 0] = 0.05/0.0042 ≈ 11.9 > 4,
i.e. the requirement is not fulfilled.

Remark: One sees immediately that E[wq | wq > 0] > 6 hours, where 6 = E[min(x1, x2)], xi ∈ Exp(1/12).

2.6 Appropriate model: M/M/K/K queue. There are two cases: to accept or not
accept the offer.

I Assume that S accepts the offer. Then they get a contribution margin
of 150 kr from the “fourth” room. The other three rooms constitute a
M/M/3/3-system with the following rate diagram (the number in each
node indicates the number of let rooms):

(Rate diagram: states 0–3; birth rate λ between consecutive states; death rates µ, 2µ, 3µ.)

The equilibrium equations become:


λp0 = µp1
λp0 + 2µp2 = λp1 + µp1
λp1 + 3µp3 = λp2 + 2µp2
λp2 = 3µp3
Together with the requirement that Σ_{i=0}^3 pi = 1 and the fact that λ = 3µ this implies that
p0 = 2/26, p1 = 6/26, p2 = p3 = 9/26.
On average the number of let rooms is p1 + 2p2 + 3p3 = 51/26 ≈ 1.96.
26
The average contribution margin for the four rooms is:

150 + 300 · (51/26) = 19200/26 ≈ 738 kr/day.
26 26
II Assume that S rejects the offer. In this case the four rooms constitute a
M/M/4/4-system with the following rate diagram:

(Rate diagram: states 0–4; birth rate λ between consecutive states; death rates µ, 2µ, 3µ, 4µ.)

As above we get that



p0 = 8/131, p1 = 24/131, p2 = p3 = 36/131, p4 = 27/131.
The number of let rooms is (on average) p1 + 2p2 + 3p3 + 4p4 = 312/131 ≈ 2.38. The average contribution margin for the four rooms is:

300 · (312/131) = 93600/131 ≈ 715 kr/day.
131 131
Mr and Mrs S will thus get a higher expected revenue if they accept the offer.
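The comparison of the two alternatives can be reproduced with a short Erlang-loss computation (a sketch added here, not part of the original solution):

```python
# Stationary distribution of an M/M/K/K (Erlang loss) system with rho = lam/mu.
def loss_system(rho, K):
    w = [1.0]
    for n in range(1, K + 1):
        w.append(w[-1] * rho / n)
    total = sum(w)
    return [x / total for x in w]

rho = 3.0  # lam = 3*mu, so mu cancels
# Alternative I: rent out one room for 150/day, run the rest as M/M/3/3.
p3 = loss_system(rho, 3)
margin_accept = 150 + 300 * sum(n * pn for n, pn in enumerate(p3))
# Alternative II: all four rooms as M/M/4/4.
p4 = loss_system(rho, 4)
margin_reject = 300 * sum(n * pn for n, pn in enumerate(p4))
print(round(margin_accept), round(margin_reject))  # 738 715
```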

2.7 (a) The following figure illustrates the system

(Figure: soldiers arrive at rate 24 to the forms desk (service rate 105); a fraction 0.2 loops back, while 0.8λ continue to the supply shed (service rate 40) and then to the kiosk (service rate 44), where also lieutenants arrive at rate 8.)

Equilibrium at the forms =⇒

24 + 0.2λ = λ =⇒ λ = 30.

With Wr = total waiting time in the requisition loop one gets
Wr = 1/(105 − 30) + 0.2Wr =⇒ Wr = 1/60 hours = 1 minute.
WS = the waiting time in the supply shed is given by
WS = 1/(40 − 24) = 1/16 hour = 3.75 minutes.
With the indices s=soldier and l=lieutenant one gets, at the kiosk

Ws = 1/(44 − 24) = 1/20, Ws+l = 1/(44 − (24 + 8)) = 1/12
Ws+l = Ws · ps + Wl · pl =⇒ 1/12 = (1/20) · (24/32) + Wl · (8/32) =⇒ Wl = 11/60

The total waiting time for a soldier is = 1 + 3.75 + 3 = 7.75 minutes.


The total waiting time for a lieutenant is = 11 minutes.
(b) The diagram becomes:

(Rate diagram: states 0, 1.1, 1.2, 2.1 and 2.2, with transition rates 4 and 12 as in the equilibrium equations below.)

The equilibrium equations are:


4p0 = 12p12
16p11 = 4p0 + 12p22
16p12 = 12p11
4p11 = 12p21
12p22 = 12p21 + 4p12

From these one gets that
p11 = (4/9)p0, p12 = (1/3)p0, p21 = (4/27)p0, p22 = (7/27)p0.
This together with the fact that Σ pi = 1 gives that p0 = 27/59. Thus:
p0 = 27/59, p11 = 12/59, p12 = 9/59, p21 = 4/59, p22 = 7/59.

2.8 (a) It is given that λ = 10, λA = 5, λB = 3, λC = 2 and µ = 20. Thus we get

WA = 1/(20 − 5) = 1/15, WA+B = 1/(20 − 8) = 1/12
WA+B = (5/8)WA + (3/8)WB =⇒ WB = 1/9
WA+B+C = 1/(20 − 10) = 1/10
WA+B+C = (8/10)WA+B + (2/10)WC =⇒ WC = 1/6

(b) Rate diagram for station II:

(Rate diagram: states 0–4; birth rates 8, 8, 8, 8; death rates 4, 8, 8, 8.)

Equilibrium =⇒

p1 = 2p0 , p4 = p3 = p2 = p1 .
7. Solutions to the exercises 55

That Σ_{i=0}^4 pi = 1 implies that p0 = 1/9. The equilibrium distribution is thus
p0 = 1/9, p1 = p2 = p3 = p4 = 2/9.
(c) We get

L = Σ_n n · pn = 20/9,
λ̄ = Σ_n pn λn = 8/9 + 16/9 + 16/9 + 16/9 = 56/9 (λ4 = 0!).
Little's formula, W · λ̄ = L, gives
W = (20/9)/(56/9) = 5/14.

2.9 The queueing system is illustrated below.


(Figure: A-customers arrive at rate λA to station I (service rate 25) and continue to station II (service rate 24); after II, half loop back to I and half go to station III (service rate 22), where also B-customers arrive directly at rate λB.)

Equilibrium at I: λA + λ/2 = λ ⇒ λ = 20.


Let VI = average waiting time the first time you are being served at station
I. We get VI = 1/(25 − 20) = 1/5. In a corresponding way we get that
VII = 1/4. Let WI,II be the average waiting time including loops at I,II. We
get that WI,II = VI + VII + WI,II /2 ⇒ WI,II = 0.9.
At III we get that W_III^B = 1/12. The arrival intensity at III is 10 + 10 = 20.
W_III^{A,B} = 1/(22 − 20) = 1/2.
W_III^{A,B} = 1/2 · W_III^A + 1/2 · W_III^B =⇒ W_III^A = 11/12.
The total time for A-customers is thus 9/10 + 11/12 hours = 109 minutes.

2.10 We get the following rate diagram:

λ = 45, λ0 = 40, µ = 50.


p1 = (2λ0/µ)p0, pn+1 = (λ/µ)pn, n ≥ 1

(Rate diagram: states 0, 1, 2, . . . ; birth rate λ0 from state 0 and λ from the other states; death rate µ/2 from state 1 and µ from the other states.)

p_{1+n} = (2λ0/µ) ρ^n p0, n ≥ 0, where ρ = λ/µ
Σ pn = 1 ⇒ p0 [1 + (2λ0/µ) Σ_{n≥0} ρ^n] = 1 ⇒ p0 = [1 + 2λ0/(µ − λ)]^{−1}

(a) p0 = 1/17 ≈ 0.06, p1+n = 0.06 · 1.6 · (0.9)n ≈ 0.1 · (0.9)n .


(b) L = Σ_{n=1}^∞ n pn = Σ_{n=0}^∞ (n + 1) p_{1+n} = p0 (2λ0/µ) Σ_{n=0}^∞ (n + 1)ρ^n = p0 (2λ0/µ) (d/dρ) Σ_{n=0}^∞ ρ^{n+1} = p0 (2λ0/µ) (d/dρ) (ρ/(1 − ρ)) = p0 (2λ0/µ) · 1/(1 − ρ)^2
L = 9.41.
P P∞
(c) Little: W = L/λ̄, where λ̄ = Σ_{n=0}^∞ λn pn = λ0 p0 + λ Σ_{n=1}^∞ pn = λ0 p0 + λ(1 − p0) ⇒ λ̄ = 44.7
W ≈ 0.21 hours ≈ 12 minutes 38 seconds.
(d) P(queue) = Σ_{n=3}^∞ pn = 1 − p0 − p1 − p2 = 1 − 0.24 = 0.76.
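The closed-form expressions above can be verified with a few lines of Python (a sketch added here, not part of the original solution):

```python
# Birth-death check of 2.10: birth rate lam0 = 40 from state 0 and lam = 45
# otherwise; death rate mu/2 = 25 from state 1 and mu = 50 otherwise.
lam0, lam, mu = 40.0, 45.0, 50.0
rho = lam / mu

# p_{1+n} = (2*lam0/mu) * rho**n * p0, so p0 = 1 / (1 + 2*lam0/(mu - lam)).
p0 = 1 / (1 + 2 * lam0 / (mu - lam))

# L = p0 * (2*lam0/mu) / (1 - rho)**2 and lam_bar = lam0*p0 + lam*(1 - p0).
L = p0 * (2 * lam0 / mu) / (1 - rho) ** 2
lam_bar = lam0 * p0 + lam * (1 - p0)
W = L / lam_bar
print(round(p0, 4), round(L, 2), round(W * 60, 1))  # 0.0588 9.41 12.6
```

Note that W = 4/19 hours exactly, i.e. about 12 minutes 38 seconds.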

2.11 (a) We get the following equilibrium equations:


λP0 = µP1 ,
(λ + nµ)Pn = λPn−1 + (n + 1)µPn+1 , n = 1, . . . , K − 1,
λPK−1 = KµPK .

With ρ = λ/µ we get

Pn = (ρ^n/n!) P0, n = 0, . . . , K.
Summing up gives
1 = Σ_{n=0}^K Pn = P0 Σ_{n=0}^K ρ^n/n!,
which implies that
Pn = (ρ^n/n!) / (Σ_{m=0}^K ρ^m/m!), n = 0, . . . , K.

(b) When K → ∞ it holds that

Pn → (ρ^n/n!) e^{−ρ}, n = 0, 1, . . . ,
which is the distribution for a Poisson distributed random variable.

(Rate diagram for 2.12: two-dimensional states i.j = number of customers at the two stations; an arriving customer joins the shorter queue, with ties split λ/2–λ/2, and each station serves at rate µ.)

2.12 (a) We get the rate diagram above.
(b) The rate diagram is truncated in the sense that only the states 0.0, 0.1, 1.0 and 1.1 remain.
With λ = µ = 1 we get the following equilibrium equations:
P0.0 = P1.0 + P0.1,
2P0.1 = (1/2)P0.0 + P1.1,
2P1.0 = (1/2)P0.0 + P1.1,
2P1.1 = P0.1 + P1.0.
Since the sum of the probabilities is one it follows that
P0.0 = 2/5, P0.1 = 1/5, P1.0 = 1/5, P1.1 = 1/5.
That P1.1 = 1/5 means that 1/5 of the customers that arrive at the system
are rejected. Thus 4/5 of the arriving customers pass the system, which
implies that the intensity out of the system is 4/5. Due to symmetry the
intensity is equally large at both stations, i.e. the intensity out of each
station is 2/5.
2.13 (a) We have
a1 = 2, µ1 = 15, p11 = 0, p12 = 0.9,
a2 = 3, µ2 = 120, p21 = 0.2, p22 = 0.7,
where i = 1 means the swing (G), i = 2 means the slider (R) and the
time is measured in hours.
The network is shown in the following figure:

(b) With the given values it holds


λ1 = 2 + 0.2λ2 ,
λ2 = 3 + 0.9λ1 + 0.7λ2 ,

(Figure: swing G with external arrival rate 2 and service rate 15; slider R with external arrival rate 3 and service rate 120; routing probabilities 0.9 from G to R, 0.2 from R to G, 0.7 from R back to R, and 0.1 out of the system from each station.)

that is λ1 = 10, λ2 = 40. The conditions for “light traffic”, λ1 < µ1 and
λ2 < µ2 , are thus fulfilled. We have two independent M/M/1-systems
and if we let Wq,i denote the expected queueing times we get
Wq,i = λi/(µi(µi − λi)), i = 1, 2.
With the given values it follows that
Wq,1 = 2/15 and Wq,2 = 1/240,
that is, the expected queueing time to the swing is 8 minutes and the
expected waiting time to the slider is 15 seconds.
(c) As above we can compute the time at the swing and the slider to be
Wi = 1/(µi − λi ), i.e.
W1 = 1/5 and W2 = 1/80.
If we let WG denote the expected time until a child leaves the play ground
assuming that the child arrives at the swing and WR denote the expected
time until a child leaves the play ground assuming that the child arrives
at the slider we get
WG = W1 + 0.9WR ,
WR = W2 + 0.2WG + 0.7WR .
With W1 and W2 from above it follows that
WG = 19/32 and WR = 7/16.
32 16
Thus we can compute W from
W = (a1/(a1 + a2)) WG + (a2/(a1 + a2)) WR = 1/2.
The expected waiting time for a parent is thus 30 minutes.
(Alternatively we can, as in the book, compute L = L1 + L2 where
L1 and L2 are the expected number of children in the swing and slider
respectively. We then get
L1 = 2 and L2 = 1/2.

Next, W can be computed by using Little’s formula as


W = L/(a1 + a2).
This gives the same answer.)
(d) From (c) we get
WG = 19/32.
The expected waiting time for a parent whose child first goes to the swing is thus 35.625 minutes. (This is not immediate from Little's formula.)
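The traffic equations and waiting-time systems of 2.13 can be solved numerically as a cross-check (a sketch added here, not part of the original solution):

```python
# Playground network: swing (1): a1 = 2, mu1 = 15; slider (2): a2 = 3, mu2 = 120;
# routing p12 = 0.9, p21 = 0.2, p22 = 0.7.
a1, a2, mu1, mu2 = 2.0, 3.0, 15.0, 120.0

# lam1 = a1 + 0.2*lam2 and lam2 = a2 + 0.9*lam1 + 0.7*lam2, solved by substitution:
lam2 = (a2 + 0.9 * a1) / (0.3 - 0.9 * 0.2)
lam1 = a1 + 0.2 * lam2
W1, W2 = 1 / (mu1 - lam1), 1 / (mu2 - lam2)

# WG = W1 + 0.9*WR and WR = W2 + 0.2*WG + 0.7*WR:
WG = (W1 + 3 * W2) / (1 - 0.6)
WR = (W2 + 0.2 * WG) / 0.3
W = (a1 * WG + a2 * WR) / (a1 + a2)
print(round(lam1), round(lam2), round(WG, 5), round(WR, 5), round(W, 5))
```

This reproduces λ1 = 10, λ2 = 40, WG = 19/32, WR = 7/16 and W = 1/2.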

2.14 I. We get the following rate diagram, where λ = 2 and µ = 3.

(Rate diagram: states 0–4; birth rates 4λ, 3λ, 2λ, λ; death rates µ, 2µ, 2µ, 2µ.)

This gives
P1 = (4λ/µ)P0, P2 = (3λ/2µ)P1, P3 = (λ/µ)P2, P4 = (λ/2µ)P3.
Since the sum of these probabilities is one we get
P0 = 27/235, P1 = 72/235, P2 = 72/235, P3 = 48/235, P4 = 16/235.
This gives
E(N) = Σ_{n=0}^4 n Pn = 424/235.

The expected number of operational machines is now given by 4 − E(N ),


i.e. 516/235 ≈ 2.2.
II. We now have two separate systems with the following rate diagram, where
λ = 2 and µ = 10/3.

(Rate diagram: states 0–2; birth rates 2λ, λ; death rates µ, µ.)

This gives
P1 = (2λ/µ)P0, P2 = (λ/µ)P1.
Since the sum of these probabilities is one we get
P0 = 25/73, P1 = 30/73, P2 = 18/73,

which gives
E(N) = Σ_{n=0}^2 n Pn = 66/73.

The expected number of operational machines is in this case given by


2(2 − E(N )), i.e. 160/73 ≈ 2.2.
III. We now get following rate diagram, where λ = 2 and µ = 5.

(Rate diagram: states 0–4; birth rates 4λ, 3λ, 2λ, λ; death rates µ, µ, µ, µ.)

This gives
P1 = (4λ/µ)P0, P2 = (3λ/µ)P1, P3 = (2λ/µ)P2, P4 = (λ/µ)P3.
Since the sum of these probabilities is one we get
P0 = 625/4169, P1 = 1000/4169, P2 = 1200/4169, P3 = 960/4169, P4 = 384/4169,
which means that
E(N) = Σ_{n=0}^4 n Pn = 1123/599.

The expected number of operational machines is now given by 4 − E(N ),


i.e. 1273/599 ≈ 2.1.

3.1 The situation is repeated every month. It is given that the ingoing inventory
level was N . If we order up to the level x (x ≥ N ) the expected profit in kkr
will be:

P(x) = −300(x − N) + ∫_0^x [400z − 50(x − z)] f(z) dz + ∫_x^∞ [400x − 100(z − x)] f(z) dz

We get

P′(x) = −300 + [400x − 50 · 0] f(x) − 50 ∫_0^x f(z) dz − [400x − 100 · 0] f(x) + 500 ∫_x^∞ f(z) dz = −300 − 50F(x) + 500[1 − F(x)],

where F is the distribution function. Since F is increasing, P′ will be decreasing, i.e. P is a concave function. The maximum is achieved when P′(x) = 0, i.e. F(x̂) = 200/550 < 0.5. Since F(15) = 0.5 it is obvious that x̂ < 15. For
x < 15 we get
F(x) = (1/25) ∫_{10}^x (z − 10) dz = (x − 10)^2/50.

This gives that (x̂ − 10)^2 = 50 · 200/550, and thus x̂ = 14.26. Rounding gives
x̂ = 14. He should thus order 14 − N cars, if N ≤ 14. Otherwise he should,
due to the concavity, not do anything. In steady-state N ≤ 4, when x̂ = 14
and the demand is at least 10 cars.

3.2 (a) Deterministic varying demand and zero ingoing inventory.


Wagner-Whitin
We have K = 1000 kr, h = 1 kr/hg and night, c = 2 kr/hg.
r1 = 500 hg, r2 = 300 hg, r3 = 200 hg, r4 = 500 hg, r5 = 700 hg.

c5 = 1000 + 2 · 700 = 2400
c44 = 1000 + 2 · 500 + 2400 = 4400
c54 = 1000 + 2 · 1200 + 1 · 700 = 4100
c4 = 4100, ĵ4 = 5
c33 = 1000 + 2 · 200 + 4100 = 5500
c43 = 1000 + 2 · 700 + 1 · 500 + 2400 = 5300
c53 = 1000 + 2 · 1400 + 1 · 500 + 2 · 700 = 5700
c3 = 5300, ĵ3 = 4
c22 = 1000 + 2 · 300 + 5300 = 6900
c32 = 1000 + 2 · 500 + 1 · 200 + 4100 = 6300
c42 = 1000 + 2 · 1000 + 1 · 200 + 2 · 500 + 2400 = 6600
c52 = 1000 + 2 · 1700 + 1 · 200 + 2 · 500 + 3 · 700 = 7700
c2 = 6300, ĵ2 = 3
c11 = 1000 + 2 · 500 + 6300 = 8300
c21 = 1000 + 2 · 800 + 1 · 300 + 5300 = 8200
c31 = 1000 + 2 · 1000 + 1 · 300 + 2 · 200 + 4100 = 7800
c41 = 1000 + 2 · 1500 + 1 · 300 + 2 · 200 + 3 · 500 + 2400 = 8600
c51 = 1000 + 2 · 2200 + 1 · 300 + 2 · 200 + 3 · 500 + 4 · 700 = 10400
c1 = 7800, ĵ1 = 3
Optimal policy is to order 1000 hg on Monday and 1200 hg on Thursday.
The cost will then be 7800 kr.

Alternative solution: The bacon will be purchased at some time, sooner


or later, so the cost for buying it may be neglected in the optimization.

c̃5 = 1000
c̃44 = 1000 + 1000 = 2000
c̃54 = 1000 + 1 · 700 = 1700
c̃4 = 1700, ĵ4 = 5
c̃33 = 1000 + 1700 = 2700
c̃43 = 1000 + 1 · 500 + 1000 = 2500
c̃53 = 1000 + 1 · 500 + 2 · 700 = 2900
c̃3 = 2500, ĵ3 = 4
c̃22 = 1000 + 2500 = 3500
c̃32 = 1000 + 1 · 200 + 1700 = 2900
c̃42 = 1000 + 1 · 200 + 2 · 500 + 1000 = 3200
c̃52 = 1000 + 1 · 200 + 2 · 500 + 3 · 700 = 4300
c̃2 = 2900, ĵ2 = 3
c̃11 = 1000 + 2900 = 3900
c̃21 = 1000 + 1 · 300 + 2500 = 3800
c̃31 = 1000 + 1 · 300 + 2 · 200 + 1700 = 3400
c̃41 = 1000 + 1 · 300 + 2 · 200 + 3 · 500 + 1000 = 4200
c̃51 = 1000 + 1 · 300 + 2 · 200 + 3 · 500 + 4 · 700 = 6000
c̃1 = 3400, ĵ1 = 3

The optimal policy is the same, to order 1000 hg on Monday and 1200
hg on Thursday. The cost will then be 3400 kr + the cost of the bacon
2 · (500 + 300 + 200 + 500 + 700) = 4400 kr for a total of 7800 kr.
(b) c51 − c1 = 10400 − 7800 = 2600, he earns 2600 kr.
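The backward recursion of 3.2 can be implemented in a few lines. The sketch below (added here, not part of the original solution) uses the holding-cost-only formulation and adds the purchase cost afterwards:

```python
# Wagner-Whitin by backward recursion: c[t] is the minimal setup + holding cost
# for periods t..T-1; best[t] is the next ordering period after t.
def wagner_whitin(r, K, h):
    T = len(r)
    c = [0.0] * (T + 1)
    best = [None] * T
    for t in range(T - 1, -1, -1):
        c[t] = float("inf")
        for j in range(t + 1, T + 1):
            # Order at t to cover periods t..j-1; unit demanded in period i is
            # held i - t nights.
            hold = sum(h * (i - t) * r[i] for i in range(t, j))
            if K + hold + c[j] < c[t]:
                c[t], best[t] = K + hold + c[j], j
    return c[0], best

r = [500, 300, 200, 500, 700]   # demand per day (hg)
cost, best = wagner_whitin(r, K=1000, h=1)
cost += 2 * sum(r)              # add the purchase cost 2 kr/hg
print(cost, best[0])  # 7800.0 3
```

Here best[0] = 3 means the first order (Monday) covers periods 0–2 and the next order is placed in period 3 (Thursday), matching the solution above.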

3.3 (a) The purchase cost is= cy for y ≥ 0


Expected shortage cost, B(y), is given by

B(y) = 0 when y ≥ a, and B(y) = (b/a) ∫_y^a (z − y)^2 dz = (1/3)(b/a)(a − y)^3 when 0 ≤ y < a.
The expected total cost for a week is given by (when y ≥ 0)
TC(y) = cy when y ≥ a, and TC(y) = cy + (1/3)(b/a)(a − y)^3 when 0 ≤ y < a.
(b) Taking the derivative gives

TC′(y) = c when y ≥ a, and TC′(y) = c − (b/a)(a − y)^2 when 0 ≤ y < a.
TC″(y) = 0 when y ≥ a, and TC″(y) = (2b/a)(a − y) when 0 ≤ y < a.
Since T C ′′ (y) ≥ 0 for y ≥ 0 the function T C is convex for y ≥ 0.
(c) Since T C ′ (a) = c > 0, T C ′ (0) = c − ab < 0 according to the assumption
and T C is convex, the function TC achieves its minimum in the open
interval (0, a).
(d) For the Family Meathead it holds that a = 100 hg, b = 0.5 kr/hg and c = 8 kr/hg.
The y that minimizes T C is given by
TC′(y) = c − (b/a)(a − y)^2 = 0 =⇒ y = a − √(ac/b)
Since ab − c = 100 · 0.5 − 8 = 42 > 0 the optimal ŷ is given by
ŷ = 100 − √(100 · 8/0.5) = 60.
The Family Meathead should thus buy 60 hg each week.
3.4 (a) The following holds
Z ≤ 100 + x: Income 100Z
cost 40x
net 100Z − 40x
Z > 100 + x: Income 100Z
cost 40x + 80[Z − 100 − x]
net 100Z − 40x − 80[Z − 100 − x]
The expected net profit is thus given by the expression
E[net profit] = 100E[Z] − 40x − 80 ∫_{100+x}^∞ [z − 100 − x] f(z) dz

(b) Frasse may just as well minimize


c(x) = 40x + 80 ∫_{100+x}^∞ [z − 100 − x] f(z) dz

Taking the derivative =⇒


c′(x) = 40 + 80 · 0 − 80 ∫_{100+x}^∞ f(z) dz = 40 − 80[1 − F(100 + x)]
We see that c′ is increasing, i.e. c is a convex function.
There are two cases
i. x̂ = 0
ii. c′ (x̂) = 0
We get c′ (0) = 80 · F (100) − 40, and thus:
i. If F (100) > 1/2 then x̂ = 0.
ii. If F (100) < 1/2 then x̂ is given by
F(100 + x̂) = 1/2.
(c) If Z is exponentially distributed with expected value 1000 litres it has
the density function

f(z) = (1/10^3) e^{−z/10^3} when z ≥ 0.
c′(x) = 0 =⇒ e^{−(100+x)/10^3} = 1/2
It is thus optimal to purchase (in litres) x̂ = 10^3 · ln 2 − 100 ≈ 593.
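The fractile condition in (c) is easy to verify numerically (a sketch added here, not part of the original solution):

```python
from math import log, exp

# Solve c'(x) = 0 for the exponential demand in (c):
# F(100 + x) = 1/2 with F(z) = 1 - exp(-z/1000) gives x = 1000*ln 2 - 100.
x_hat = 1000 * log(2) - 100
print(round(x_hat))  # 593

# Sanity check: the marginal condition 40 - 80*(1 - F(100 + x_hat)) = 0.
F = lambda z: 1 - exp(-z / 1000)
print(abs(40 - 80 * (1 - F(100 + x_hat))) < 1e-9)  # True
```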

3.5 (a) The figure below illustrates the problem

(Figure: the inventory level starts at y and decreases linearly as L(t) = y − Zt/T; if Z ≤ y the level stays non-negative over the whole period, while if Z > y it reaches zero at time t = yT/Z and a shortage builds up thereafter.)

L(t) = inventory level at time t.



The cost for the period is
C(y, Z) = c(y − x) + I[Z ≤ y] ∫_0^T h(y − tZ/T) dt + I[Z > y] ∫_0^{yT/Z} h(y − tZ/T) dt + I[Z > y] ∫_{yT/Z}^T p(tZ/T − y) dt
= c(y − x) + I[Z ≤ y] hT(y − Z/2) + I[Z > y] hT y^2/(2Z) + I[Z > y] pT(Z − y)^2/(2Z).
Take the expected value =⇒

C(y) = c(y − x) + hT ∫_0^y (y − z/2) f(z) dz + hT ∫_y^∞ (y^2/2z) f(z) dz + pT ∫_y^∞ ((z − y)^2/2z) f(z) dz

Taking the derivative =⇒

C′(y) = c + hT [∫_0^y f(z) dz + ∫_y^∞ (y/z) f(z) dz] − pT ∫_y^∞ ((z − y)/z) f(z) dz
Use that ∫_0^∞ f(z) dz = 1 =⇒
C′(y) = c − pT + T(p + h) [∫_0^y f(z) dz + ∫_y^∞ (y/z) f(z) dz]
Simple calculations give that C ′′ > 0, i.e. C is a convex function. Its
minimum y0 is thus given by the equation
∫_0^y f(z) dz + y ∫_y^∞ (1/z) f(z) dz = (p − c/T)/(p + h)
The optimal ŷ is given by

ŷ = y0 if y0 ≥ x, and ŷ = x if y0 < x.
(b) In this case T = 1, c = 0.14, h = 0.5, p = 0.5, x = 3 and

f(x) = kx for 0 ≤ x ≤ 10, and f(x) = 0 otherwise.
Use that ∫_0^{10} f(z) dz = 1 in order to determine the constant k. The constant is k = 0.02.

C′(y) = 0 =⇒ y^2 − 20y + 36 = 0
y = 10 ± √(100 − 36) = 10 ± 8
y0 = 2 (y0 = 18 is not possible since 0 ≤ y0 ≤ 10)
With x = 3 it holds that y0 < x, thus it is optimal to choose ŷ = 3.

3.6 (a) See the book.


(b) See the book.
(c) Use the same technique as in (b). Assume that x kkr is transfered from
the check account to the savings account (0 ≤ x ≤ 10). Then the balance
on the check account for a month looks as follows

(Figure: the check-account balance S starts the month at 10 − x kkr, decreases linearly, reaches zero after the fraction 1 − x/10 of the month, and ends the month at −x kkr.)

Average asset:
E[S+] = ((10 − x)/2) · (1 − x/10) = 5(1 − x/10)^2.
Average shortage:
E[S−] = (x/2) · (x/10) = 5(x/10)^2.
Docent Optimus' yearly interest is
I(x) = 0.1x + 0.02 · 5(1 − x/10)^2 − 0.17 · 5(x/10)^2 = {0.1x := y} = y + 0.1(1 − y)^2 − 0.85y^2.

I is a concave function in y and thus also in x (recall that 0.1x = y).


Maximizing with respect to y gives

(d/dy) I(y) = 1 − 0.2(1 − y) − 1.7y = 0.8 − 1.5y = 0 =⇒ y = 0.8/1.5.
The optimal x is therefore x = 10y = 8/1.5 ≈ 5.3. Docent Optimus should
thus transfer 5.3 kkr to the savings account.

3.7 (a) With the order quantity Q the average inventory cycle is Q/a and the average order cost per unit time is thus aK/Q.
(b) With the ordering level s (s non-negative) the inventory level will be s−al
right before delivery on average (those cases when the inventory becomes
empty are ignored). Right after delivery the average inventory level will
then be s − al + Q. Thereafter the inventory level is decreased linearly
(on average) wherefore the average inventory level is

((s − al) + (s − al + Q))/2 = s − al + Q/2.
The holding cost per unit time is thus h · (s − al + Q/2).
(c) With the density φ for the demand during the delivery lag the average
shortage becomes
b(s) = ∫_s^∞ (z − s) φ(z) dz.
Thus, the shortage cost per unit time becomes (a/Q) · p · b(s).
(d) Using the above we get that the total cost is

TC(s, Q) = aK/Q + h · (s − al + Q/2) + (a/Q) p b(s).
The function TC is convex in s and Q separately. For a fixed s we can thus determine the minimizing Q through differentiation:
Q̂(s) = √(2a(K + p b(s))/h) (∼ Wilson)
(e) For the s-derivative we note that

b′(s) = −∫_s^∞ φ(z) dz = −P(shortage with order level s) = −Pbr(s).
We thus get the requirement that
h − (ap/Q) · Pbr(s) = 0, or Pbr(s) = hQ/(ap).
3.8 Let B(r + x) denote the expected shortage:
B(y) = ∫_y^∞ (ξ − y) e^{−ξ/50} dξ/50 = {some calculations} = 50 e^{−y/50}.
y

(a) The expected total cost becomes:



TC(x) = K + cx + 300B(r + x) if x > 0, and TC(x) = 300B(r + x) if x = 0.

(b) If we order x > 0, we should minimize T C :

0 := T C ′ (x̄) = c + 300B ′ (r + x̄) = c − 300e−(r+x̄)/50 ,


i.e.

x̄ = −50 ln(c/300) − r.
The optimal solution x̂ is then given by

x̂ = x̄ if TC(x̄) < TC(0), and x̂ = 0 otherwise.

(c) x̄ = −50 ln(25/300) − 10 = 124 − 10 = 114.


T C(x̄) = 500+114·25+300B(124) = 500+2850+300·50e−124/50 = 4600.
T C(0) = 300 · 50e−10/50 = 12281.
Thus 114 tubes should be ordered.
(d) The tubes over 20 correspond to case (c).

(Figure: the ordering cost as a function of the number of tubes, with levels 500 and 1000 and a break at 20 tubes.)

Thus, we solve for K = 500, c = 50, r = 10. If the solution x̄ ≥ 20 we choose x̂. If the solution x̄ < 20, we compare the total cost with alternative (c). According to (b), x̄ = −50 ln(50/300) − 10 = 80. We should thus choose the (c)-alternative x̂ = 114.
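The numbers in (c) can be reproduced with a short script (a sketch added here, not part of the original solution; the hand calculation above rounds intermediate values):

```python
from math import log, exp

# Data of 3.8(c): B(y) = 50*exp(-y/50), K = 500, c = 25, r = 10,
# shortage cost 300 per expected unit short.
B = lambda y: 50 * exp(-y / 50)
TC = lambda x, K=500, c=25, r=10: (K + c * x if x > 0 else 0) + 300 * B(r + x)

x_bar = -50 * log(25 / 300) - 10
print(round(x_bar), round(TC(round(x_bar))), round(TC(0)))
```

Ordering about 114 tubes is clearly cheaper than not ordering at all.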

3.9 φD(ξ) = (1/µ) e^{−ξ/µ}, FD(ξ) = 1 − e^{−ξ/µ}
(a) Expected profit:
V = q ∫_0^y ξ φD(ξ) dξ + q y ∫_y^∞ φD(ξ) dξ − c(y − x) − p ∫_y^∞ φD(ξ) dξ = qµ − (qµ + p) e^{−y/µ} − c(y − x).
dV/dy = q y φD(y) − q y φD(y) · 1 + q ∫_y^∞ φD(ξ) dξ − c + p φD(y) = q(1 − FD(y)) − c + p φD(y).
d^2V/dy^2 = −q φD(y) + p φ′D(y) ≤ 0 ⇒ V concave!
dV(y*)/dy = 0 ⇒ y* = µ ln((qµ + p)/(cµ))
Optimal ordering policy is ŷ = y* if x ≤ y*, and ŷ = x otherwise.
(b) Let s solve V(s) = −K + V(y*), s ≤ y*.
The optimal policy is given by ŷ = y* if x ≤ s, and ŷ = x otherwise.

3.10 Let i be a non-negative integer. We then get
C(i + 1) = c(i + 1) + p Σ_{d=i+1}^∞ (d − i − 1) PD(d) + h Σ_{d=0}^{i+1} (i + 1 − d) PD(d),
C(i) = ci + p Σ_{d=i}^∞ (d − i) PD(d) + h Σ_{d=0}^i (i − d) PD(d).
These relations give
C(i + 1) − C(i) = c − p Σ_{d=i+1}^∞ PD(d) + h Σ_{d=0}^i PD(d) = c − p (1 − Σ_{d=0}^i PD(d)) + h Σ_{d=0}^i PD(d) = c − p + (p + h) FD(i)

Thus we have
C(i + 1) < C(i) if FD(i) < (p − c)/(p + h),
C(i + 1) ≥ C(i) if FD(i) ≥ (p − c)/(p + h).

Since FD(i) is non-decreasing it follows that the non-negative number S that minimizes C(y) is given by the smallest non-negative integer such that
FD(S) ≥ (p − c)/(p + h).
Let from here on S be that value. Let s be the largest integer that is less than or equal to S such that
C(s) ≥ K + C(S).
As in the continuous case we now get: If x ≤ s, order up to S, otherwise do
not order.

3.11 (a) We get
C(y, D) = c(y − x) + p if D > y, and C(y, D) = c(y − x) if D ≤ y.
With F(y) = E(C(y, D)) we get
F(y) = c(y − x) + p · P(D > y) = c(y − x) + p e^{−y}.

Differentiating gives F ′ (y) = c − pe−y and F ′′ (y) = pe−y . Thus F is a


convex function and the minimizing y is given by S from
F′(S) = 0, i.e. S = ln(p/c).
We therefore get the optimal policy as follows: If x ≤ S, order up to S, otherwise do not order.

(b) Let S be as above, i.e. S = ln(p/c). We then get s as the smallest number for which
cs + p e^{−s} = cS + p e^{−S} + K = c ln(p/c) + c + K.
Since F is strictly convex this s is unique.
We thus get the optimal policy as follows: If x ≤ s, order up to S, otherwise do not order.

4.1 (a) f and g are clearly separable, so we can just check the properties of fk and gk for k = 1, 2, 3.
Note that ∆fk(x) = −3, so fk is decreasing, and ∆gk(x) = k(x + 1)^2, so gk is increasing.
Since ∆^2 fk(x) = 0 and ∆^2 gk(x) = k(x + 2)^2 − k(x + 1)^2 > 0 they are both integer-convex (for positive x).
(b) When we apply the marginal allocation algorithm we want to compare the quotients −∆fk(x)/∆gk(x) and find the largest elements when k = 1, 2, 3 and x = 1, 2, 3, · · · . Here, −∆fk = 3 is a constant so it is easier to find the smallest of the quotients 3∆gk(x)/(−∆fk(x)) = ∆gk(x).

n ∆g1 (n) ∆g2 (n) ∆g3 (n)


1 4 8 12
2 9 18 27
3 16 32 48
4 25 50 75

The smallest element is 4, so n(4) = (2, 1, 1) is the optimal allocation when the sum of x is 4, and f(n(4)) = 48, g(n(4)) = 10.
The smallest element is 8, so n(5) = (2, 2, 1) is the optimal allocation when the sum of x is 5, and f(n(5)) = 45, g(n(5)) = 18.
The smallest element is 9, so n(6) = (3, 2, 1) is the optimal allocation when the sum of x is 6, and f(n(6)) = 42, g(n(6)) = 27.
The smallest element is 12, so n(7) = (3, 2, 2) is the optimal allocation when the sum of x is 7, and f(n(7)) = 39, g(n(7)) = 39.
The smallest element is 16, so n(8) = (4, 2, 2) is the optimal allocation when the sum of x is 8, and f(n(8)) = 36, g(n(8)) = 55.
(c) The optimal solution is x̂ = (3, 2, 1) which corresponds to the efficient point with g(x̂) = 27 and f(x̂) = 42.

4.2 We will apply the marginal allocation algorithm. First we identify the functions f and g:
f(s) = Σ_{j=1}^4 cj/(sj + 1) = Σ_{j=1}^4 fj(sj), g(s) = Σ_{j=1}^4 sj = Σ_{j=1}^4 gj(sj),
where fj(sj) = cj/(sj + 1) and gj(sj) = sj. Clearly, f is a decreasing separable function. Since cj/(1 + x) is a convex function for x > 0, f is integer-convex. Further, g is obviously an increasing integer-convex separable function. If the

functions fj (sj ) are evaluated for some reasonable values, the following table
is obtained:
k    f1(k)          f2(k)          f3(k)          f4(k)
0    18/1 = 18      30/1 = 30      48/1 = 48      66/1 = 66
1    18/2 = 9       30/2 = 15      48/2 = 24      66/2 = 33
2    18/3 = 6       30/3 = 10      48/3 = 16      66/3 = 22
3    18/4 = 4.5     30/4 = 7.5     48/4 = 12      66/4 = 16.5

Then it is easy to determine the marginal quotients −∆fj (k)/∆gj (k) = −∆fj (k):

k −∆f1(k) −∆f2(k) −∆f3(k) −∆f4(k)


0 9 15 24 33
1 3 5 8 11
2 1.5 2.5 4 5.5

We can order the elements in this table; the entry gives the order in which the marginal allocation algorithm selects them:
k    −∆f1(k)    −∆f2(k)    −∆f3(k)    −∆f4(k)
0    5          3          2          1
1                          6          4
2                                     7
The marginal allocation algorithm starts with s(0) = (s1(0), s2(0), s3(0), s4(0)) = (0, 0, 0, 0), and the generated efficient points are

s(1) = (0, 0, 0, 1),


s(2) = (0, 0, 1, 1),
s(3) = (0, 1, 1, 1),
s(4) = (0, 1, 1, 2),
s(5) = (1, 1, 1, 2),
s(6) = (1, 1, 2, 2),
s(7) = (1, 1, 2, 3).
Since g(s(7)) = 7, it is well known from the theory of marginal allocation that the point s(7) is an optimal solution to the problem: minimize f(s) subject to g(s) ≤ 7. The 7 additional consultants should thus be allocated as 1, 1, 2, 3 to the respective jobs, which means that the 11 consultants should be allocated as 2, 2, 3, 4.
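The greedy selection above is mechanical, so it can be checked with a short script (a sketch added here, not part of the original solution):

```python
# Marginal allocation for 4.2: f_j(s) = c_j/(s+1) with all delta g_j = 1; at
# each step one consultant goes to the job with the largest marginal decrease.
c = [18, 30, 48, 66]
s = [0, 0, 0, 0]

def marginal(j):
    # -delta f_j(s_j) = c_j/(s_j+1) - c_j/(s_j+2)
    return c[j] / (s[j] + 1) - c[j] / (s[j] + 2)

for _ in range(7):                      # allocate 7 additional consultants
    j = max(range(4), key=marginal)
    s[j] += 1

print(s)  # [1, 1, 2, 3]
```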

4.3 (a) We were given

n p1 (n) p2 (n) p3 (n)


1 20 30 40
2 15 20 25
TABLE 1 .
3 13 16 20
4 11 12 17
5 10 10 15

Then taking differences

n ∆p1 (n) ∆p2 (n) ∆p3 (n)


1 −5 −10 −15
TABLE 2 2 −2 −4 −5
3 −2 −4 −3
4 −1 −2 −2

and again

n ∆2 p1 (n) ∆2 p2 (n) ∆2 p3 (n)


1 3 6 10
TABLE 3
2 0 0 2
3 1 2 1

and since ∆^2 pi(n) ≥ 0 for all i and n the function p is integer-convex. It is separable since it can be written as the sum of functions only depending on one element in n each.
(b) From (a) we know that p is a separable integer-convex function, and from the second table we see that it is also decreasing. Let f = p and g(n) = n1 + n2 + n3, where g is now increasing and integer-convex, since ∆^2 g = 0.
We can now use the marginal allocation algorithm, making the table with
columns defined by −∆pi (n)/∆gi (n) = −∆pi (n) since ∆gi (n) = 1:

n −∆p1 (n) −∆p2 (n) −∆p3 (n)


1 5 10 15
TABLE MA 2 2 4 5
3 2 4 3
4 1 2 2

The largest element is 15, so n(4) = (1, 1, 2) is the optimal allocation for
4 researchers and p(n(4) ) = 75.
The largest element is 10, so n(5) = (1, 2, 2) is the optimal allocation for
5 researchers and p(n(5) ) = 75 − 10 = 65.
The largest element is 5, so n(6) = (2, 2, 2) or n(6) = (1, 2, 3) is the optimal
allocation for 6 researchers and p(n(8) ) = 65 − 5 = 60.
The largest element is again 5, so n(7) = (2, 2, 3) is the optimal allocation
for 7 researchers and p(n(7) ) = 60 − 5 = 55.
The largest element is 4, so n(8) = (2, 3, 3) is the optimal allocation for 8
researchers and p(n(8) ) = 55 − 4 = 51.

4.4 We will apply the marginal allocation algorithm for the functions f(x) = Σ_{j=1}^3 fj(xj) and g(x) = Σ_{j=1}^3 gj(xj), where fj(xj) = pj^{xj+1} and gj(xj) = wj(xj + 1). We have that ∆fj(xj) = fj(xj + 1) − fj(xj) = pj^{xj+1}(pj − 1) < 0 and ∆fj(xj + 1) − ∆fj(xj) = pj^{xj+1}(pj − 1)^2 > 0, so that fj is decreasing and integer-convex. Further, ∆gj(xj) = gj(xj + 1) − gj(xj) = wj > 0 and ∆gj(xj + 1) − ∆gj(xj) = 0, so that gj is increasing and integer-convex. The

given data imply that ∆g1 (k) = 3, ∆g2 (k) = 1, ∆g3 (k) = 2, for all k = 0, 1, 2, ...
Further, the data imply that

k    f1(k)     f2(k)     f3(k)
0    0.3       0.1       0.2
1    0.09      0.01      0.04
2    0.027     0.001     0.008
3    0.0081    0.0001    0.0016

which implies that

k    −∆f1(k)   −∆f2(k)   −∆f3(k)
0    0.21      0.09      0.16
1    0.063     0.009     0.032
2    0.0189    0.0009    0.0064

so that

k    −∆f1(k)/∆g1(k)   −∆f2(k)/∆g2(k)   −∆f3(k)/∆g3(k)
0    0.07             0.09             0.08
1    0.021            0.009            0.016
2    0.0063           0.0009           0.0032

We can order the elements in this last table, with the largest element first, etc.

k    −∆f1(k)/∆g1(k)   −∆f2(k)/∆g2(k)   −∆f3(k)/∆g3(k)
0    3                1                2
1    4                6                5

The marginal allocation algorithm starts with x(0) = (0, 0, 0), and the gener-
ated efficient points and their weights become

x(0) = (0, 0, 0), g(x(0)) = 6,
x(1) = (0, 1, 0), g(x(1)) = 7,
x(2) = (0, 1, 1), g(x(2)) = 9,
x(3) = (1, 1, 1), g(x(3)) = 12,
x(4) = (2, 1, 1), g(x(4)) = 15.
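The steps above can be sketched in code; this is an illustrative implementation of the marginal allocation algorithm with the data of exercise 4.4 (the arrays p and w are the given probabilities and weights, everything else is a plain translation of the algorithm):

```python
# Marginal allocation for separable f (decreasing, integer-convex) and
# g (increasing, integer-convex) with f_j(x_j) = p_j**(x_j + 1) and
# g_j(x_j) = w_j*(x_j + 1).
p = [0.3, 0.1, 0.2]   # the given probabilities
w = [3, 1, 2]         # the given weights, so Delta g_j = w_j

def ratio(j, x):
    """-Delta f_j(x) / Delta g_j(x): improvement per unit of weight in j."""
    df = p[j] ** (x + 2) - p[j] ** (x + 1)   # Delta f_j(x) < 0
    return -df / w[j]

x = [0, 0, 0]
points = [tuple(x)]
for _ in range(4):   # generate the efficient points x(1), ..., x(4)
    j = max(range(3), key=lambda i: ratio(i, x[i]))
    x[j] += 1
    points.append(tuple(x))

weights = [sum(w[j] * (pt[j] + 1) for j in range(3)) for pt in points]
print(points)    # [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 1)]
print(weights)   # [6, 7, 9, 12, 15]
```

The printed sequence reproduces the efficient points and weights listed above.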

5.1 We have V3(x3) = u3² = (2x3)² = 4x3², since u3 = 2x3 is needed for x4 = 0.
Recursion equation: Vk(xk) = min_{uk} {uk² + Vk+1(2xk − uk)} for k = 2, 1, 0.
From above V3(x3) = c3 x3² with c3 = 4.
Assume that Vk+1(xk+1) = ck+1 xk+1² for some fixed k ≥ 0. From the recursion
equation we have that Vk(xk) = min_{uk} {uk² + ck+1(2xk − uk)²}, and minimization
w.r.t. uk gives ûk = 2ck+1 xk/(1 + ck+1), so that Vk(xk) = ck xk² where
ck = 4ck+1/(1 + ck+1). The assumed structure does actually hold.
The constants ck are determined from the recursion:

c3 = 4, c2 = 16/5, c1 = 64/21, c0 = 256/85.

This gives the optimal controls:

û3 = 2x3, û2 = (8/5)x2, û1 = (32/21)x1, û0 = (128/85)x0.

The total cost is V0(85) = (256/85) · 85² = 21760.

Check: 16² + 32² + 64² + 128² = 21760.
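The recursion for ck and the resulting controls can be checked numerically; the following sketch (exact arithmetic via fractions, not part of the original solution) reproduces the figures above:

```python
# Backward recursion c_k = 4*c_{k+1}/(1 + c_{k+1}) with c_3 = 4, followed by
# a forward simulation of the dynamics x_{k+1} = 2*x_k - u_k from x_0 = 85.
from fractions import Fraction

c = {3: Fraction(4)}
for k in (2, 1, 0):
    c[k] = 4 * c[k + 1] / (1 + c[k + 1])

x = Fraction(85)
total = Fraction(0)
controls = []
for k in range(4):
    # u_k = 2*c_{k+1}*x_k/(1 + c_{k+1}); the last step needs u_3 = 2*x_3
    u = 2 * x if k == 3 else 2 * c[k + 1] * x / (1 + c[k + 1])
    controls.append(u)
    total += u * u
    x = 2 * x - u

print(controls == [128, 64, 32, 16])   # True: the optimal controls
print(total == 21760)                  # True: equals V_0(85)
```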

5.2 (a) If s1 = 1 it is not necessary to produce anything more.


If s1 = 0 one unit has to be produced the last week.
Therefore, v1 (0) = c1 and v1 (1) = 0.
The recursive equation for vk (sk ), where k ≥ 2, is determined for the two
possible cases of sk . If sk = 1, the choice is to produce either 0 or 1 unit
the next week. This leads to

vk (1) = min {vk−1 (0), c1 + q + vk−1 (1)} .

If sk = 0, the choice is to produce either 1 or 2 units the next week. This


leads to
vk (0) = min {c1 + vk−1 (0), c2 + q + vk−1 (1)} .
(b) Use the recursion under the assumption c2 + q < 2c1.

v2(1) = min{ v1(0), c1 + q + v1(1) } = min{ c1, c1 + q } = c1. (Prod. 0)

v2(0) = min{ c1 + v1(0), c2 + q + v1(1) } = min{ 2c1, c2 + q } = c2 + q. (Prod. 2)

v3(1) = min{ v2(0), c1 + q + v2(1) } = min{ c2 + q, 2c1 + q } = c2 + q. (Prod. 0)

v3(0) = min{ c1 + v2(0), c2 + q + v2(1) } = min{ c1 + c2 + q, c1 + c2 + q } = c1 + c2 + q. (Prod. 1 or 2)

v4(1) = min{ v3(0), c1 + q + v3(1) } = min{ c1 + c2 + q, c1 + c2 + 2q } = c1 + c2 + q. (Prod. 0)

v4(0) = min{ c1 + v3(0), c2 + q + v3(1) } = min{ 2c1 + c2 + q, 2(c2 + q) } = 2(c2 + q). (Prod. 2)

v5(1) = min{ v4(0), c1 + q + v4(1) } = min{ 2(c2 + q), 2c1 + c2 + 2q } = 2(c2 + q). (Prod. 0)

v5(0) = min{ c1 + v4(0), c2 + q + v4(1) } = min{ c1 + 2c2 + 2q, c1 + 2c2 + 2q } = c1 + 2(c2 + q). (Prod. 1 or 2)

v6(1) = min{ v5(0), c1 + q + v5(1) } = min{ c1 + 2c2 + 2q, c1 + 2c2 + 3q } = c1 + 2(c2 + q). (Prod. 0)

v6(0) = min{ c1 + v5(0), c2 + q + v5(1) } = min{ 2c1 + 2c2 + 2q, 3(c2 + q) } = 3(c2 + q). (Prod. 2)

(c) Use the recursion under the assumption c2 + q > 2c1.

v2(1) = min{ v1(0), c1 + q + v1(1) } = min{ c1, c1 + q } = c1. (Prod. 0)

v2(0) = min{ c1 + v1(0), c2 + q + v1(1) } = min{ 2c1, c2 + q } = 2c1. (Prod. 1)

v3(1) = min{ v2(0), c1 + q + v2(1) } = min{ 2c1, 2c1 + q } = 2c1. (Prod. 0)

v3(0) = min{ c1 + v2(0), c2 + q + v2(1) } = min{ 3c1, c1 + c2 + q } = 3c1. (Prod. 1)

v4(1) = min{ v3(0), c1 + q + v3(1) } = min{ 3c1, 3c1 + q } = 3c1. (Prod. 0)

v4(0) = min{ c1 + v3(0), c2 + q + v3(1) } = min{ 4c1, 2c1 + c2 + q } = 4c1. (Prod. 1)

v5(1) = min{ v4(0), c1 + q + v4(1) } = min{ 4c1, 4c1 + q } = 4c1. (Prod. 0)

v5(0) = min{ c1 + v4(0), c2 + q + v4(1) } = min{ 5c1, 3c1 + c2 + q } = 5c1. (Prod. 1)

v6(1) = min{ v5(0), c1 + q + v5(1) } = min{ 5c1, 5c1 + q } = 5c1. (Prod. 0)

v6(0) = min{ c1 + v5(0), c2 + q + v5(1) } = min{ 6c1, 4c1 + c2 + q } = 6c1. (Prod. 1)

Conclusion: Given that s6 = 0 it is optimal in (b) to produce 2 units, then
0, then 2, then 0, then 2, then 0, and in (c) to produce 1 unit every week.
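The two regimes can be double-checked with a short backward recursion; the numeric values of c1, c2 and q below are hypothetical, chosen only so that the respective assumption holds:

```python
# v_k(s): minimal cost with k weeks left and stock s (demand is 1 unit/week).
# c1/c2 = cost of producing 1/2 units, q = storage cost for a carried unit.
def solve(c1, c2, q, weeks=6):
    v = {0: c1, 1: 0}                          # v_1(0) = c1, v_1(1) = 0
    for k in range(2, weeks + 1):
        v = {1: min(v[0], c1 + q + v[1]),      # stock 1: produce 0 or 1
             0: min(c1 + v[0], c2 + q + v[1])} # stock 0: produce 1 or 2
    return v

v_b = solve(c1=10, c2=15, q=2)   # c2 + q < 2*c1: produce in batches of 2
v_c = solve(c1=10, c2=19, q=2)   # c2 + q > 2*c1: produce 1 unit every week
print(v_b[0])   # 51 = 3*(c2 + q)
print(v_c[0])   # 60 = 6*c1
```

Both outputs agree with the formulas v6(0) = 3(c2 + q) and v6(0) = 6c1 derived above.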

5.3 (a) Let xk = the balance of the account when k month remains of the year,
and uk = consumption during the month that starts when there are k
month left of the year.
The balance of the account evolves in time as xk−1 = ρ(xk − uk ), where
ρ = 1.01. It is given that x12 = 1 and x0 = 0.

Then V1(x1) = c√x1, since all the remaining money has to be spent the
last month.
Recursion equation:

Vk(xk) = max_{uk} { c√uk + Vk−1(ρ(xk − uk)) | 0 ≤ uk ≤ xk }.

(b) Let ϕ2(u2) = c√u2 + c√(ρ(x2 − u2)), so that

V2(x2) = max_{u2} { ϕ2(u2) | 0 ≤ u2 ≤ x2 }.

Then

ϕ2′(û2) = (c/2) ( 1/√û2 − √ρ/√(x2 − û2) ) = 0   for û2 = x2/(1 + ρ),

and since

ϕ2′′(u2) = −(c/4) ( 1/(u2√u2) + √ρ/(x2 − u2)^(3/2) ) < 0,

ϕ2 is concave and is maximized by û2.
Then

V2(x2) = c√û2 + c√(ρ(x2 − û2)) = c√((1 + ρ)x2).

Assume that Vk(xk) = c√(ak xk), where ak is a constant depending on k.
It holds for k = 1 and k = 2, with a1 = 1 and a2 = 1 + ρ.

Assume that it holds for k = 1, · · · , ℓ − 1, then

Vℓ(xℓ) = max_{uℓ} { c√uℓ + c√(aℓ−1 ρ(xℓ − uℓ)) | 0 ≤ uℓ ≤ xℓ }.

Maximizing ϕℓ(uℓ) = c√uℓ + c√(aℓ−1 ρ(xℓ − uℓ)) gives ûℓ = xℓ/(1 + ρaℓ−1), and then

Vℓ(xℓ) = c√ûℓ + c√(aℓ−1 ρ(xℓ − ûℓ)) = c√((1 + ρaℓ−1)xℓ).

So the assumption holds if we let aℓ = 1 + ρaℓ−1.
Then a1 = 1, a2 = 1 + ρ, a3 = 1 + ρ + ρ², . . . , ak = (ρ^k − 1)/(ρ − 1), and the optimal
consumption is given by

ûk = xk/ak = (ρ − 1)/(ρ^k − 1) · xk.

We know that xk−1 = ρ(xk − ûk) = · · · = ρ²(ρ^(k−1) − 1)/(ρ^k − 1) · xk.

Using x12 = 1, we get xk = ρ^(2(12−k))(ρ^k − 1)/(ρ^12 − 1), and

ûk = ρ^(2(12−k))(ρ − 1)/(ρ^12 − 1),
and the plan is to consume

û12 = (ρ − 1)/(ρ^12 − 1) ≈ 0.07885 MSEK
û11 = ρ²(ρ − 1)/(ρ^12 − 1) ≈ 0.08043 MSEK
û10 = ρ⁴(ρ − 1)/(ρ^12 − 1) ≈ 0.08205 MSEK

and so on, until

û1 = ρ²²(ρ − 1)/(ρ^12 − 1) ≈ 0.09814 MSEK
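The consumption plan follows directly from the closed form derived above; an illustrative recomputation:

```python
# u_k = rho**(2*(12-k)) * (rho - 1) / (rho**12 - 1), with rho = 1.01.
rho = 1.01
u = {k: rho ** (2 * (12 - k)) * (rho - 1) / (rho ** 12 - 1)
     for k in range(12, 0, -1)}

print(round(u[12], 5))   # 0.07885 (MSEK, first month)
print(round(u[1], 5))    # 0.09814 (MSEK, last month)

# Sanity check: the account balance follows x_{k-1} = rho*(x_k - u_k)
x = 1.0
for k in range(12, 0, -1):
    x = rho * (x - u[k])
print(abs(x) < 1e-9)     # True: the account ends empty
```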
5.6 Define xn = the number of acceptable reactor tanks still needed at the start
of production run n (n = 1, 2, 3). It is clear that xn = 0 or xn = 1.
With Sn(xn, Ln) = total expected cost for the runs n, . . . , 3 if we start in xn
and make the decision Ln (and act optimally in the future), we get

Sn(1, Ln) = [ 0 if Ln = 0, 3 if Ln > 0 ]   (setup cost)
          + 1 · Ln                         (production cost)
          + (1/2)^Ln · Ŝn+1(1)             ((1/2)^Ln = prob. of total failure)
          + ( 1 − (1/2)^Ln ) · Ŝn+1(0)     (prob. of at least one acceptable tank)

where Ŝn(xn) = min_{Ln} Sn(xn, Ln). It is clear that Ŝn(0) = 0, (n = 1, 2, 3).
This gives

Ŝn(1) = min_{Ln} { [ 0 if Ln = 0, 3 if Ln > 0 ] + Ln + (1/2)^Ln Ŝn+1(1) },

where we define Ŝ4(1) = 16 (costs are given in 100-million kronor everywhere).
We then get

Ŝ3(1) = min{ 0 + 0 + 16, 3 + 1 + 8, 3 + 2 + 4, 3 + 3 + 2, 3 + 4 + 1, 3 + 5 + 1/2, . . . }
      = min{ 16, 12, 9, 8, 8, 8.5, . . . } = 8

for L̂3 = 3 or 4.

Ŝ2(1) = min{ 0 + 0 + 8, 3 + 1 + 4, 3 + 2 + 2, 3 + 3 + 1, 3 + 4 + 1/2, . . . }
      = min{ 8, 8, 7, 7, 7.5, . . . } = 7

for L̂2 = 2 or 3.

Ŝ1(1) = min{ 0 + 0 + 7, 3 + 1 + 7/2, 3 + 2 + 7/4, 3 + 3 + 7/8, 3 + 4 + 7/16, . . . }
      = min{ 7, 7.5, 6.75, 6.875, 7.44, . . . } = 6.75

for L̂1 = 2.
That is: First make two tanks. If both fail: make two or three. If these fail:
make three or four.
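The recursion for Ŝn(1) is easy to tabulate mechanically; an illustrative sketch (L is searched up to a small bound, costs in 100-million kronor):

```python
# S_n(1) = min_L { setup(L) + L + (1/2)**L * S_{n+1}(1) }, with S_4(1) = 16
# and setup(L) = 0 if L = 0 else 3.
def solve(runs=3, S_end=16.0, L_max=10):
    S = S_end
    best = {}
    for n in range(runs, 0, -1):          # n = 3, 2, 1
        cands = {0: S}                    # L = 0: no setup, no production
        for L in range(1, L_max + 1):
            cands[L] = 3 + L + 0.5 ** L * S
        S = min(cands.values())
        best[n] = [L for L, v in cands.items() if v == S]
    return S, best

S1, best = solve()
print(S1)        # 6.75
print(best[1])   # [2]: make two tanks in the first run
```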

5.7 (a) Let Vt,n (p) = the probability for a successful random walk starting in the
point (t, n) if pt,n = p and the other probabilities are chosen optimally.
It holds that

Vt,n (p) = pVt−1,n + (1 − p)Vt−1,n−1 .


Further

Vt,n = max_{1−a≤p≤a} Vt,n(p).

If we denote the optimal value of pt,n by ut,n it follows that

Vt,n = ut,n Vt−1,n + (1 − ut,n )Vt−1,n−1


(b) Vt,n and ut,n for the different points are
i. origin: V0,0 = 1 (u0,0 arbitrary)
ii. positive n-axis: V0,n = 0 (u0,n arbitrary)
iii. positive t-axis: Vt,0 = 0 (ut,0 arbitrary)
iv. in the triangle above the 45° line through the origin (n > t > 0):
Vt,n = 0 (ut,n arbitrary)
v. on the 45° line through the origin (n = t > 0):

Vn,n = max_{p1,1,...,pn,n} ∏_{k=1}^{n} (1 − pk,k) = {opt.} = (1 − (1 − a))^n = a^n

un,n = 1 − a (as small as possible)

(c) In order to guess the general solution we first solve Vn+1,n for n = 1, 2, 3, 4.

V2,1(p2,1) = p2,1 V1,1 + (1 − p2,1) V1,0 = p2,1 a,   since V1,1 = a and V1,0 = 0.

Maximize w.r.t. p2,1 =⇒ p̂2,1 = u2,1 = a (as large as possible).
∴ V2,1 = a² and u2,1 = a.

V3,2(p3,2) = p3,2 V2,2 + (1 − p3,2) V2,1 = a²,   since V2,2 = V2,1 = a².

In this case p3,2 is arbitrary.
∴ V3,2 = a² and u3,2 arbitrary.

V4,3(p4,3) = p4,3 V3,3 + (1 − p4,3) V3,2 = p4,3 a²(a − 1) + a²,   since V3,3 = a³ and V3,2 = a².

Maximize w.r.t. p4,3 gives (a − 1 < 0!) p̂4,3 = u4,3 = 1 − a (as small as possible).
∴ V4,3 = a³(2 − a) and u4,3 = 1 − a.

V5,4(p5,4) = p5,4 V4,4 + (1 − p5,4) V4,3 = p5,4 · 2a³(a − 1) + a³(2 − a),   since V4,4 = a⁴ and V4,3 = a³(2 − a).

Maximize w.r.t. p5,4 gives (a − 1 < 0!) p̂5,4 = u5,4 = 1 − a (as small as possible).
∴ V5,4 = a⁴(3 − 2a) and u5,4 = 1 − a (as small as possible).

Any point (n + 1, n): By using induction it is possible to prove that the
following guess is accurate:

Vn+1,n = a^n [(n − 1) − (n − 2)a],   n ≥ 3,
un+1,n = 1 − a,   n ≥ 3.
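The guessed closed form can be cross-checked by straightforward value iteration over the grid (an illustrative sketch; a = 0.7 is an arbitrary admissible value):

```python
# V[t, n]: success probability from point (t, n), with p_{t,n} restricted to
# [1-a, a]; the objective is linear in p, so an endpoint is always optimal.
a, N = 0.7, 12
V = {}
for t in range(N + 1):
    for n in range(N + 1):
        if t == 0:
            V[t, n] = 1.0 if n == 0 else 0.0
        elif n == 0 or n > t:
            V[t, n] = 0.0
        else:
            lin = lambda p: p * V[t - 1, n] + (1 - p) * V[t - 1, n - 1]
            V[t, n] = max(lin(1 - a), lin(a))

for n in range(3, N):
    assert abs(V[n + 1, n] - a ** n * ((n - 1) - (n - 2) * a)) < 1e-12
print("V_{n+1,n} matches the closed form for n = 3, ..., %d" % (N - 1))
```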
(d) The statement is false.

5.8 (a) If n = N , we have to stop the game and will then earn G(N, x) kronor.
Otherwise, if n < N , we may choose if we wish to stop the game or not.
If we stop the game we earn G(n, x) kronor. If we do not stop the game,
the earnings are a random variable Vn+1(Xn+1), and thus the expected
winnings in step n + 1 are E[Vn+1(Xn+1)], if we act optimally. Thus we get

VN(x) = G(N, x),
Vn(x) = max{ G(n, x), E[Vn+1(Xn+1)] },   n = N − 1, N − 2, . . . , 1.
(b) Since we want to maximize the expected winnings it is optimal to stop if
G(n, x) > E[Vn+1 (Xn+1 )], and continue otherwise.
(c) If we use cn = E[Vn (Xn )], the DynP-equation for Vn can be written as
Vn (x) = max{G(n, x), cn+1 }. Thus,
cn = E[Vn(Xn)] = E[ max{G(n, Xn), cn+1} ]
   = ∫_{G(n,x)≤cn+1} cn+1 f(x) dx + ∫_{G(n,x)>cn+1} G(n, x) f(x) dx
   = cn+1 P(G(n, X) ≤ cn+1) + ∫_{G(n,x)>cn+1} G(n, x) f(x) dx.

(d) Uniform distribution on [0, 1] and G(n, x) = x/n gives

cn = cn+1 P( Xn/n ≤ cn+1 ) + ∫_{n cn+1}^{1} (x/n) dx
   = (1/2)( n cn+1² + 1/n ),

cN = E[XN/N] = 1/(2N).

For N = 4 we get c4 = 1/8 and c3 = 73/384, so it is optimal to stop if
the outcome of X2 is greater than 2c3 = 73/192 and continue otherwise.
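The thresholds are quickly computed exactly; an illustrative sketch using rational arithmetic:

```python
# c_n = 0.5*(n*c_{n+1}**2 + 1/n) with c_N = 1/(2N), for the uniform case.
from fractions import Fraction

def thresholds(N):
    c = {N: Fraction(1, 2 * N)}
    for n in range(N - 1, 0, -1):
        c[n] = Fraction(1, 2) * (n * c[n + 1] ** 2 + Fraction(1, n))
    return c

c = thresholds(4)
print(c[4])   # 1/8
print(c[3])   # 73/384
```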

5.9 The Bellman equation:

 V(x) = max_{0≤u≤x} { pV(x + u) + qV(x − u) }
 V(0) = 0                                                  (7.1)
 V(x) = 1, when x ≥ N,

where V(x) should be interpreted as sup_u P_{X0=x}(XT ≥ N).
Write the Bellman equation as

V = TV (+boundary conditions) (7.2)

Now let R(x) be given by


R(x) = PX0 =x (XT ≥ N ) given û = 1. (7.3)

From the uniqueness of the solution to 7.2 it follows that it suffices to show
that

R = T R, (7.4)

and we note that R trivially satisfies the boundary conditions in 7.1, i.e.

 R(n) = pR(n + 1) + qR(n − 1), 1≤n≤N −1

R(0) = 0 (7.5)


R(n) = 1, when n ≥ N.

Equation 7.5 is a second order homogeneous difference equation with the
general solution

R(n) = A λ1^n + B λ2^n,

where λ1, λ2 are the zeros of the characteristic polynomial

p(λ) = pλ² − λ + q.

Since p = 1 − q, and p > q, it follows that the roots of p(λ) = 0 are

λ1 = 1,   λ2 = θ = q/p   =⇒   R(n) = A + Bθ^n.

The boundary conditions R(0) = 0 and R(N) = 1 give

R(n) = (1 − θ^n)/(1 − θ^N),   where θ = q/p.   (7.6)
Now it suffices to show that

R(n) = max_{1≤u≤n} { pR(n + u) + qR(n − u) },

and that u = 1 gives the maximum.

We get

pR(n + u) + qR(n − u) = (1 − θ^N)^(−1) { 1 − θ^n H(θ^u) }

where

H(x) = px + qx^(−1).

It is clear that H(θ¹) = 1. From this it follows that

• if u increases from 1, then θ^u decreases,
• which leads to an increase in H(θ^u), since H is decreasing for x < √(q/p) and θ^u < θ < √(q/p),
• which decreases (1 − θ^N)^(−1) { 1 − θ^n H(θ^u) }.

Conclusion: The optimum is given by u = 1, and R satisfies the Bellman


equation.
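A short numerical check (illustrative, with p = 0.6) that R(n) = (1 − θ^n)/(1 − θ^N) satisfies the Bellman equation and that the timid bet u = 1 attains the maximum:

```python
p, q, N = 0.6, 0.4, 10
theta = q / p

def R(n):
    return 1.0 if n >= N else (1 - theta ** n) / (1 - theta ** N)

for n in range(1, N):
    values = [p * R(n + u) + q * R(n - u) for u in range(1, n + 1)]
    assert abs(max(values) - R(n)) < 1e-12       # R solves the Bellman equation
    assert abs(values[0] - max(values)) < 1e-12  # u = 1 attains the maximum
print("Bellman equation verified for n = 1, ...,", N - 1)
```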

5.10 If we define cn = sold quantity of gold, the formalised problem becomes
(xn = remaining quantity of gold)

max E[ Σ_{n=0}^{N−1} cn pn β^(N−n−1) ]

s.t.  xn+1 = xn − cn,  0 ≤ cn ≤ xn,
      x0 = A,
      pn+1 = pn ωn.

The DynP-equation in forward time becomes:

Vn(x, p) = sup_{0≤c≤x} { cpβ^(N−n−1) + E[Vn+1(x − c, pωn+1)] },   (7.1)
VN(x, p) = 0.                                                     (7.2)

Immediately we get

VN−1(x, p) = sup_{0≤c≤x} { cp },

i.e.

VN−1(x, p) = xp,   ĉN−1(x, p) = x.

From (7.1) it follows that

VN−2(x, p) = sup_{0≤c≤x} { cpβ + E[(x − c)pω] },

where the time index is suppressed for ω. With m = E[ω] the problem becomes

max { cpβ + (x − c)pm }   s.t. 0 ≤ c ≤ x.

There are two cases, depending on the values of the parameters β and m.

I. m > β. In this case we have

VN−2(x, p) = xpm,   ĉN−2(x, p) = 0.

Plug this into (7.1):

VN−3(x, p) = sup_{0≤c≤x} { cpβ² + E[(x − c)mpω] },

which gives the following problem

max { cp(β² − m²) + xpm² }   s.t. 0 ≤ c ≤ x.

Since m > β we get

VN−3(x, p) = xpm²,   ĉN−3(x, p) = 0.

Now, make the guess

VN−k(x, p) = xpm^(k−1),   k = 1, . . . , N,

which is easily verified by induction.
∴ If m > β it holds that

VN−k(x, p) = xpm^(k−1),   k = 1, . . . , N,
ĉn(x, p) = 0,   n = 0, . . . , N − 2,
ĉN−1(x, p) = x.

In other words: Save all the gold to day N.
II. m < β. In this case we get

VN−2(x, p) = xpβ,   ĉN−2(x, p) = x.

Make therefore the guess that

VN−k(x, p) = xpβ^(k−1),   ĉN−k(x, p) = x,

which is easy to prove by induction.
∴ If m < β it holds that

VN−k(x, p) = xpβ^(k−1),   k = 1, . . . , N,
ĉn(x, p) = x,   n = 0, . . . , N − 1.

In other words: Sell all the gold immediately.
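Both regimes follow from a one-line backward recursion, since the objective is linear in c (bang-bang); an illustrative check with made-up values of m and β:

```python
# With V_{N-k}(x, p) = x*p*a_k, the DynP-equation gives a_1 = 1 and
# a_k = max(beta**(k-1), m*a_{k-1})   ("sell everything now" vs "keep").
def a_N(m, beta, N):
    a = 1.0
    for k in range(2, N + 1):
        a = max(beta ** (k - 1), m * a)
    return a   # V_0(x, p) = x*p*a_N

assert abs(a_N(m=1.1, beta=1.05, N=5) - 1.1 ** 4) < 1e-12    # m > beta: save
assert abs(a_N(m=1.02, beta=1.05, N=5) - 1.05 ** 4) < 1e-12  # m < beta: sell
print("both regimes agree with the closed forms")
```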

5.11 (a) Define the optimal value function Vt(k) as

Vt(k) = sup_{ut,...,uT−1} E[ xT − (1/2) Σ_{n=t}^{T−1} un² | xt = k ].

Then

Vt(k) = sup_u { −(1/2)u² + E[Vt+1(k + Z)] },

i.e.

Vt(k) = sup_u { −(1/2)u² + Σ_{m=0}^{∞} e^(−u) (u^m/m!) Vt+1(k + m) },
VT(k) = k,   k = 0, 1, 2, . . .

(b) Start by computing one step backwards:

VT−1(k) = sup_{u≥0} { −(1/2)u² + E[k + Z] } = {Z ∈ Po(u)}
        = sup_{u≥0} { −(1/2)u² + k + u }.

Maximizing w.r.t. u =⇒ ûT−1(k) = 1.
∴ VT−1(k) = k + 1/2,   ûT−1(k) = 1.

Obvious guess + induction =⇒

VT−n(k) = k + n/2,
ûT−n(k) = 1.

5.12 Since the parade order is random one may consider the beauty of horse n
as the outcome of a random variable Yn , where it holds that Y1 , . . . , YN are
independent and identically distributed. Let

Zn = 1 if Yn = sup{Y1, . . . , Yn}, and Zn = 0 otherwise,

and denote the decision to accept a horse by 1 and the decision to reject a
horse by 0.
Define

Vn(z) = sup_{u∈{0,1}} P(choosing the most beautiful horse | Zn = z).

z = 0: Consider the two possible choices


u = 1: In this case P (choosing the most beautiful horse) = 0, since horse n
is uglier than at least one of the previous horses.
u = 0: In this case we have

Pu=0 (choosing the most beautiful horse|Zn = 0) =


= Vn+1 (1)P (Zn+1 = 1) + Vn+1 (0)P (Zn+1 = 0)

Define φn = E[Vn(Zn)] = Vn(1)P(Zn = 1) + Vn(0)P(Zn = 0).
∴ Vn(0) = φn+1.

z = 1: Consider again the two choices.


u = 1: In this case we have that

Pu=1 (choosing the most beautiful horse|Zn = 1) =


= P (Yn = sup{Y1 , . . . , YN }|Yn = sup{Y1 , . . . , Yn })
u = 0: In this case it again holds that Vn (0) = φn+1

Calculations:

P(Zn = 1) = P(Yn = sup{Y1, . . . , Yn}) = {sym.} = 1/n,

P(Yn = sup{Y1, . . . , YN} | Yn = sup{Y1, . . . , Yn})
  = P(Yn = sup{Y1, . . . , YN}) / P(Yn = sup{Y1, . . . , Yn}) = (1/N)/(1/n) = n/N.

DynP-equation: We get that

Vn(z) = φn+1                if z = 0   (decision u = 0),
Vn(z) = max{ φn+1, n/N }    if z = 1   (u = 0 gives φn+1, u = 1 gives n/N).

Thus we need a recursion equation for φn. Plug Zn into the DynP-equation
and take the expected value =⇒

φn = φn+1 · (1 − 1/n) + max{ φn+1, n/N } · (1/n),

i.e.

φn = ((n − 1)/n) φn+1 + (1/n) max{ φn+1, n/N }.

The boundary condition at n = N is trivial:

VN(z) = 0 if z = 0, and VN(z) = 1 if z = 1.

This gives that φN = 1/N.


Optimal strategy:

n < N : The following is optimal


i. If Zn = 0 reject.
ii. If Zn = 1 then
A. reject if φn+1 > n/N
B. accept if φn+1 < n/N
n = N : In this case one has to accept whether one wants to or not.
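The recursion for φn also yields the classical cutoff rule numerically; an illustrative computation for N = 20 horses:

```python
# phi_n = ((n-1)/n)*phi_{n+1} + (1/n)*max(phi_{n+1}, n/N), phi_N = 1/N.
# Accept the first record horse at position n once phi_{n+1} < n/N.
def solve(N):
    phi = {N: 1.0 / N}
    for n in range(N - 1, 0, -1):
        phi[n] = (n - 1) / n * phi[n + 1] + max(phi[n + 1], n / N) / n
    n_star = min(n for n in range(1, N) if phi[n + 1] < n / N)
    return phi, n_star

phi, n_star = solve(20)
print(n_star)            # 8: reject the first 7 horses, then take a record
print(round(phi[1], 4))  # 0.3842, the success probability (close to 1/e)
```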

6.1 (a) Define the optimal value function according to

V(t, i) = inf_u E[ ∫_t^T g(Xs, us) ds + G(XT) ; Xt = i ].

(b) Standard arguments =⇒

V(t, i) = inf_u { ∫_t^(t+∆t) g(Xs, us) ds + E[V(t + ∆t, Xt+∆t) | Xt = i] },

{. . .} ≈ g(i, u)∆t + Σ_{j≠i} λij(u) V(t + ∆t, j)∆t +
        + ( 1 − Σ_{j≠i} λij(u)∆t ) V(t + ∆t, i).

Plug in V(t + ∆t, j) ≈ V(t, j) + Vt′(t, j)∆t and ignore (∆t)²-terms =⇒

Vt′(t, i) + inf_{u∈B} { g(i, u) + Σ_j λij(u) V(t, j) } = 0,   i = 1, 2, . . . , M,

where λii(u) = −Σ_{j≠i} λij(u), i.e. on vector form

V′(t) + inf_u { g(u) + H(u)V(t) } = 0,
V(T) = G,

where

V(t) = ( V(t, 1), . . . , V(t, M) )ᵀ,   g(u) = ( g(1, u), . . . , g(M, u) )ᵀ.

6.2 Define the following states and decisions:


States: Good mood = 0
Bad mood = 1
Decisions: Do not buy dinner = 1
Buy dinner = 2
Transition matrices

P(1) = [ 1/8 7/8 ; 1/8 7/8 ],   P(2) = [ 7/8 1/8 ; 7/8 1/8 ].

Expected costs

C = [ 0 400 ; (7/8)·2000 400 ] = [ 0 400 ; 1750 400 ].

Initial policy R0 = [2, 2].


Value determination:

g = 400 + (7/8)v0 + (1/8)v1 − v0          g = 400
g = 400 + (7/8)v0 + (1/8)v1 − v1    =⇒   v0 = 0
v1 = 0                                    v1 = 0

Policy improvement:

State  Decision
0      1   g̃ = 0 + (1/8)·0 + (7/8)·0 − 0 = 0        Min
       2   g̃ = 400
1      1   g̃ = 1750 + (1/8)·0 + (7/8)·0 − 0 = 1750
       2   g̃ = 400                                   Min

The new policy is R1 = [1, 2].


Value determination:

g = 0 + (1/8)v0 + (7/8)v1 − v0            g = 200
g = 400 + (7/8)v0 + (1/8)v1 − v1    =⇒   v0 = −(8/7)·200
v1 = 0                                    v1 = 0

Policy improvement:

State  Decision
0      1   g̃ = 200                                                      Min
       2   g̃ = 400 + (7/8)·(−(8/7)·200) + (1/8)·0 + (8/7)·200 ≈ 428
1      1   g̃ = 1750 + (1/8)·(−(8/7)·200) + (7/8)·0 − 0 ≈ 1721
       2   g̃ = 200                                                      Min

The new policy is R2 = [1, 2] = R1 . Thus the optimal policy is to buy dinner
when Fluke is in a bad mood and otherwise not.
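The iteration above is short enough to verify in code; the following is an illustrative sketch of the same average-cost policy iteration (states 0/1 = good/bad mood, decisions 1 = no dinner, 2 = dinner):

```python
# Policy evaluation solves g + v_i = C_i(k_i) + sum_j p_ij(k_i)*v_j, v_1 = 0.
P = {1: [[1/8, 7/8], [1/8, 7/8]], 2: [[7/8, 1/8], [7/8, 1/8]]}
C = {1: [0, 1750], 2: [400, 400]}

policy = [2, 2]   # initial policy R0
while True:
    p0, p1 = P[policy[0]][0], P[policy[1]][1]
    # With v1 = 0: g = C0 + p0[0]*v0 - v0 and g = C1 + p1[0]*v0,
    # hence C0 - C1 = (1 - p0[0] + p1[0]) * v0.
    v0 = (C[policy[0]][0] - C[policy[1]][1]) / (1 - p0[0] + p1[0])
    g = C[policy[1]][1] + p1[0] * v0
    v = [v0, 0.0]
    new = [min((1, 2), key=lambda k: C[k][i] + sum(P[k][i][j] * v[j]
                                                   for j in range(2)))
           for i in range(2)]
    if new == policy:
        break
    policy = new

print(policy)    # [1, 2]: buy dinner only in a bad mood
print(round(g))  # 200, the minimal average cost per evening
```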

6.3 We have the following states and decisions:


States: Excellent = 1
Good = 2
Decent = 3
Bad = 4
Decision: Keep the machine = 0
Replace the machine = 1

(a) The transition matrices are

P(0) =
 0.7  0.3  0    0
 0    0.7  0.3  0
 0    0    0.6  0.4
 0    0    0    1

P(1) =
 0.7  0.3  0  0
 0.7  0.3  0  0
 0.7  0.3  0  0
 0.7  0.3  0  0

(b) If the problem is treated as a maximum profit problem the following cost
structure is obtained

c10 = 100 c20 = 80 c30 = 50 c40 = 10


c11 = −100 c21 = −100 c31 = −100 c41 = −100

(c) Initial policy R0 = [0, 0, 1, 1].


Value determination:

g = 100 + 0.7v1 + 0.3v2 − v1              g = 60
g = 80 + 0.7v2 + 0.3v3 − v2               v1 = 400/3
g = −100 + 0.7v1 + 0.3v2 − v3       =⇒   v2 = 0
g = −100 + 0.7v1 + 0.3v2 − v4             v3 = −200/3
v2 = 0                                    v4 = −200/3
Policy improvement:

State  Decision
1      0:  100 + 0.7·(400/3) + 0.3·0 = 193.3            Max
       1: −100 + 0.7·(400/3) + 0.3·0 = −6.7
2      0:   80 + 0.7·0 + 0.3·(−200/3) = 60              Max
       1: −100 + 0.7·(400/3) + 0.3·0 = −6.7
3      0:   50 + 0.6·(−200/3) + 0.4·(−200/3) = −16.7
       1: −100 + 0.7·(400/3) + 0.3·0 = −6.7             Max
4      0:   10 + 1·(−200/3) = −56.7
       1: −100 + 0.7·(400/3) + 0.3·0 = −6.7             Max

The new policy is R1 = [0, 0, 1, 1] = R0 . The optimal strategy is thus


given by û(1) = 0, û(2) = 0, û(3) = 1 and û(4) = 1. The average income
for this strategy is 60 kr/week.

6.4 We have the following states and decisions:


States: Good season = 1
Bad season = 2
Decisions: Train moderately = 1
Train hard = 2
Transition matrices
   
P(1) = [ 0.5 0.5 ; 0.4 0.6 ],   P(2) = [ 0.8 0.2 ; 0.7 0.3 ].

Expected costs ( cik = Σ_j Rij(k) pij(k) )

C = [ 6 4 ; −3 −5 ].

Initial policy R0 = [1, 1].


Value determination:

g + v1 = 6 + 0.5v1 + 0.5v2            g = 1
g + v2 = −3 + 0.4v1 + 0.6v2     =⇒   v1 = 10
v2 = 0                                v2 = 0

Policy improvement:

State Decision
1 1 g̃ = 6 + 0.5 · 10 + 0.5 · 0 − 10 = 1
2 g̃ = 4 + 0.8 · 10 + 0.2 · 0 − 10 = 2 Max
2 1 g̃ = −3 + 0.4 · 10 + 0.6 · 0 − 0 = 1
2 g̃ = −5 + 0.7 · 10 + 0.3 · 0 − 0 = 2 Max

The new policy is R1 = [2, 2].


Value determination:

g + v1 = 4 + 0.8v1 + 0.2v2            g = 2
g + v2 = −5 + 0.7v1 + 0.3v2     =⇒   v1 = 10
v2 = 0                                v2 = 0

Policy improvement: the same v1 and v2 as above =⇒


R2 = [2, 2] = R1 .
Optimal policy R̂ = [2, 2], i.e. train hard always!

6.5 We have the following states and decisions:


States: Broken = 0
Fine = 1
Decisions: Andersson = A
Bengtsson = B
Neither = C
Transition probabilities:

p00 (A) = 0.3 p01 (A) = 0.7 p10 (A) = 0.3 p11 (A) = 0.7
p00 (B) = 0.2 p01 (B) = 0.8 p10 (B) = 0.2 p11 (B) = 0.8
p10 (C) = 0.4 p11 (C) = 0.6

Expected cost cik the following day:

c0A = 2000 + 0.3 · 10000 = 5000


c0B = 3500 + 0.2 · 10000 = 5500
c0C not allowed
c1A = 500 + 0.3 · 10000 = 3500
c1B = 1600 + 0.2 · 10000 = 3600
c1C = 0 + 0.4 · 10000 = 4000

Start with the policy R0 = [B, A].


Value determination:


g + v0 = c0B + p00 (B) · v0 + p01 (B) · v1 
g + v1 = c1A + p10 (A) · v0 + p11 (A) · v1 =⇒

v1 = 0

 g = 4045.5
g + v0 = 5500 + 0.2 · v0 + 0
=⇒ v0 = 1818.2
g + 0 = 3500 + 0.3 · v0 + 0
v1 = 0

Policy improvement:

State  Decision
0      A   g̃ + v0 = c0A + p00(A)·1818.2 = 5545.4      Min
       B   g̃ + v0 = c0B + p00(B)·1818.2 = 5863.6
1      A   g̃ + v1 = c1A + p10(A)·1818.2 = 4045
       B   g̃ + v1 = c1B + p10(B)·1818.2 = 3963        Min
       C   g̃ + v1 = c1C + p10(C)·1818.2 = 4727

The new policy is R1 = [A, B].


Value determination:

g + v0 = 5000 + 0.3v0 + 0.7v1            g = 3911
g + v1 = 3600 + 0.2v0 + 0.8v1      =⇒   v0 = 1555.5
v1 = 0                                   v1 = 0

Policy improvement:

State Decision
0 A g̃ + v0 = 5000 + 0.3 · 1555.5 = 5466.65 Min
B g̃ + v0 = 5500 + 0.2 · 1555.5 = 5811.1
1 A g̃ + v1 = 3500 + 0.3 · 1555.5 = 3966.65
B g̃ + v1 = 3600 + 0.2 · 1555.5 = 3911.1 Min
C g̃ + v1 = 4000 + 0.4 · 1555.5 = 4622.2

The new policy is R2 = [A, B] = R1 and is therefore the optimal policy. Thus,
one should hire Andersson if the system is broken and Bengtsson if it is fine
(contrary to what is being done at present).

6.6 Define states and decisions as


States: i = 0, 1, 2, 3 = units of water in the dam
Decisions: k = 1, 2, 3 = units of released water
We let k ≥ i mean that all water is released.
We get the following transition matrices

 
P(1) =
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6
 0    1/6  1/3  1/2
 0    0    1/6  5/6

P(2) =
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6
 0    1/6  1/3  1/2

P(3) =
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6
 1/6  1/3  1/3  1/6

and the costs

C =
 3   3   3
 0   0   0
 0  −1  −1
 0  −1  −2

The LP becomes:

min  Σ_{i=0}^{3} Σ_{k=1}^{3} cik yik

s.t. Σ_{i=0}^{3} Σ_{k=1}^{3} yik = 1,

     Σ_{k=1}^{3} yjk − Σ_{i=0}^{3} Σ_{k=1}^{3} yik pij(k) = 0,   j = 0, . . . , 3,

     yik ≥ 0.

If ŷik is the solution to this problem then the policy is given by

Dik = ŷik / Σ_{k=1}^{3} ŷik = 1 if decision k is made in state i, and 0 otherwise.

6.7 We have

g^A + vi^A = qi^A + Σ_{j=1}^{N} pij^A vj^A,   i = 1, . . . , N,    vN^A = 0,

g^B + vi^B = qi^B + Σ_{j=1}^{N} pij^B vj^B,   i = 1, . . . , N,    vN^B = 0,

where qi^A = CiAi and pij^A = pij(Ai), similarly for B. Subtract the A-equations
from the B-equations =⇒

g^B − g^A + vi^B − vi^A = qi^B − qi^A + Σ_{j=1}^{N} pij^B vj^B − Σ_{j=1}^{N} pij^A vj^A,

i.e.

∆g + ∆vi = γi + Σ_j pij^B ∆vj,   (†)

where

γi = qi^B − qi^A − Σ_{j=1}^{N} pij^A vj^A + Σ_{j=1}^{N} pij^B vj^A,
∆g = g^B − g^A,
∆vi = vi^B − vi^A.

(†) is of the same form as the B-equations =⇒

∆g = Σ_{i=1}^{N} πi^B γi,

where π^B is the stationary distribution of the chain with transition probabilities pij^B.
The policy improvement step gives γi ≤ 0.
Since πi^B ≥ 0 it has to hold that ∆g = Σ_{i=1}^{N} πi^B γi ≤ 0.   Q.E.D.

6.8 We define the states i = 0, 1, 2 and 3 that indicate the amount of water at the
beginning of the month. In the states i = 0 and i = 1 there is only one possible
decision. For i = 0 we have to purchase electricity at a cost of 3 kkr. Let
decision k be the number of units of water that are released. The corresponding
costs are given in the following table:

State i Decision k Cik


0 0 3
1 1 0
2 1 0
2 2 -1
3 1 0
3 2 -1
3 3 -2

We start with the policy [0, 1, 1, 1]. In the value determination step in the
policy improvement algorithm we get the following system of equations:

V0 = 3 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 )


V1 = 0 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 )
V2 = 0 + 6/7( 1/6 · V1 + 2/6 · V2 + 1/2 · V3 )
V3 = 0 + 6/7( 1/6 · V2 + 5/6 · V3 )

The hint gives the solution (V0, V1, V2, V3) = (90/23, 21/23, 6/23, 3/23). Let's
now try to improve the policy:
For state i = 2:
k = 1: gives Ṽ2 = 6/23.
k = 2: gives Ṽ2 = −1+6/7·(1/6·90/23+2/6·21/23+ 2/6·6/23+ 1/6·3/23) =
= −2/23. Better!

For state i = 3:
k = 1: gives Ṽ3 = 3/23.
k = 2: gives Ṽ3 = −1 + 6/7 · (1/6 · 21/23 + 2/6 · 6/23 + 1/2 · 3/23) = −17/23.
Better!
k = 3: gives Ṽ3 = −2+6/7·(1/6·90/23+2/6·21/23+ 2/6·6/23+ 1/6·3/23) =
= −25/23. Best!

New policy = [0, 1, 2, 3];


In this case we get the following system of equations in the value determination
step:

V0 = 3 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 )


V1 = 0 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 )
V2 = −1 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 )
V3 = −2 + 6/7(1/6 · V0 + 2/6 · V1 + 2/6 · V2 + 1/6 · V3 ).

This system has the solution (V0 , V1 , V2 , V3 ) = (2, −1, −2, −3) according to the
hint.
Let’s now try to improve the policy:
For state i = 2:
k = 1: gives Ṽ2 = 0 + 6/7 · (1/6 · (−1) + 2/6 · (−2) + 1/2 · (−3)) = −2.
k = 2: gives Ṽ2 = −2. Equally good!

For state i = 3:
k = 1: gives Ṽ3 = 0 + 6/7 · (1/6 · (−2) + 5/6 · (−3)) = −17/7
k = 2: gives Ṽ3 = −1 + 6/7 · (1/6 · (−1) + 2/6 · (−2) + 1/2 · (−3)) = −3. Better!
k = 3: gives Ṽ3 = −3. Equally good!
We have obtained the same policy as before, i.e. [0, 1, 2, 3], but the policies
[0, 1, 1, 3] and [0, 1, 2, 2] are just as good.

6.9 Introduce the state flop (1) and success (2). Introduce further the decision
nothing (1), advertise (2) and development (3). To make decision 1 is free of
charge, decision 2 costs 40 Mkr and decision 3 costs 60 Mkr. The text gives
     
P(1) = [ 0.9 0.1 ; 0.4 0.6 ],   P(2) = [ 0.8 0.2 ; 0.2 0.8 ]   and   P(3) = [ 0.5 0.5 ; 0.4 0.6 ].

The costs are

C11 = 0 + 0.9 · 20 + 0.1 · (−100) = 8


C12 = 40 + 0.8 · 20 + 0.2 · (−100) = 36
C13 = 60 + 0.5 · 20 + 0.5 · (−100) = 20
C21 = 0 + 0.4 · 20 + 0.6 · (−100) = −52
C22 = 40 + 0.2 · 20 + 0.8 · (−100) = −36
C23 = 60 + 0.4 · 20 + 0.6 · (−100) = 8.

(a) Initial policy R = (1, 1). The value determination gives the following
equations
g = 8 + 0.9V1 + 0.1V2 − V1 ,
g = −52 + 0.4V1 + 0.6V2 − V2 .
With V2 = 0 we get g = −4 and V1 = 120. The policy improvement step
gives
i=1 k =1 g̃ = −4,
k =2 g̃ = 36 + 0.8 · 120 − 120 = 12,
k =3 g̃ = 20 + 0.5 · 120 − 120 = −40, ← smallest
i=2 k =1 g̃ = −4,
k =2 g̃ = −36 + 0.2 · 120 = −12, ← smallest
k =3 g̃ = 8 + 0.4 · 120 = 56.
The new policy is R = (3, 2). The value determination gives the following
equations
g = 20 + 0.5V1 + 0.5V2 − V1 ,
g = −36 + 0.2V1 + 0.8V2 − V2 .
With V2 = 0 we get g = −20 and V1 = 80. The policy improvement step
gives
i=1 k =1 g̃ = 8 + 0.9 · 80 − 80 = 0,
k =2 g̃ = 36 + 0.8 · 80 − 80 = 20,
k =3 g̃ = −20, ← smallest
i=2 k =1 g̃ = −52 + 0.4 · 80 = −20, ← smallest
k =2 g̃ = −20, ← smallest
k =3 g̃ = 8 + 0.4 · 80 = 40.
The new policy is R = (3, 2), which is the same as before. Thus we are
done. (Note that also R = (3, 1) is optimal.)
The optimal policy gives 20 Mkr per year in expected profit. The policy
is to develop if it is a flop and to do nothing or advertise if it is a success.
(b) We get the following LP
min 8y11 + 36y12 + 20y13 − 52y21 − 36y22 + 8y23
s.t. y11 + y12 + y13 + y21 + y22 + y23 = 1,
y11 + y12 + y13 − 0.9y11 − 0.8y12 − 0.5y13 − 0.4y21 − 0.2y22 − 0.4y23 = 0,
y21 + y22 + y23 − 0.1y11 − 0.2y12 − 0.5y13 − 0.6y21 − 0.8y22 − 0.6y23 = 0,
yik ≥ 0, i = 1, 2, k = 1, 2, 3.

Simplification gives
min 8y11 + 36y12 + 20y13 − 52y21 − 36y22 + 8y23
s.t. y11 + y12 + y13 + y21 + y22 + y23 = 1,
0.1y11 + 0.2y12 + 0.5y13 − 0.4y21 − 0.2y22 − 0.4y23 = 0,
−0.1y11 − 0.2y12 − 0.5y13 + 0.4y21 + 0.2y22 + 0.4y23 = 0,
yik ≥ 0, i = 1, 2, k = 1, 2, 3.
Note that one of the two last equalities is redundant and we can eliminate
the last one. From (a) we have R̂ = (3, 2), i.e. there is an optimal solution
where only y13 and y23 are positive. In order to satisfy the constraints it
is required that
y13 + y22 = 1,
0.5y13 − 0.2y22 = 0,
i.e. y13 = 2/7 and y22 = 5/7. (Alternatively the policy R̂ = (3, 1) gives the
solution y13 = 4/9 and y21 = 5/9, the other yik = 0.)
(c) We start with R = (1, 1). The value determination gives the following
equations
V1 = 8 + 0.8(0.9V1 + 0.1V2 )
V2 = −52 + 0.8(0.4V1 + 0.6V2 ).
The solution is V1 = 0, V2 = −100. The policy improvement step gives
i = 1  k = 1  Ṽ1 = 0,
       k = 2  Ṽ1 = 36 + 0.8(0.8 · 0 − 0.2 · 100) = 20,
       k = 3  Ṽ1 = 20 + 0.8(0.5 · 0 − 0.5 · 100) = −20,   ← smallest
i = 2  k = 1  Ṽ2 = −100,   ← smallest
       k = 2  Ṽ2 = −36 + 0.8(0.2 · 0 − 0.8 · 100) = −100,   ← smallest
       k = 3  Ṽ2 = 8 + 0.8(0.4 · 0 − 0.6 · 100) = −40.
The new policy is R = (3, 1). (As an alternative R = (3, 2) may be
chosen.) The value determination gives the following equations
V1 = 20 + 0.8(0.5V1 + 0.5V2 )
V2 = −52 + 0.8(0.4V1 + 0.6V2 ).
The solution is V1 = −1300/23, V2 = −3100/23. The policy improvement step
gives

i = 1  k = 1  Ṽ1 = 8 − 0.8(0.9 · 1300/23 + 0.1 · 3100/23) = −43 11/23,
       k = 2  Ṽ1 = 36 − 0.8(0.8 · 1300/23 + 0.2 · 3100/23) = −21 17/23,
       k = 3  Ṽ1 = −1300/23 = −56 12/23,   ← smallest
i = 2  k = 1  Ṽ2 = −3100/23 = −134 18/23,   ← smallest
       k = 2  Ṽ2 = −36 − 0.8(0.2 · 1300/23 + 0.8 · 3100/23) = −131 7/23,
       k = 3  Ṽ2 = 8 − 0.8(0.4 · 1300/23 + 0.6 · 3100/23) = −74 18/23.

The new policy is R = (3, 1), which is the same as before. This means
that R̂ = (3, 1) is optimal. If we initially have a success the total expected
profit will then be 134.8 Mkr.
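The discounted policy evaluation in (c) can be verified exactly; an illustrative sketch in rational arithmetic:

```python
# Solves V_i = C_ik + alpha * sum_j p_ij(k) V_j for a fixed policy (k1, k2),
# with alpha = 0.8, as in part (c).
from fractions import Fraction as F

alpha = F(4, 5)
P = {1: [[F(9, 10), F(1, 10)], [F(2, 5), F(3, 5)]],
     2: [[F(4, 5), F(1, 5)], [F(1, 5), F(4, 5)]],
     3: [[F(1, 2), F(1, 2)], [F(2, 5), F(3, 5)]]}
C = {1: [8, -52], 2: [36, -36], 3: [20, 8]}

def evaluate(k1, k2):
    a, b = alpha * P[k1][0][0], alpha * P[k1][0][1]
    c, d = alpha * P[k2][1][0], alpha * P[k2][1][1]
    # V1 = C1 + a*V1 + b*V2 and V2 = C2 + c*V1 + d*V2, solved by substitution
    V2 = (C[k2][1] * (1 - a) + c * C[k1][0]) / ((1 - a) * (1 - d) - b * c)
    V1 = (C[k1][0] + b * V2) / (1 - a)
    return V1, V2

V1, V2 = evaluate(3, 1)
print(V1 == F(-1300, 23), V2 == F(-3100, 23))   # True True
```

Evaluating the optimal policy R̂ = (3, 1) reproduces V1 = −1300/23 and V2 = −3100/23 exactly.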
