Lectures On Dynamic Programming, Optimization and Control PDF
and the hypothesis is weakened to: for each $x$ there exists a $u$ such that (6.3) holds when $\le$ is replaced by $\ge$.
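Proof. Suppose $u$ is chosen by some policy $\pi$. By repeated substitution of (6.3) into itself we have
$$\phi(x) \le -t\lambda' + E_\pi\Big[\sum_{s=0}^{t-1} c(x_s, u_s) + \phi(x_t) \,\Big|\, x_0 = x\Big].$$
Divide this by $t$ and let $t \to \infty$. Since $\phi$ is bounded, the terms $\phi(x)/t$ and $E_\pi[\phi(x_t)]/t$ vanish, and we obtain
$$0 \le -\lambda' + \lim_{t\to\infty} \frac{1}{t} E_\pi\Big[\sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x\Big],$$
where the final term on the right-hand side is simply the average-cost under policy $\pi$. Minimizing the right-hand side over $\pi$ gives the result. The claim for $\le$ replaced by $\ge$ is proved similarly.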
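The "repeated substitution" step can be made concrete with its first iteration. This is a worked sketch that assumes (6.3) has the form used later in the proof of Theorem 6.3, namely $\lambda' + \phi(x) \le c(x,u) + E[\phi(x_1) \mid x_0 = x, u_0 = u]$ for all $x, u$. Apply (6.3) at $x_0 = x$, then again at $x_1$ inside the expectation:
$$
\begin{aligned}
\phi(x) &\le -\lambda' + E_\pi\big[c(x_0,u_0) + \phi(x_1) \mid x_0 = x\big] \\
&\le -\lambda' + E_\pi\big[c(x_0,u_0) - \lambda' + c(x_1,u_1) + \phi(x_2) \mid x_0 = x\big] \\
&= -2\lambda' + E_\pi\Big[\sum_{s=0}^{1} c(x_s,u_s) + \phi(x_2) \,\Big|\, x_0 = x\Big].
\end{aligned}
$$
Each further substitution contributes one more $-\lambda'$ and one more cost term. After $t$ steps this gives the bound displayed in the proof.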
Theorem 6.2 Suppose there exists a constant $\lambda$ and bounded function $\phi$ satisfying (6.2). Then $\lambda$ is the minimal average-cost and the optimal stationary policy is the one that chooses the optimizing $u$ on the right-hand side of (6.2).
Proof. Equation (6.2) implies that (6.3) holds for every $u$, and with equality when one takes $\pi$ to be the stationary policy that chooses the optimizing $u$ on the right-hand side of (6.2). Thus $\pi$ is optimal and $\lambda$ is the minimal average-cost.
The average-cost optimal policy is found simply by looking for a bounded solution to (6.2). Notice that if $\phi$ is a solution of (6.2) then so is $\phi$ + (a constant), because the (a constant) will cancel from both sides of (6.2). Thus $\phi$ is undetermined up to an additive constant. In searching for a solution to (6.2) we can therefore pick any state, say $\bar{x}$, and arbitrarily take $\phi(\bar{x}) = 0$.
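Theorem 6.2 and the additive-constant freedom can be checked numerically for a finite model. The following is a minimal Python sketch. It assumes (6.2) reads $\lambda + \phi(x) = \min_u \{c(x,u) + E[\phi(x_1) \mid x_0 = x, u_0 = u]\}$. The encoding `mdp[x]` = list of `(cost, transition_row)` pairs, the helper names, and the two-state machine model are hypothetical illustrations, not taken from the notes. In the model, state 0 (new) has one action of cost 1 that wears the machine out with probability 0.3. In state 1 (worn) one may keep it at cost 3 or replace it at cost 4. Normalizing $\phi(0) = 0$, one can check by hand that $\lambda = 22/13$, $\phi(1) = 30/13$ solve (6.2): $1 + 0.3 \cdot 30/13 = 22/13$ and $22/13 + 30/13 = 4 = \min(3 + 30/13,\ 4)$.

```python
from typing import List, Tuple

# Hypothetical encoding: mdp[x] is a list of actions, each (c(x,u), row)
# where row[y] = P(x_1 = y | x_0 = x, u_0 = u).
Action = Tuple[float, List[float]]
MDP = List[List[Action]]

# Hypothetical two-state machine: state 0 = new, state 1 = worn.
MACHINE: MDP = [
    [(1.0, [0.7, 0.3])],                     # new: run at cost 1, wears out w.p. 0.3
    [(3.0, [0.0, 1.0]), (4.0, [1.0, 0.0])],  # worn: keep (u=0) or replace (u=1)
]


def q_values(mdp: MDP, phi: List[float], x: int) -> List[float]:
    """c(x,u) + E[phi(x_1) | x_0 = x, u_0 = u] for every action u in state x."""
    return [c + sum(p * phi[y] for y, p in enumerate(row)) for c, row in mdp[x]]


def acoe_residual(mdp: MDP, lam: float, phi: List[float]) -> float:
    """Largest violation of lam + phi(x) = min_u {...} over all states x."""
    return max(abs(lam + phi[x] - min(q_values(mdp, phi, x))) for x in range(len(mdp)))


def greedy_policy(mdp: MDP, phi: List[float]) -> List[int]:
    """Stationary policy choosing an optimizing u on the right-hand side of (6.2)."""
    policy = []
    for x in range(len(mdp)):
        q = q_values(mdp, phi, x)
        policy.append(q.index(min(q)))
    return policy


lam, phi = 22 / 13, [0.0, 30 / 13]
print(acoe_residual(MACHINE, lam, phi))                     # ~0: (6.2) holds
print(acoe_residual(MACHINE, lam, [v + 5.0 for v in phi]))  # ~0: phi + constant also works
print(greedy_policy(MACHINE, phi))                          # [0, 1]: replace when worn
```

The residual is computed with a tolerance in mind, since $22/13$ and $30/13$ are not exact in floating point.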
6.2 Example: admission control at a queue
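Each day a consultant is presented with the opportunity to take on a new job. The jobs are independently distributed over $n$ possible types and on a given day the offered type is $i$ with probability $a_i$, $i = 1, \dots, n$. Jobs of type $i$ pay $R_i$ upon completion. Once he has accepted a job he may accept no other job until that job is complete. The probability that a job of type $i$ takes $k$ days is $(1 - p_i)^{k-1} p_i$, $k = 1, 2, \dots$. Which jobs should the consultant accept?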
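Solution. Let 0 and $i$ denote the states in which he is free to accept a job, and in which he is engaged upon a job of type $i$, respectively. Then (6.2), in its reward-maximizing form with max in place of min, is
$$\lambda + \phi(0) = \sum_{i=1}^{n} a_i \max[\phi(0), \phi(i)],$$
$$\lambda + \phi(i) = (1 - p_i)\phi(i) + p_i[R_i + \phi(0)], \quad i = 1, \dots, n.$$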
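Taking $\phi(0) = 0$, the second equation gives $p_i \phi(i) = p_i R_i - \lambda$, so these have solution $\phi(i) = R_i - \lambda/p_i$, and hence
$$\lambda = \sum_{i=1}^{n} a_i \max[0, R_i - \lambda/p_i].$$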
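The root $\lambda^*$ of this fixed-point equation can be computed by bisection. The gap $g(\lambda) = \sum_i a_i \max[0, R_i - \lambda/p_i] - \lambda$ is strictly decreasing, with $g(0) \ge 0$ and $g(\max_i p_i R_i) \le 0$, so the root lies in $[0, \max_i p_i R_i]$. The Python sketch below finds it and then applies the threshold rule $p_i R_i \ge \lambda^*$, which is the condition $\phi(i) \ge \phi(0) = 0$. The function name and the two-type data are hypothetical. The code assumes $R_i \ge 0$ and $0 < p_i \le 1$, and indexes job types from 0.

```python
from typing import List, Sequence, Tuple


def solve_admission(a: Sequence[float], R: Sequence[float], p: Sequence[float],
                    iters: int = 200) -> Tuple[float, List[int]]:
    """Bisection for lam = sum_i a_i * max(0, R_i - lam / p_i).

    Assumes R_i >= 0 and 0 < p_i <= 1. Returns (lam_star, accepted job types),
    with job types indexed from 0.
    """
    def gap(lam: float) -> float:
        # right-hand side minus left-hand side; strictly decreasing in lam
        return sum(ai * max(0.0, Ri - lam / pi) for ai, Ri, pi in zip(a, R, p)) - lam

    lo, hi = 0.0, max(pi * Ri for pi, Ri in zip(p, R))  # gap(lo) >= 0 >= gap(hi)
    for _ in range(iters):                               # fixed count avoids float stalls
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    lam_star = 0.5 * (lo + hi)
    accepted = [i for i, (pi, Ri) in enumerate(zip(p, R)) if pi * Ri >= lam_star]
    return lam_star, accepted


# Hypothetical data: two job types offered with equal probability.
lam_star, accepted = solve_admission(a=[0.5, 0.5], R=[10.0, 2.0], p=[0.5, 1.0])
print(lam_star, accepted)   # about 2.5, [0]
```

For this data, accepting only the first type gives $\lambda = 0.5(10 - 2\lambda)$, so $\lambda^* = 2.5$. The second type is rejected because $p_2 R_2 = 2 < 2.5$.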
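The left-hand side is increasing in $\lambda$ and the right-hand side is non-increasing in $\lambda$. Hence there is a unique root, say $\lambda^*$, and this is the maximal average-reward. Accepting a job of type $i$ is optimal exactly when $\phi(i) \ge \phi(0) = 0$, that is, when $R_i - \lambda^*/p_i \ge 0$. So the optimal policy takes the form: accept only jobs for which $p_i R_i \ge \lambda^*$.

6.3 Value iteration bounds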
Value iteration in the average-cost case is based upon the idea that $F_s(x) - F_{s-1}(x)$ approximates the minimal average-cost for large $s$.
Theorem 6.3 Define
$$m_s = \min_x \{F_s(x) - F_{s-1}(x)\}, \qquad M_s = \max_x \{F_s(x) - F_{s-1}(x)\}. \tag{6.4}$$
Then $m_s \le \lambda \le M_s$, where $\lambda$ is the minimal average-cost.
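Proof. Suppose that the first step of an $s$-horizon optimal policy follows Markov plan $f$. Then
$$F_s(x) = F_{s-1}(x) + [F_s(x) - F_{s-1}(x)] = c(x, f(x)) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = f(x)].$$
Since $F_s(x) - F_{s-1}(x) \ge m_s$, and $f(x)$ minimizes $c(x,u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u]$ over $u$, it follows that
$$F_{s-1}(x) + m_s \le c(x,u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u],$$
for all $x, u$. Applying Theorem 6.1 with $\phi = F_{s-1}$ and $\lambda' = m_s$ implies $m_s \le \lambda$. The bound $\lambda \le M_s$ is established in a similar way.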
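The upper bound can be spelled out by running the same argument in the other direction. Since $F_s(x) - F_{s-1}(x) \le M_s$, the first identity in the proof gives, for each $x$, with the particular choice $u = f(x)$,
$$F_{s-1}(x) + M_s \ge F_s(x) = c(x, f(x)) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = f(x)].$$
This is the weakened hypothesis of Theorem 6.1 with $\le$ replaced by $\ge$: for each $x$, some $u$ (here $f(x)$) satisfies it. Taking $\phi = F_{s-1}$ and $\lambda' = M_s$ then gives $\lambda \le M_s$.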
This justifies the following value iteration algorithm. At termination the algorithm provides a stationary policy that is within $\epsilon \times 100\%$ of optimal.
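(0) Set $F_0(x) = 0$, $s = 1$.
(1) Compute $F_s$ from
$$F_s(x) = \min_u \{c(x,u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u]\}.$$
(2) Compute $m_s$ and $M_s$ from (6.4). Stop if $M_s - m_s \le \epsilon m_s$. Otherwise set $s := s + 1$ and go to step (1).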
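Steps (0) to (2) can be sketched in Python for a finite model. The encoding `mdp[x]` = list of `(cost, transition_row)` pairs and all names are hypothetical. The example is a hypothetical two-state machine: in state 0 (new) the only action costs 1 and wears the machine out with probability 0.3. In state 1 (worn) one may keep it at cost 3 or replace it at cost 4. Its minimal average-cost is $22/13$, attained by replacing when worn. The returned policy takes the minimizing $u$ in the last pass of step (1). Repeating the proof of Theorem 6.1 for that fixed policy, with $\ge$ in place of $\le$, bounds its average-cost by $M_s \le (1+\epsilon) m_s \le (1+\epsilon)\lambda$, which is the sense of "within $\epsilon \times 100\%$". The stopping rule assumes $m_s > 0$, that is, positive costs.

```python
from typing import List, Tuple

Action = Tuple[float, List[float]]   # (c(x,u), row) with row[y] = P(x_1 = y | x, u)
MDP = List[List[Action]]

# Hypothetical two-state machine: state 0 = new, state 1 = worn.
MACHINE: MDP = [
    [(1.0, [0.7, 0.3])],                     # new: one action
    [(3.0, [0.0, 1.0]), (4.0, [1.0, 0.0])],  # worn: keep (u=0) or replace (u=1)
]


def average_cost_value_iteration(mdp: MDP, eps: float = 1e-6, max_iter: int = 10_000):
    """Steps (0)-(2): return (m_s, M_s, policy, s) once M_s - m_s <= eps * m_s."""
    n = len(mdp)
    F_prev = [0.0] * n                              # step (0): F_0(x) = 0
    for s in range(1, max_iter + 1):
        F, policy = [], []
        for x in range(n):                          # step (1): one DP recursion
            q = [c + sum(p * F_prev[y] for y, p in enumerate(row)) for c, row in mdp[x]]
            best = min(q)
            F.append(best)
            policy.append(q.index(best))            # minimizing action at stage s
        diffs = [F[x] - F_prev[x] for x in range(n)]
        m_s, M_s = min(diffs), max(diffs)           # step (2): bounds from (6.4)
        if M_s - m_s <= eps * m_s:
            return m_s, M_s, policy, s
        F_prev = F
    raise RuntimeError("bounds did not meet the stopping rule within max_iter")


m_s, M_s, policy, s = average_cost_value_iteration(MACHINE)
print(m_s, M_s, policy)   # bounds bracket 22/13 ~ 1.6923; policy [0, 1]
```

The chain under the optimal policy in this example is aperiodic, so the differences $F_s - F_{s-1}$ settle down. For a periodic chain they can oscillate indefinitely, which is why the sketch caps the number of iterations.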
6.4 Policy improvement

Policy improvement is an effective method of improving stationary policies.

Policy improvement in the average-cost case
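In the average-cost case a policy improvement algorithm can be based on the following observations. Suppose that for a policy $\pi = f^\infty$, we have that $\lambda, \phi$ is a solution to
$$\lambda + \phi(x) = c(x, f(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f(x)],$$
and suppose for some policy $\pi_1 = f_1^\infty$,
$$\lambda + \phi(x) \ge c(x, f_1(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_1(x)], \tag{6.5}$$
with strict inequality for some $x$. Then following the lines of proof in Theorem 6.1,
$$\lambda = \lim_{t\to\infty} \frac{1}{t} E_\pi\Big[\sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x\Big] \ge \lim_{t\to\infty} \frac{1}{t} E_{\pi_1}\Big[\sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x\Big].$$
If there is no $\pi_1$ for which (6.5) holds then $\pi$ satisfies (6.2) and is optimal. This justifies the following policy improvement algorithm.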
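(0) Choose an arbitrary stationary policy $\pi_0$. Set $s = 1$.
(1) For stationary policy $\pi_{s-1} = f_{s-1}^\infty$ determine $\phi, \lambda$ to solve
$$\lambda + \phi(x) = c(x, f_{s-1}(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_{s-1}(x)].$$
This gives a set of linear equations, and so is intrinsically easier to solve than (6.2).
(2) Now determine the policy $\pi_s = f_s^\infty$ from
$$c(x, f_s(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_s(x)] = \min_u \{c(x,u) + E[\phi(x_1) \mid x_0 = x, u_0 = u]\},$$
taking $f_s(x) = f_{s-1}(x)$ whenever this is possible. By applications of Theorem 6.1, this yields a strict improvement whenever possible. If $\pi_s = \pi_{s-1}$ then the algorithm terminates and $\pi_{s-1}$ is optimal. Otherwise, return to step (1) with $s := s + 1$.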
If both the action and state spaces are finite then there are only a finite number of possible stationary policies, and so the policy improvement algorithm will find an optimal stationary policy in finitely many iterations. By contrast, the value iteration algorithm can only obtain more and more accurate approximations of the minimal average-cost.
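The policy improvement algorithm, including the rule in step (2) of keeping $f_{s-1}(x)$ whenever it is still optimal, can be sketched in Python for a finite model. This is an illustration under stated assumptions. The model is assumed unichain, so the linear system of step (1) with the normalization $\phi(0) = 0$ has a unique solution. The encoding `mdp[x]` = list of `(cost, transition_row)` pairs and all function names are hypothetical. A small hand-written Gaussian elimination replaces a numerical library so that the block stays self-contained. The example is a hypothetical two-state machine: in state 0 (new) the only action costs 1 and wears the machine out with probability 0.3. In state 1 (worn) one may keep it at cost 3 or replace it at cost 4.

```python
from typing import List, Tuple

Action = Tuple[float, List[float]]   # (c(x,u), row) with row[y] = P(x_1 = y | x, u)
MDP = List[List[Action]]

# Hypothetical two-state machine: state 0 = new, state 1 = worn.
MACHINE: MDP = [
    [(1.0, [0.7, 0.3])],                     # new: one action
    [(3.0, [0.0, 1.0]), (4.0, [1.0, 0.0])],  # worn: keep (u=0) or replace (u=1)
]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def evaluate(mdp: MDP, policy: List[int], ref: int = 0) -> Tuple[float, List[float]]:
    """Step (1): solve lam + phi(x) = c(x,f(x)) + E[phi(x_1)] with phi(ref) = 0."""
    n = len(mdp)
    others = [y for y in range(n) if y != ref]        # unknown phi(y) for y != ref
    A, b = [], []
    for x in range(n):
        cost, row = mdp[x][policy[x]]
        A.append([1.0] + [(1.0 if x == y else 0.0) - row[y] for y in others])
        b.append(cost)
    z = solve_linear(A, b)                             # z = [lam, phi(others)...]
    phi = [0.0] * n
    for y, value in zip(others, z[1:]):
        phi[y] = value
    return z[0], phi


def policy_improvement(mdp: MDP, policy: List[int], tol: float = 1e-9, max_iter: int = 100):
    """Steps (0)-(2): return (lam, phi, policy) once the policy stops changing."""
    for _ in range(max_iter):
        lam, phi = evaluate(mdp, policy)
        new_policy = []
        for x, actions in enumerate(mdp):             # step (2)
            q = [c + sum(p * phi[y] for y, p in enumerate(row)) for c, row in actions]
            best = min(q)
            # keep f_{s-1}(x) whenever it already attains the minimum
            new_policy.append(policy[x] if q[policy[x]] <= best + tol else q.index(best))
        if new_policy == policy:
            return lam, phi, policy
        policy = new_policy
    raise RuntimeError("policy did not stabilize within max_iter")


lam, phi, policy = policy_improvement(MACHINE, [0, 0])   # start from "never replace"
print(lam, phi, policy)   # lam ~ 22/13 ~ 1.6923, policy [0, 1]
```

Starting from the policy that never replaces, the first evaluation gives $\lambda = 3$ and $\phi(1) = 20/3$. One improvement step then switches state 1 to "replace". The second evaluation returns $\lambda = 22/13$, $\phi(1) = 30/13$ with no further change, so the loop stops after two evaluations, consistent with finite termination.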
In the case of strict discounting, the following theorem plays the role of Theorem 6.1. The proof is similar, by repeated substitution of (6.6) into itself.
Theorem 6.4 Suppose there exists a bounded function $G$ such that for all $x$ and $u$,
$$G(x) \le c(x,u) + \beta E[G(x_1) \mid x_0 = x, u_0 = u]. \tag{6.6}$$
Then $G$