Fundamentals of Reinforcement Learning
December 9, 2013 - Techniques of AI
Yann-Michaël De Hauwere - ydehauwe@vub.ac.be
T. Mitchell, Machine Learning, chapter 13, McGraw Hill, 1997
Control learning
- A robot learning to dock on a battery charger
- Learning to choose actions to optimize factory output
- Learning to play Backgammon and other games
At each time step t, the agent:
- Observes the current state s_t
- Executes an action a_t
- Receives an immediate reward r_{t+1} ∈ R from the environment
- Observes the resulting state s_{t+1}

[Figure: agent-environment interaction loop, generating the trajectory s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, ...]
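The agent-environment interaction loop described above can be sketched as follows; `env_step` and `select_action` are hypothetical interfaces standing in for a concrete task and agent:

```python
def run_episode(env_step, select_action, s0, max_steps=100):
    """Run one episode of the agent-environment loop.

    env_step(s, a) -> (reward, next_state, done) and select_action(s)
    are assumed interfaces, not part of any particular library.
    """
    trajectory = []  # collected (s_t, a_t, r_{t+1}) triples
    s = s0
    for _ in range(max_steps):
        a = select_action(s)              # agent picks a_t from s_t
        r, s_next, done = env_step(s, a)  # environment returns r_{t+1}, s_{t+1}
        trajectory.append((s, a, r))
        s = s_next
        if done:
            break
    return trajectory
```

The returned trajectory is exactly the sequence s_t, a_t, r_{t+1}, s_{t+1}, ... from the interaction diagram.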
- A reward function determines the immediate reward r_{t+1} that accompanies the next state s_{t+1} returned by the environment
The value of a state s under policy π is the expected discounted return:

V^{\pi}(s) = E_{\pi}(R_t \mid s_t = s) = E_{\pi}\Big(\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\Big|\; s_t = s\Big)

which satisfies the Bellman equation

V^{\pi}(s) = \sum_{a \in A(s)} \pi(s,a) \sum_{s' \in S} P^{a}_{ss'} \big[ R^{a}_{ss'} + \gamma V^{\pi}(s') \big]
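The discounted return inside the expectation can be computed for a finite reward sequence with a simple backward recursion, a minimal sketch:

```python
def discounted_return(rewards, gamma):
    """Compute R_t = sum_k gamma^k * r_{t+k+1} for a finite reward list.

    Iterating backwards uses the recursion R = r + gamma * R, which
    avoids recomputing powers of gamma at every step.
    """
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R
```

For example, rewards [1, 1, 1] with gamma = 0.5 give 1 + 0.5 + 0.25 = 1.75.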
Softmax (Boltzmann) action selection with temperature τ:

\pi_t(s,a) = \frac{e^{Q_t(s,a)/\tau}}{\sum_{a' \in A} e^{Q_t(s,a')/\tau}}
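A minimal sketch of this softmax selection rule; subtracting the maximum Q-value before exponentiating is a standard numerical-stability trick that does not change the resulting distribution:

```python
import math
import random

def boltzmann_policy(q_values, tau):
    """Return pi_t(s, .) for one state: exp(Q/tau) normalized over actions.

    q_values is a dict mapping action -> Q_t(s, a); tau > 0 is the
    temperature (high tau -> near-uniform, low tau -> near-greedy).
    """
    m = max(q_values.values())
    exps = {a: math.exp((q - m) / tau) for a, q in q_values.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def sample_action(q_values, tau):
    """Draw an action according to the Boltzmann probabilities."""
    probs = boltzmann_policy(q_values, tau)
    actions = list(probs)
    return random.choices(actions, weights=[probs[a] for a in actions])[0]
```

Equal Q-values yield a uniform distribution, and actions with higher Q-values always receive higher probability.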
- Reward shaping
  - Incorporate domain knowledge to provide additional rewards during an episode
  - Guide the agent to learn faster
  - (Optimal) policies are preserved given a potential-based shaping function [Ng99]
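The potential-based form from [Ng99], F(s, s') = γΦ(s') − Φ(s), can be sketched as below; the particular potential used in the usage note (negative distance to a goal) is an illustrative assumption, not part of the theorem:

```python
def shaped_reward(r, s, s_next, phi, gamma):
    """Augment reward r with the potential-based shaping term
    F(s, s') = gamma * phi(s') - phi(s).

    phi is a user-supplied potential over states; [Ng99] shows that
    shaping of exactly this form leaves the optimal policies of the
    original task unchanged.
    """
    return r + gamma * phi(s_next) - phi(s)
```

With phi(s) = -|goal - s|, moving toward the goal yields a positive shaping bonus and moving away yields a penalty, while the optimal policy is unaffected.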
- Function approximation
  - So far we have used a tabular representation for value functions
  - For large state and action spaces this approach becomes intractable
  - Function approximators can be used to generalize over large or even continuous state and action spaces
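As one illustration (not the only choice of approximator), a TD(0) update for a linear value function V(s) = w · x(s) replaces the table entry with a weight vector over features; the feature vectors and step size here are assumptions for the sketch:

```python
def td0_linear_update(w, features, r, features_next, alpha, gamma):
    """One TD(0) update for a linear value approximator V(s) = w . x(s).

    features / features_next are the feature vectors x(s_t), x(s_{t+1});
    alpha is the step size and gamma the discount factor.
    """
    v = sum(wi * xi for wi, xi in zip(w, features))
    v_next = sum(wi * xi for wi, xi in zip(w, features_next))
    delta = r + gamma * v_next - v  # TD error
    # Gradient of V w.r.t. w is just x(s_t) for a linear approximator
    return [wi + alpha * delta * xi for wi, xi in zip(w, features)]
```

Because nearby states share features, an update for one state generalizes to others, which is exactly what the tabular representation cannot do.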
http://wilma.vub.ac.be:3000