
Lecture Slides for

INTRODUCTION TO

Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 16: Reinforcement Learning
Introduction
- Game playing: a sequence of moves to win a game
- Robot in a maze: a sequence of actions to find a goal
- The agent has a state in an environment, takes an action, sometimes receives a reward, and the state changes
- Credit assignment: deciding which of the earlier actions deserve credit for the eventual reward
- Learn a policy

Single State: K-armed Bandit
- Among K levers, choose the one that pays best
- Q(a): value of action a
- Reward is r_a
- Set Q(a) = r_a
- Choose a* if Q(a*) = max_a Q(a)

- Rewards stochastic (keep a running estimate of the expected reward):
  Q_{t+1}(a) ← Q_t(a) + η [ r_{t+1}(a) − Q_t(a) ]
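A minimal sketch of this running-average update with ε-greedy lever selection; the Gaussian reward model, the learning rate eta, and the ε value are illustrative assumptions, not taken from the slides.

```python
import random

def run_bandit(true_means, n_steps=1000, eta=0.1, epsilon=0.1):
    """Single-state K-armed bandit: keep Q(a) as a running estimate of E[r_a]."""
    K = len(true_means)
    Q = [0.0] * K                      # value estimate per lever
    for _ in range(n_steps):
        if random.random() < epsilon:  # explore: pick a random lever
            a = random.randrange(K)
        else:                          # exploit: pick the best lever so far
            a = max(range(K), key=lambda i: Q[i])
        r = random.gauss(true_means[a], 1.0)   # stochastic reward (assumed Gaussian)
        Q[a] += eta * (r - Q[a])       # Q_{t+1}(a) = Q_t(a) + eta * (r - Q_t(a))
    return Q

# Example: four levers with different expected payoffs
print(run_bandit([0.2, 0.5, 1.0, 0.1]))
```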

Elements of RL (Markov Decision Processes)
- s_t: state of the agent at time t
- a_t: action taken at time t
- In s_t, action a_t is taken, the clock ticks, reward r_{t+1} is received, and the state changes to s_{t+1}
- Next-state probability: P(s_{t+1} | s_t, a_t)
- Reward probability: p(r_{t+1} | s_t, a_t)
- Initial state(s), goal state(s)
- Episode (trial): a sequence of actions from an initial state to a goal state
- (Sutton and Barto, 1998; Kaelbling et al., 1996)
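As a purely illustrative aid (the states, actions, and numbers below are made up, not from the slides), these elements can be stored as simple dictionaries; the later sketches assume this format.

```python
# A toy MDP with two states and two actions, stored as nested dicts.
P = {  # P[s][a] -> list of (next_state, probability)
    "s0": {"left": [("s0", 0.9), ("s1", 0.1)], "right": [("s1", 1.0)]},
    "s1": {"left": [("s0", 1.0)],              "right": [("s1", 1.0)]},
}
R = {  # expected reward E[r_{t+1} | s_t, a_t]
    "s0": {"left": 0.0, "right": 1.0},
    "s1": {"left": 0.0, "right": 0.0},
}
GAMMA = 0.9  # discount factor, as used in the later slides
```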

Policy and Cumulative Reward
- Policy: π : S → A, a_t = π(s_t)
- Value of a policy π: the expected cumulative reward V^π(s_t) obtained by starting at s_t and following π
- Finite horizon: V^π(s_t) = E[ r_{t+1} + r_{t+2} + ... + r_{t+T} ]
- Infinite horizon (discounted): V^π(s_t) = E[ r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... ], where 0 ≤ γ < 1 is the discount factor

Bellman’s equation

V*(s_t) = max_{a_t} ( E[ r_{t+1} ] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) V*(s_{t+1}) )

Value of a_t in s_t:
Q*(s_t, a_t) = E[ r_{t+1} ] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) max_{a_{t+1}} Q*(s_{t+1}, a_{t+1})
V*(s_t) = max_{a_t} Q*(s_t, a_t)

Model-Based Learning
- The environment, P(s_{t+1} | s_t, a_t) and p(r_{t+1} | s_t, a_t), is known
- There is no need for exploration
- Can be solved using dynamic programming
- Solve for the optimal value function
  V*(s_t) = max_{a_t} ( E[ r_{t+1} ] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) V*(s_{t+1}) )
- Optimal policy
  π*(s_t) = arg max_{a_t} ( E[ r_{t+1} | s_t, a_t ] + γ Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) V*(s_{t+1}) )

Value Iteration
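A minimal Python sketch of value iteration, assuming the dictionary MDP format of the earlier toy example; the stopping threshold theta is an illustrative choice, not from the slides.

```python
def value_iteration(P, R, gamma=0.9, theta=1e-6):
    """P[s][a] = [(next_state, prob), ...], R[s][a] = expected reward."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup for state s
            v_new = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                        for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:          # stop when the values no longer change
            return V
```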

Policy Iteration
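A corresponding sketch of policy iteration under the same assumed MDP format: policy evaluation and greedy policy improvement alternate until the policy no longer changes.

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-6):
    """Alternate policy evaluation and greedy improvement (same toy MDP format)."""
    pi = {s: next(iter(P[s])) for s in P}          # arbitrary initial policy
    V = {s: 0.0 for s in P}
    while True:
        # 1. Policy evaluation: iterate V^pi until it converges
        while True:
            delta = 0.0
            for s in P:
                a = pi[s]
                v_new = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # 2. Policy improvement: act greedily with respect to V^pi
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, V
```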

Temporal Difference Learning
- The environment, P(s_{t+1} | s_t, a_t) and p(r_{t+1} | s_t, a_t), is not known; this is model-free learning
- There is a need for exploration, to sample from P(s_{t+1} | s_t, a_t) and p(r_{t+1} | s_t, a_t)
- Use the reward received in the next time step to update the value of the current state (action)
- Learning is driven by the temporal difference between the value of the current action and the value discounted from the next state
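A minimal sketch of one tabular TD(0) backup for V; the learning rate eta and discount gamma are illustrative constants.

```python
def td0_update(V, s, r_next, s_next, eta=0.1, gamma=0.9):
    """One temporal-difference backup: move V(s) toward r_{t+1} + gamma * V(s_{t+1})."""
    td_error = r_next + gamma * V[s_next] - V[s]
    V[s] += eta * td_error
    return td_error
```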

Exploration Strategies
- ε-greedy: with probability ε, choose one action at random uniformly; choose the best action with probability 1 − ε
- Probabilistic (softmax over the action values):
  P(a) = exp(Q(a)) / Σ_b exp(Q(b))
- Move smoothly from exploration to exploitation
- Decrease ε over time
- Annealing: divide the values by a temperature T that is gradually decreased,
  P(a) = exp(Q(a) / T) / Σ_b exp(Q(b) / T)
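A sketch of both strategies in Python; the ε value and temperature T are illustrative.

```python
import math
import random

def epsilon_greedy(Q, epsilon=0.1):
    """Q: list of action values. Random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.randrange(len(Q))
    return max(range(len(Q)), key=lambda a: Q[a])

def softmax_choice(Q, T=1.0):
    """Sample an action with probability proportional to exp(Q(a)/T); lower T = greedier."""
    prefs = [math.exp(q / T) for q in Q]
    total = sum(prefs)
    return random.choices(range(len(Q)), weights=[p / total for p in prefs])[0]
```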

Deterministic Rewards and Actions

- Deterministic: a single possible reward and next state, so
  Q(s_t, a_t) ← r_{t+1} + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1})
  is used as an update rule (backup)
- Starting at zero, Q values increase and never decrease

γ = 0.9

Consider the value of the action marked by ‘*’:

- If path A is seen first: Q(*) = 0.9 × max(0, 81) ≈ 73
  Then B is seen: Q(*) = 0.9 × max(100, 81) = 90
- Or, if path B is seen first: Q(*) = 0.9 × max(100, 0) = 90
  Then A is seen: Q(*) = 0.9 × max(100, 81) = 90
- Q values increase but never decrease
Nondeterministic Rewards and Actions
- When next states and rewards are nondeterministic (there is an opponent, or randomness in the environment), we keep running averages (expected values) instead of making assignments
- Q-learning (Watkins and Dayan, 1992) uses the backup
  Q(s_t, a_t) ← Q(s_t, a_t) + η [ r_{t+1} + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]
- Off-policy (Q-learning) vs. on-policy (Sarsa)
- Learning V (TD-learning; Sutton, 1988):
  V(s_t) ← V(s_t) + η [ r_{t+1} + γ V(s_{t+1}) − V(s_t) ]

Q-learning
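A minimal sketch of tabular Q-learning with ε-greedy exploration. The env object with reset(), step(a) → (next_state, reward, done), and a list env.actions is a hypothetical interface assumed for illustration, not part of the slides.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, eta=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                        # Q[(s, a)], initialised to zero
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            # off-policy backup: bootstrap with the best action in the next state
            best_next = 0.0 if done else max(Q[(s_next, x)] for x in env.actions)
            Q[(s, a)] += eta * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```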

Sarsa
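Sarsa differs from Q-learning only in the backup: it uses the value of the action actually chosen next, rather than the maximum. A sketch under the same assumed env interface as above.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, eta=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)

    def choose(s):
        # epsilon-greedy: the same policy that is being evaluated and improved
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda x: Q[(s, x)])

    for _ in range(episodes):
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = choose(s_next)
            # on-policy backup: use the action a_next that will actually be taken
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += eta * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```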

Eligibility Traces
- Keep a record of previously visited states (actions) so that the temporal-difference error can also be propagated back to them
- Eligibility of the pair (s, a) at time t:
  e_t(s, a) = 1 if s = s_t and a = a_t; otherwise e_t(s, a) = γ λ e_{t−1}(s, a)
Sarsa (λ)
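A sketch of Sarsa(λ), combining the Sarsa backup with replacing eligibility traces so that the TD error is propagated to recently visited state-action pairs; the env interface is the same assumed one as in the earlier sketches.

```python
import random
from collections import defaultdict

def sarsa_lambda(env, episodes=500, eta=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
    Q = defaultdict(float)

    def choose(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda x: Q[(s, x)])

    for _ in range(episodes):
        e = defaultdict(float)                  # eligibility traces, reset every episode
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = choose(s_next)
            delta = (r if done else r + gamma * Q[(s_next, a_next)]) - Q[(s, a)]
            e[(s, a)] = 1.0                     # replacing trace for the pair just visited
            for key in list(e):                 # propagate the TD error to all eligible pairs
                Q[key] += eta * delta * e[key]
                e[key] *= gamma * lam           # then decay every trace
            s, a = s_next, a_next
    return Q
```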

Generalization
- Tabular: Q(s, a) or V(s) is stored in a table
- Regressor: use a learner to estimate Q(s, a) or V(s), trained to reduce the temporal-difference error
- Eligibility traces can be combined with such a regressor
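A sketch of the regressor idea with a linear model: the parameters are moved along the gradient of the squared TD error. The feature map phi producing the feature vectors is assumed for illustration and is not given in the slides.

```python
def q_approx(theta, phi_sa):
    """Linear estimate Q(s, a) = theta . phi(s, a)."""
    return sum(t * x for t, x in zip(theta, phi_sa))

def td_gradient_step(theta, phi_sa, r, phi_next, eta=0.01, gamma=0.9):
    """One gradient step on the squared TD error for a linear regressor.
    phi_sa and phi_next are feature vectors for (s_t, a_t) and the next pair."""
    target = r + gamma * q_approx(theta, phi_next)
    error = target - q_approx(theta, phi_sa)
    return [t + eta * error * x for t, x in zip(theta, phi_sa)]
```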

Partially Observable States
- The agent does not know its exact state but receives an observation, which can be used to infer a belief about states
- Partially Observable MDP (POMDP)
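A sketch of one belief update: a Bayes filter that predicts with the transition model and corrects with the observation probabilities. The P and O dictionary structures are illustrative assumptions, not from the slides.

```python
def belief_update(belief, a, o, P, O):
    """Bayes filter for a POMDP belief state.
    belief: dict s -> probability; P[s][a]: list of (s', prob); O[s']: dict o -> P(o | s')."""
    new_belief = {}
    for s2 in O:
        # predict: probability of reaching s2 given the old belief and action a
        pred = sum(b * dict(P[s][a]).get(s2, 0.0) for s, b in belief.items())
        # correct: weight by how likely observation o is in s2
        new_belief[s2] = O[s2].get(o, 0.0) * pred
    z = sum(new_belief.values()) or 1.0          # normalise to sum to 1
    return {s: p / z for s, p in new_belief.items()}
```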

