5.4 Reinforcement Learning - Part 3: Q-Learning
The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
For any finite Markov decision process (FMDP), Q-learning finds a policy that is optimal in the sense that it
maximizes the expected value of the total reward over all successive steps, starting from the current state.
Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time
and a partly-random policy.
"Q" names the function Q(s,a) that can be said to stand for the "quality" of an action a taken in a given state s.
Suppose we have the optimal Q-function (s, a) then the optimal policy in state s is argmax a Q(s, a).
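For reference, the standard tabular Q-learning update, which the worked example below applies step by step, is (with learning rate α, discount factor γ, reward r, and successor state s'):

\[
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \Big[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\Big]
\]

The tables in this lecture are consistent with α = 1 and γ = 0.5, in which case the update simplifies to Q(s, a) ← r + 0.5 · max_a' Q(s', a').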
Q-learning Algorithm
[Figure: the example gridworld and its rewards: r = 8 for the move into the goal state, r = -8 for a move into a penalty cell, and r = 0 for every other move.]
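A minimal Python sketch of this algorithm, assuming a hypothetical environment object with reset(), actions(s), and step(s, a) methods (these names are illustrative, not from the slides); the defaults alpha = 1.0 and gamma = 0.5 match the values the worked example below appears to use:

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=1.0, gamma=0.5, epsilon=0.1):
        """Tabular Q-learning with epsilon-greedy exploration."""
        Q = defaultdict(float)  # Q[(state, action)], implicitly 0 at the start

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # Epsilon-greedy action selection: explore occasionally,
                # otherwise take the best-known action.
                if random.random() < epsilon:
                    a = random.choice(env.actions(s))
                else:
                    a = max(env.actions(s), key=lambda act: Q[(s, act)])

                s_next, r, done = env.step(s, a)

                # Bootstrap from the best action in the successor state
                # (no bootstrap term once the episode has ended).
                target = r if done else r + gamma * max(
                    Q[(s_next, a2)] for a2 in env.actions(s_next))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next
        return Q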
States and Actions
States: s in {1, ..., 20}; Actions: a in {N, S, E, W}

[Figure: a 4 x 5 grid of states,

     1  2  3  4  5
     6  7  8  9 10
    11 12 13 14 15
    16 17 18 19 20

with the four compass actions N (north), S (south), E (east), W (west).]
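This gridworld can be sketched as code compatible with the q_learning function above. The goal and penalty cells are not listed in the text; the positions below are inferred from the Q-tables later in the lecture (goal at state 10, penalty cells ringing the corridor of states 7-9 and 12-14) and should be read as assumptions:

    import random

    class GridWorld:
        """4 x 5 grid of states 1-20; moving N/S/E/W shifts by -5/+5/+1/-1."""
        MOVES = {"N": -5, "S": 5, "E": 1, "W": -1}
        GOAL = 10                                   # entering it: r = +8, episode ends
        PENALTY = {2, 3, 4, 6, 11, 15, 17, 18, 19}  # entering one: r = -8, episode ends
        START = [7, 8, 9, 12, 13, 14]               # assumed: episodes start in the corridor

        def reset(self):
            return random.choice(self.START)

        def actions(self, s):
            row, col = divmod(s - 1, 5)
            acts = []
            if row > 0: acts.append("N")
            if row < 3: acts.append("S")
            if col < 4: acts.append("E")
            if col > 0: acts.append("W")
            return acts  # only moves that stay on the grid

        def step(self, s, a):
            s_next = s + self.MOVES[a]
            if s_next == self.GOAL:
                return s_next, 8, True
            if s_next in self.PENALTY:
                return s_next, -8, True
            return s_next, 0, False

Running Q = q_learning(GridWorld()) for enough episodes reproduces the pattern of the tables below: 8 one step from the goal, then 4, 2, 1, 0.5 as the distance grows, and -8 for every move onto a penalty cell.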
The Q(s, a) table is initialized to zero for every state-action pair:

    States:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    Actions
      N       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      S       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      W       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      E       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
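In code this is just an array of zeros; one possible layout (rows for the four actions in the table's order N, S, W, E, columns for states 1-20) would be:

    import numpy as np

    Q = np.zeros((4, 20))  # Q[action, state - 1], all zero before learning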
An Episode
[Figure: the agent's first episode traced on the grid. The exact path is not recoverable from the text; from the table that follows, the episode's last move was N from state 7 into a penalty cell (r = -8).]
Calculating new Q(s, a) values
The per-step equations were given in a figure; their net effect can be reconstructed from the resulting table, using the update Q(s, a) ← r + 0.5 · max_a' Q(s', a'):
1st step: r = 0 and every Q value of the successor state is still 0, so the entry stays 0.
2nd step: likewise, no change.
3rd step: likewise, no change.
4th step: the move N from state 7 enters a penalty cell, so Q(7, N) ← -8 + 0.5 · 0 = -8.
The Q(s, a) function after the first episode
    States:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    Actions
      N       0   0   0   0   0   0  -8   0   0   0   0   0   0   0   0   0   0   0   0   0
      S       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      W       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      E       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

The only change is Q(7, N) = -8.
A second episode
[Figure: the agent's second episode traced on the grid; from the table that follows, its last move was E from state 9 into the goal state 10 (r = 8).]
Calculating new Q(s, a) values
Again reconstructed from the resulting table:
1st step: r = 0 and the successor state's best Q value is 0, so the entry stays 0.
2nd step: likewise, no change.
3rd step: likewise, no change.
4th step: the move E from state 9 enters the goal (terminal), so Q(9, E) ← 8.
The Q(s, a) function after the second episode
    States:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    Actions
      N       0   0   0   0   0   0  -8   0   0   0   0   0   0   0   0   0   0   0   0   0
      S       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      W       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      E       0   0   0   0   0   0   0   0   8   0   0   0   0   0   0   0   0   0   0   0

The new entry is Q(9, E) = 8; Q(7, N) = -8 carries over from the first episode.
The Q(s, a) function after a few episodes
    States:    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
    Actions
      N        0    0    0    0    0    0   -8   -8   -8    0    0    1    2    4    0    0    0    0    0    0
      S        0    0    0    0    0    0  0.5    1    2    0    0   -8   -8   -8    0    0    0    0    0    0
      W        0    0    0    0    0    0   -8    1    2    0    0   -8  0.5    1    0    0    0    0    0    0
      E        0    0    0    0    0    0    2    4    8    0    0    1    2   -8    0    0    0    0    0    0
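Working a few of these entries through the assumed update Q(s, a) ← r + 0.5 · max_a' Q(s', a') shows why the values halve with each extra step of distance from the goal at state 10:

\[
\begin{aligned}
Q(9, \mathrm{E}) &= 8 && \text{(moves into the goal; terminal, no bootstrap term)} \\
Q(8, \mathrm{E}) &= 0 + 0.5 \max_{a'} Q(9, a') = 0.5 \cdot 8 = 4 \\
Q(7, \mathrm{E}) &= 0 + 0.5 \max_{a'} Q(8, a') = 0.5 \cdot 4 = 2 \\
Q(7, \mathrm{S}) &= 0 + 0.5 \max_{a'} Q(12, a') = 0.5 \cdot 1 = 0.5
\end{aligned}
\]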
One of the optimal policies
The Q-table is the one shown above; on the original slide, one maximizing action was highlighted in each state. With these values the greedy choices are E in states 7, 8, and 9 (toward the goal at 10), N in state 14, and a tie between N and E in states 12 and 13. One optimal policy therefore chooses, for example, E in states 12 and 13 as well.
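The greedy policy can be read off the table with a short script. This sketch assumes the numpy layout used earlier (rows = actions N, S, W, E; columns = states 1-20) and fills in only the corridor states:

    import numpy as np

    ACTIONS = ["N", "S", "W", "E"]
    Q = np.zeros((4, 20))
    # Values from the table above, for states 7-9 and 12-14.
    Q[:, 6:9]   = [[-8, -8, -8], [0.5, 1, 2], [-8, 1, 2], [2, 4, 8]]
    Q[:, 11:14] = [[1, 2, 4], [-8, -8, -8], [-8, 0.5, 1], [1, 2, -8]]

    # List every action that attains the maximal Q value in each state.
    for s in [7, 8, 9, 12, 13, 14]:
        col = Q[:, s - 1]
        best = [a for a, q in zip(ACTIONS, col) if q == col.max()]
        print(s, best)   # states 12 and 13 print two tied actions, N and E

The ties at states 12 and 13 are exactly what makes several policies optimal at once, which is why the following slides can show a second, different optimal policy over the same Q-table.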
An optimal policy graphically
[Figure: the 4 x 5 grid with an arrow in each corridor state showing this policy: E in states 7, 8, 9, 12, and 13, and N in state 14, all routes leading to the goal at state 10.]
Another of the optimal policies
The Q-table is again the same; the slide highlights a different set of maximizing actions. Because N and E are tied in states 12 and 13 (Q = 1 and Q = 2 respectively), choosing N there instead of E yields another policy that is equally optimal.
Another optimal policy graphically
[Figure: the same grid showing the alternative policy, with N chosen in states 12 and 13 (then E along states 7-9), an equally short route to the goal.]