
CSD311: Artificial Intelligence

Finite MDP, Q-learning algorithm

▶ Q-learning is an algorithm to learn good/optimal actions in states when the state space S is finite.
▶ Q : S × A → R is a table that is updated in an iterative manner. The algorithm is model free, that is, it does not have to learn a state-change model. But it will not work for infinite state spaces.
▶ The update is done using Bellman's equation (see qlearning.ipynb for an example):

  Q(s, a) ← Q(s, a) + α [ r + γ max_a′ Q(s′, a′) − Q(s, a) ]

  where α is the learning rate, r the immediate reward, and s′ the next state.
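A minimal Python sketch of the tabular update above. The environment interface (env.reset(), env.step(), env.actions()) and the hyperparameters are illustrative assumptions, not taken from qlearning.ipynb.

    # Tabular Q-learning sketch; `env` is a hypothetical finite-MDP environment.
    import random
    from collections import defaultdict

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                      # Q[(s, a)] -> current estimate
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy choice over the finite action set of s
                if random.random() < epsilon:
                    a = random.choice(env.actions(s))
                else:
                    a = max(env.actions(s), key=lambda act: Q[(s, act)])
                s_next, r, done = env.step(s, a)
                # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
                best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions(s_next))
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q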
Monte Carlo sampling I

▶ In many RL problems Q-learning cannot be used directly due to the size of the table.
▶ One way to address this is to sample sequences of actions using MC sampling as per a policy p that uses the current values of the expected reward E_p[R_t | s_t, a_t] for (s_t, a_t) pairs, similar to Q-learning.
▶ A policy that is often used is the ε-greedy policy that was discussed for MAB but generalized to handle states.
▶ A possible implementation for an adversarial game is given below. The same approach can be used when there is a single agent (e.g. a video game).
▶ The (s, a) values can be initialized in some way; one possibility is random values in some interval.
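A possible sketch of the ε-greedy MC sampling described above, in Python. The game-specific helpers (legal_actions, next_state, is_terminal) and the initialization interval are assumptions, not from the slides.

    # Sampling one sequence of actions (a rollout) under an epsilon-greedy policy
    # built from the current estimates of E_p[R | s, a].
    import random

    def init_value():
        # one possible initialization: a random value in a small interval
        return random.uniform(-0.01, 0.01)

    def epsilon_greedy(values, s, actions, epsilon):
        # values[(s, a)] holds the current estimate of E_p[R | s, a]
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: values.get((s, a), 0.0))

    def sample_rollout(values, s0, legal_actions, next_state, is_terminal,
                       epsilon, max_moves=200):
        # returns the list of (s, a) pairs visited in one sampled rollout
        trajectory, s = [], s0
        for _ in range(max_moves):
            if is_terminal(s):
                break
            a = epsilon_greedy(values, s, legal_actions(s), epsilon)
            trajectory.append((s, a))
            s = next_state(s, a)
        return trajectory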
Monte Carlo sampling II

▶ The current (s, a) value is updated after a rollout of r moves by γ^(r−1) if there is a win, by −γ^(r−1) if there is a loss, and is left unchanged if there is a draw (no loss or win). The E_p[R | s, a] value is the unnormalized (s, a) value divided by the number of times it has been updated. The method is similar to how values were updated in MCTS.
▶ Assuming ε-greedy is being used, the algorithm chooses a random move with probability ε and the move with the highest expected value with probability (1 − ε).
▶ Often the ε value is changed using a decay function and a threshold. The decay function is an exponential in the number of rollouts. So, as rollouts increase and the E_p[R | s, a] values stabilize, the ε value reaches a low threshold level so that the program is exploiting most of the time.
▶ In play mode the ε value is set to 0 so that the agent plays the move with the highest reward for each pair (s, a).
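A sketch of the rollout update and the ε schedule just described. It assumes every rollout through an (s, a) pair counts as an update (a draw contributes 0); names and constants are illustrative.

    import math
    from collections import defaultdict

    totals = defaultdict(float)   # unnormalized (s, a) values
    counts = defaultdict(int)     # number of updates per (s, a)

    def update_after_rollout(s, a, outcome, r, gamma=0.95):
        # outcome: +1 win, -1 loss, 0 draw; r: number of moves in the rollout
        if outcome != 0:
            totals[(s, a)] += outcome * gamma ** (r - 1)
        counts[(s, a)] += 1          # assumption: a draw also counts as an update

    def expected_reward(s, a):
        # E_p[R | s, a]: unnormalized value divided by the number of updates
        return totals[(s, a)] / counts[(s, a)] if counts[(s, a)] else 0.0

    def epsilon_at(n_rollouts, eps0=1.0, decay=1e-4, eps_min=0.05):
        # exponential decay in the number of rollouts, clipped at a low threshold;
        # in play mode epsilon is simply set to 0
        return max(eps_min, eps0 * math.exp(-decay * n_rollouts))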
Monte Carlo sampling III

▶ When the number of (s, a) pairs is very large the above is not feasible and the only option is to use a function approximator that learns to predict E_p[R | s_t, a_t] for (s, a) pairs.
▶ In a particular state s the outcome for each action a_i after a rollout using the ε-greedy policy is used to estimate the value R_p[s, a_i] for each action, and the (s, a_i) pairs are fed to a function approximator, typically a neural network (often a convolutional neural network). The neural network is trained with the (s, a) and R_p[s, a] data.
▶ In most adversarial games two artificial agents play each other a large number of times and the generated data is continuously used to train the neural network as play progresses.
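A minimal sketch of the function-approximator variant, assuming PyTorch and a fixed-size numeric encoding of (s, a) pairs; the encoding step itself is not shown. A plain fully connected network is used here for brevity, whereas for board-like states a convolutional network would be typical, as noted above.

    import torch
    import torch.nn as nn

    class ValueNet(nn.Module):
        # predicts the estimated expected reward for an encoded (s, a) pair
        def __init__(self, in_dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)

    def train_step(model, optimizer, x_batch, y_batch):
        # x_batch: encoded (s, a) pairs; y_batch: rollout estimates of E_p[R | s, a]
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        return loss.item()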
