
Markov Decision Processes (MDP)

Sudeshna Sarkar
Department of Computer Science & Engineering
IIT Kharagpur
6-7 Sep 2017
How would you get to the airport in the
least amount of time?
 Metro
 Uber
 Taxi
 Airport Express

2
Uncertainty in the real world
 Randomness shows up in many places.
 Could be caused by limitations of the sensors and actuators of the
robot
 Could be caused by market forces or nature, which we have no
control over.

 Taking action a in state s can lead to different successor
states s1’, s2’, …

 How can we hope to act optimally in the face of randomness?


 We certainly can't just have a single deterministic plan, and
talking about a minimum-cost path no longer makes sense.

3
Applications
 Robotics: decide where to move, but actuators can
fail, hit unseen obstacles, etc.

 Resource allocation: decide what to produce, but don't
know the customer demand for various products

 Agriculture: decide what to plant, but don't know the
weather and thus the crop yield

4
Volcano crossing

5
Dice Game
For each round r = 1, 2, …
 You choose stay or quit.
 If quit, you get $10 and we end the game.
 If stay, you get $4 and then I roll a 6-sided die.
 If the die comes up 1 or 2, we end the game.
 Otherwise, you continue to the next round.

6
MDP for Dice Game
For each round r = 1, 2, …
 You choose stay or quit.
 If quit, you get $10 and we end the game.
 If stay, you get $4 and then I roll a 6-sided die.
 If the die comes up 1 or 2, we end the game.
 Otherwise, you continue to the next round.

7
MDP
Markov Decision Processes

Decision Theoretic Planning

 Markov Property: The transition probabilities depend
only on the current state, not on the previous history
(how that state was reached)

8
MDP Model
MDP Model <S, A, T, R>
[Diagram: agent-environment loop; the agent observes the state and reward and sends an action to the environment. Example trajectory: s0, a0, r0, s1, a1, r1, s2, a2, r2, s3]
 State set S
 Action set A
 Markov transition function T(s,a,s’) = Pr(s’|s,a)
 Bounded real-valued reward function R(s)
• Can be generalized to include action costs: R(s,a)
• Can be generalized to be a stochastic function
Process:
• Observe state st in S
• Choose action at in A
• Receive immediate reward rt
• State changes to st+1
9
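The model and process above can be written down compactly in code. The sketch below is only illustrative: the class name MDP, the function run_episode, and the representation of T(s, a, ·) as a list of (s’, probability) pairs are my own assumptions, not anything from the slides; rewards here use the R(s, a) generalization mentioned on the slide.

```python
import random

# Minimal sketch of the <S, A, T, R> model and the observe/act/reward loop.
class MDP:
    def __init__(self, states, actions, transition, reward):
        self.states = states          # state set S
        self.actions = actions        # actions(s): available actions in state s
        self.transition = transition  # transition(s, a): list of (s', Pr(s'|s,a))
        self.reward = reward          # reward(s, a): real-valued reward

def run_episode(mdp, policy, start_state, max_steps=100):
    """Observe s_t, choose a_t, receive r_t, move to s_{t+1}; return total reward."""
    s, total = start_state, 0.0
    for _ in range(max_steps):
        if not mdp.actions(s):                 # no actions available: terminal state
            break
        a = policy(s)                          # choose action a_t
        total += mdp.reward(s, a)              # receive immediate reward r_t
        succ = mdp.transition(s, a)            # [(s', probability), ...]
        s = random.choices([sp for sp, _ in succ],
                           weights=[p for _, p in succ])[0]
    return total
```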
Similarities of MDP with Search?

10
Transitions
 The transition probabilities T(s, a, s’) specify the
probability of ending up in state s’ if action a is taken
in state s.
s a s’ T(s,a,s’)
in quit end 1
in stay in 2/3
in stay end 1/3
 For each state s and action a:
∑s’∈S T(s, a, s’) = 1
 Successors: s’ such that T(s, a, s’) > 0
11
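The dice-game transition table on this slide can be checked directly in code. The dictionary encoding below is just one convenient choice of my own, not the slides' notation.

```python
# Transition table keyed by (s, a); values map successor s' to T(s, a, s').
T = {
    ("in", "quit"): {"end": 1.0},
    ("in", "stay"): {"in": 2/3, "end": 1/3},
}

# For each state s and action a, the probabilities over successors sum to 1.
for (s, a), successors in T.items():
    assert abs(sum(successors.values()) - 1.0) < 1e-9, (s, a)

# Successors of (s, a): the states s' with T(s, a, s') > 0.
successors_of = {sa: [sp for sp, p in succ.items() if p > 0]
                 for sa, succ in T.items()}
print(successors_of)   # {('in', 'quit'): ['end'], ('in', 'stay'): ['in', 'end']}
```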
Exercise: Transportation problem
 Street with blocks numbered 1 to n.
 Walking from s to s + 1 takes 1 minute.
 Taking a magic tram from s to 2s takes 2 minutes.
 How to travel from 1 to n in the least time?
 Tram fails with probability 0.5.

12
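One way to formalize this exercise as an MDP is sketched below. It rests on my own assumptions, not on a solution given in the slides: blocks 1..n are the states, minutes are counted as negative rewards, and a failed tram ride still costs 2 minutes and leaves you at block s.

```python
def actions(s, n):
    """Actions available at block s on a street of n blocks."""
    acts = []
    if s + 1 <= n:
        acts.append("walk")   # deterministic: s -> s + 1, takes 1 minute
    if 2 * s <= n:
        acts.append("tram")   # stochastic: s -> 2s, takes 2 minutes, fails w.p. 0.5
    return acts

def transition(s, a):
    """List of (s', probability) pairs."""
    if a == "walk":
        return [(s + 1, 1.0)]
    return [(2 * s, 0.5), (s, 0.5)]   # assumed: tram failure leaves you at s

def reward(s, a):
    """Minutes spent, expressed as a negative reward."""
    return -1.0 if a == "walk" else -2.0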
What is a solution?
 Search problem: path (sequence of actions)
 MDP: ??
 MDP: Policy
 A policy π is a mapping from each state s ∈ States to
an action a ∈ Actions(s)

13
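For the dice game, the only non-terminal state is "in", so a policy is a one-entry table. The dictionary representation below is just one illustrative choice.

```python
# A policy maps each non-terminal state to an action.
policy_stay = {"in": "stay"}
policy_quit = {"in": "quit"}
```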
Evaluating a policy
 Following a policy yields a random path.
 The utility of a policy is the (discounted) sum of the
rewards on the path (this is a random quantity).
Path Utility
[in; stay, 4, end] 4
[in; stay, 4, in; stay, 4, in; stay, 4, end] 12
[in; stay, 4, in; stay, 4, end] 8
[in; stay, 4, in; stay, 4, in; stay, 4, in; stay, 4, end] 16
...
The value of a policy is the expected utility.

14
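To make "expected utility" concrete, here is a small simulation sketch of my own for the "stay" policy of the dice game, assuming no discounting: averaging the utilities of many random paths approximates the value, which analytically satisfies V = 4 + (2/3)V, i.e. V = 12.

```python
import random

def one_path_utility():
    """Utility of one random path under the 'stay' policy of the dice game."""
    utility = 0
    while True:
        utility += 4                    # reward for choosing stay
        if random.randint(1, 6) <= 2:   # die comes up 1 or 2: game ends
            return utility

# Estimate the value (expected utility) of the 'stay' policy by averaging.
n = 100_000
print(sum(one_path_utility() for _ in range(n)) / n)   # close to 12
```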
