
Artificial Intelligence and Intelligent Agents (F29AI)
MDP I: Intro to Markov Decision Processes
Arash Eshghi
Based on slides from Ioannis Konstas @HWU, Verena Rieser @HWU, Dan Klein @UC Berkeley
So far…
• How to formalise a real-world problem?
• How to reach a goal?
• Blind search: DFS, BFS, UCS
• Informed search: Greedy, A*
• How to maximise an outcome if there is an adversary?
• Adversarial search: minimax, alpha-beta pruning
• What to do if this adversary (environment) is probabilistic?
• Expectimax, maximum expected utilities
Today
• Non-deterministic worlds
• Markov Decision Processes (MDPs)
• Time-limited values

How can you plan optimally if your actions are non-deterministic?
(you only have a distribution over the outcomes)
Example: Grid World
• A maze-like problem
• The agent lives in a grid
• Walls block the agent’s path
• Noisy movement: actions do not always go as planned (see the transition sketch after this list)
• 80% of the time, the action North takes the agent North (if there is no wall there)
• 10% of the time, North takes the agent West; 10% of the time, East
• If there is a wall in the direction the agent would have been taken, the agent stays put
• The agent receives rewards each time step
• Small “living” reward each step (can be negative)
• Big rewards come at the end (good or bad)
• Goal: maximize sum of rewards
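
A minimal sketch of this noisy transition model in Python; the names NOISE, MOVES and transition are illustrative assumptions, not the course's code:

# Sketch of Grid World's noisy movement (assumed names, not course code).
# Each action succeeds 80% of the time and slips to each perpendicular
# direction 10% of the time; a move into a wall leaves the agent in place.

NOISE = {  # action -> [(actual direction, probability), ...]
    "N": [("N", 0.8), ("W", 0.1), ("E", 0.1)],
    "S": [("S", 0.8), ("E", 0.1), ("W", 0.1)],
    "E": [("E", 0.8), ("N", 0.1), ("S", 0.1)],
    "W": [("W", 0.8), ("S", 0.1), ("N", 0.1)],
}

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def transition(state, action, walls):
    """Return [(next_state, probability), ...] for taking `action` in `state`."""
    outcomes = []
    for direction, prob in NOISE[action]:
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        # If the resulting square is a wall, the agent stays put.
        outcomes.append((state if nxt in walls else nxt, prob))
    return outcomes

# Example: going North from (1, 1) with a wall at (0, 1):
# 80% -> (1, 2); 10% -> stays at (1, 1) (West is blocked); 10% -> (2, 1)
print(transition((1, 1), "N", walls={(0, 1)}))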
Grid World Actions
[Figure: action outcomes in a Deterministic Grid World vs. a Stochastic Grid World]
Markov Decision Processes

• An MDP is defined by:
• A set of states s ∈ S
• A set of actions a ∈ A
• A transition function T(s, a, s’)
• The probability that action a in state s leads to s’, i.e., T(s, a, s’) = P(s’ | s, a)
• Also called “the model”
• A reward function R(s, a, s’)
• Sometimes just R(s) or R(s’)
• A start state (or a distribution over start states)
• Maybe a terminal state

• MDPs are a family of non-deterministic search problems
• One way to solve them is with expectimax search – but we will have a new tool soon (a sketch of the definition as code follows below)
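
As a sketch, the definition above can be captured as a small Python interface; the class and method names here are assumptions for illustration, not a standard API:

from abc import ABC, abstractmethod

class MDP(ABC):
    """Sketch of the MDP definition above; names are illustrative assumptions."""

    @abstractmethod
    def states(self):
        """The set of states S."""

    @abstractmethod
    def actions(self, s):
        """The set of actions A available in state s."""

    @abstractmethod
    def transition(self, s, a):
        """T(s, a, s'): a list of (s', P(s' | s, a)) pairs -- 'the model'."""

    @abstractmethod
    def reward(self, s, a, s_next):
        """R(s, a, s'); some formulations use just R(s) or R(s')."""

    @abstractmethod
    def start_state(self):
        """The start state (or a sample from the start distribution)."""

    def is_terminal(self, s):
        """Terminal states are optional; default: none."""
        return False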
Video of Demo: Gridworld Manual Intro
[Video: Gridworld demo]
What is Markov about MDPs?

• “Markov” generally means that given the present state, the future and the past are independent.
• For Markov decision processes, “Markov” means action outcomes depend only on the current state.
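
In symbols, one standard way to write this property (not from the slides) is:

$$P(S_{t+1} = s' \mid S_t, A_t, S_{t-1}, A_{t-1}, \ldots, S_0) = P(S_{t+1} = s' \mid S_t, A_t)$$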

[Portrait: Andrey Markov (1856-1922)]
