IIM7064 Dynamic Programming


Ya-Tang Chuang

Department of Industrial and Information Management


National Cheng Kung University

1 / 18 Ya-Tang Chuang Dynamic Programming


TODAY'S OUTLINE

About this class

Introduction of dynamic programming

Some examples


CLASS OVERVIEW

Office hours: By appointment; chuangyatang@gs.ncku.edu.tw


TA: Allen Lin, r36091026@gs.ncku.edu.tw
No required textbook, but you can refer to the following books:
Sheldon Ross, Introduction to Stochastic Dynamic Programming, Academic Press, 1983
Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 1994 (electronic book)
Important Dates
10/06: Team formation
11/03: Select one research paper for your presentation


COURSE TOPICS

Modeling: formulating problems as a dynamic program


Backward induction
Finite horizon
Infinite horizon

Solving algorithms
Value iteration
Policy iteration

Structural results of the optimal policies


Modular functions
Monotone policies

Time permitting: Other selected topics


COURSE SCHEDULE

Week Progress Description


1 Introduction, Course Orientation, and Q&A
2-4 Deterministic Problems
5 Stochastic Problems
6-7 Finite-Horizon Problems
8 Backward Induction
9-10 Structural Results of Optimal Policy
11 Infinite-Horizon Problems
12 Solving Algorithms
13 Midterm Exam
14 Partially Observable Markov Decision Processes
15-16 Paper Presentation
17-18 Project Presentation


PAPER PRESENTATION

Team of 2-3 members

Select one paper from the list provided

Each team will be given 20 minutes

Each member should contribute equally during the presentation

Each team should ask at least two questions about other teams' presentations

Your presentation should include

Problem setup
Main results
Main contributions
Criticism

Scores will be based on peer evaluation


TERM PROJECT

Modeling one sequential decision-making problem as a Markov decision process

Solving this problem through dynamic programming

Providing numerical or theoretical results

Giving a presentation

Scores will be based on peer evaluation

You also need to write a report; your report should include

Problem setup
Main results
Potential improvements


COURSE OBJECTIVES

Understand how dynamic programming (DP) is used to model a variety of situations, and what properties are desirable in the DP

Be able to analyze DP equations that may be helpful to “solve” a given problem

Know the basic concepts of complexity theory; in particular, understand the situations under which the optimal policy may be numerically intractable

Learn basic techniques for deriving properties of the value function and optimal policies


GOAL OF THE COURSE

Primary course objective: After taking this course, you should be able to formulate a given problem using dynamic programming and derive solutions

We will not spend time teaching coding, but you should have the ability to complete your project


GRADING

I don’t view grades in graduate courses as very important. You should be here because you want to learn. Nevertheless, I need to assign grades:
Assignments 20%
Midterm Exam 30%
Paper Presentation 10%
Term Project Presentation 10%
Term Project Report 30%

Assignments: You can discuss with other people, but assignment solutions should be your own work

In any assignment, you must properly give credit to any outside resources you use (such as books, papers, web sites, other people, etc.)


QUESTIONS ABOUT THE COURSE?


WHAT IS DYNAMIC PROGRAMMING?

Dynamic Programming is a collection of mathematical tools used to analyze sequential decision processes. Such processes entail a sequence of actions taken toward some goal, often in the face of uncertainty.

During each discrete time period, the process goes through a sequence of steps that involve (1) state observation, (2) selection of an action or control, (3) incurrence of a cost, and (4) state and time update.

If the sequential decision processes terminate after a finite number of stages, they are called finite-horizon processes. Otherwise, they are called infinite-horizon processes.


THE BIG PICTURE OF THE MODEL

At a given state, the decision maker can take an action/control, which takes the system to another state, and the process repeats

In the process, the decision maker may receive a reward or incur some cost, which depends on the current state and action

State transitions are usually stochastic and are a function of the actions

The end goal is to optimize some function of the sequence of costs/rewards
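This loop can be made concrete with value iteration on a small infinite-horizon discounted MDP. The two-state, two-action model and every number below are hypothetical, chosen only to illustrate the state/action/reward/transition structure:

```python
# A minimal sketch of the loop above: stochastic transitions, rewards
# depending on (state, action), and value iteration to find the policy
# maximizing expected total discounted reward. All numbers are made up.

# P[a][s] = probability distribution over next states, given action a in state s
P = {
    0: {0: [0.9, 0.1], 1: [0.4, 0.6]},   # action 0
    1: {0: [0.2, 0.8], 1: [0.1, 0.9]},   # action 1
}
# r[a][s] = expected immediate reward for taking action a in state s
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}

gamma = 0.9                              # discount factor
V = [0.0, 0.0]

def q(s, a, V):
    # expected immediate reward plus discounted expected value of the next state
    return r[a][s] + gamma * sum(p * V[j] for j, p in enumerate(P[a][s]))

for _ in range(1000):                    # value iteration: V <- max_a Q(s, a)
    V_new = [max(q(s, a, V) for a in (0, 1)) for s in (0, 1)]
    if max(abs(V_new[s] - V[s]) for s in (0, 1)) < 1e-10:
        break
    V = V_new

# greedy policy with respect to the (near-)converged value function
policy = [max((0, 1), key=lambda a: q(s, a, V)) for s in (0, 1)]
print(V, policy)
```

Because gamma < 1, the Bellman update is a contraction, so the iteration converges to the unique optimal value function regardless of the starting guess.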


APPLICATIONS

PChome/Amazon online shopping

How to select an add-on product to increase the success probability

Figure: Rice cooker with a cooking pot (from Amazon)


APPLICATIONS

PChome/Amazon online shopping

How to select an add-on product to increase the success probability

Figure: PS4 with a hard disk drive (from PChome)


APPLICATIONS

How to determine the price of tickets to maximize total reward


APPLICATIONS

Preventive maintenance


APPLICATIONS

Discharge or continue treatment
