Question 1) Search 10 Marks: Final Term Examination Spring-2020
The agent wants to find a path from the starting cell to the goal cell at the lowest possible cost.
However, the grid has special cells marked ‘W’ and ‘X’. If your agent moves to a ‘W’ cell, it is
teleported to a randomly selected cell next to the goal state, but the cost will be ‘+7’. If your
agent moves to an ‘X’ cell, it is teleported to the starting cell; however, the cost will be ‘-7’.
[Figure: grid world with cells labelled W, L, G, and A]
Here the agent is at position (0,0), and the goal is at position (3,7). If the agent moves to
the ‘W’ cell at (4,2), it will be transported to one of (3,6), (2,7), or (4,7).
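The teleport rules above can be sketched as a successor function. This is only an illustration under stated assumptions: the 5×8 grid size, the unit cost of an ordinary move, the reading of ‘+7’ and ‘-7’ as the full cost of the teleporting move, and the empty set of ‘X’ cells are not given in the question. Only the start, the goal, and the ‘W’ cell at (4,2) come from the text, and the names `neighbours` and `outcomes` are hypothetical.

```python
from fractions import Fraction

ROWS, COLS = 5, 8             # assumed grid size (not stated in the question)
START, GOAL = (0, 0), (3, 7)  # from the question
W_CELLS = {(4, 2)}            # from the question; other 'W' positions are unknown
X_CELLS = set()               # 'X' positions are not given, so left empty here
STEP_COST = 1                 # assumed cost of an ordinary move


def neighbours(cell):
    """Return the in-grid 4-neighbours of a cell, in sorted order."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return sorted((a, b) for a, b in candidates
                  if 0 <= a < ROWS and 0 <= b < COLS)


def outcomes(cell, target):
    """Return (probability, next_cell, cost) triples for moving cell -> target."""
    if target not in neighbours(cell):
        raise ValueError("target must be an in-grid neighbour of cell")
    if target in W_CELLS:
        # Teleport to a uniformly random cell next to the goal, cost 7.
        landing = neighbours(GOAL)
        p = Fraction(1, len(landing))
        return [(p, nxt, 7) for nxt in landing]
    if target in X_CELLS:
        # Teleport back to the start, cost -7.
        return [(Fraction(1), START, -7)]
    return [(Fraction(1), target, STEP_COST)]


# Moving from (3,2) into the 'W' cell at (4,2) yields three equally likely landings.
for prob, nxt, cost in outcomes((3, 2), (4, 2)):
    print(prob, nxt, cost)
```

Under these assumptions, an ‘X’ cell close enough to the start can create a negative-cost cycle, so a formulation would need to state whether cells may be revisited before a cost-based search such as uniform-cost search is applied.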
The objective is to fill the grid with numbers from 1 to 9 in a way that the following conditions
are met:
Each row, column, and nonet contains each number exactly once.
The sum of all numbers in a cage must match the small number printed in its corner.
No number appears more than once in a cage. (This is the standard rule for killer
sudokus, and implies that no cage can include more than 9 cells.)
In 'Killer X', an additional rule is that each of the long diagonals contains each number once.¹
¹ https://en.wikipedia.org/wiki/Killer_sudoku
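The conditions listed above can be expressed as simple checks on a filled grid. The following is a minimal sketch, not part of the exam: the function names, the cage representation (a list of (row, column) cells plus a target sum), and the sample grid generated from a shifting formula are all assumptions made for illustration.

```python
FULL = set(range(1, 10))


def units_ok(grid):
    """Each row, column, and nonet contains each number 1..9 exactly once."""
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    nonets = [[grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(set(unit) == FULL for unit in rows + cols + nonets)


def cage_ok(grid, cells, target):
    """Cage values are pairwise distinct and add up to the printed target."""
    values = [grid[r][c] for r, c in cells]
    return len(set(values)) == len(values) and sum(values) == target


def diagonals_ok(grid):
    """Extra 'Killer X' rule: both long diagonals contain each number once."""
    main = {grid[i][i] for i in range(9)}
    anti = {grid[i][8 - i] for i in range(9)}
    return main == FULL and anti == FULL


# Hypothetical sample grid: a standard valid Sudoku built by shifting rows.
sample = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

print(units_ok(sample))                        # row/column/nonet rule
print(cage_ok(sample, [(0, 0), (0, 1)], 3))    # made-up two-cell cage with sum 3
print(diagonals_ok(sample))                    # Killer X rule
```

Checks like these are the constraints a constraint-satisfaction formulation of the puzzle would need to enforce during search, rather than only at the end.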
More about this type of Sudoku can be found at
https://en.wikipedia.org/wiki/Killer_sudoku. Do not worry: you are not going to be asked to solve
the puzzle here. However, you are required to answer the following questions:
[Figure: grid world with start cell S, The Cliff, and goal cell G]
For the problem described above, answer the following questions:
I. Formulate the problem as an MDP.
II. Use policy iteration to find the optimal policy.
III. Suppose that you are not given the transition or the reward function, but you observe
the following (state, action, reward, state’) tuples in Episode 1.
Episode 1:
( (0,0), Up, ‐1, (1,0) )
( (0,1), Down, ‐1, (0,0) )
( (0,0), Right, ‐1, (0,0) )
( (0,0), Left, ‐1, (0,0) )
( (0,0), Up, ‐1, (1,0) )
( (0,1), Right, ‐1, (1,1) )
( (1,1), Right, ‐1, (1,2) )
( (1,2), Right, ‐1, (1,3) )
( (1,3), Right, ‐1, (1,4) )
( (1,4), Down, ‐1, (0,4) )
Calculate the TD estimates of all the states in Episode 1.
IV. Use the MDP code given to you in your lab to implement this scenario.
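The TD estimates asked for in part III can be computed with the tabular TD(0) update V(s) ← V(s) + α [r + γ V(s’) − V(s)]. The sketch below is an illustration only: the learning rate α = 0.5, the discount γ = 1, and the zero initial values are assumptions, since the question does not state them. The tuples are transcribed exactly as printed in Episode 1, and the name `td0` is hypothetical.

```python
from collections import defaultdict

ALPHA = 0.5  # assumed learning rate (not given in the question)
GAMMA = 1.0  # assumed discount factor (not given in the question)

# (state, action, reward, next_state) tuples, transcribed as printed.
EPISODE_1 = [
    ((0, 0), "Up", -1, (1, 0)),
    ((0, 1), "Down", -1, (0, 0)),
    ((0, 0), "Right", -1, (0, 0)),
    ((0, 0), "Left", -1, (0, 0)),
    ((0, 0), "Up", -1, (1, 0)),
    ((0, 1), "Right", -1, (1, 1)),
    ((1, 1), "Right", -1, (1, 2)),
    ((1, 2), "Right", -1, (1, 3)),
    ((1, 3), "Right", -1, (1, 4)),
    ((1, 4), "Down", -1, (0, 4)),
]


def td0(episode, alpha=ALPHA, gamma=GAMMA):
    """Apply one TD(0) update per transition, starting from V(s) = 0."""
    values = defaultdict(float)
    for state, _action, reward, next_state in episode:
        td_target = reward + gamma * values[next_state]
        values[state] += alpha * (td_target - values[state])
    return dict(values)


for state, value in sorted(td0(EPISODE_1).items()):
    print(state, value)
```

The action is ignored because TD(0) here estimates state values under the behaviour that generated the episode. States that appear only as a successor keep their initial value of 0.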
I. Model the LUDO game as an adversarial search problem.
II. Would you use minimax or expectimax search to solve LUDO? Explain your reasoning.
III. What will be the complexity of adversarial search for LUDO?
IV. Draw the search graph of expectimax search for a turn for each player, assuming there are four
players. One is controlled by you.
V. Will it be feasible to do a vanilla search for solving LUDO? Why or why not?
VI. Will you design an optimistic agent or a pessimistic agent for LUDO? Explain your reasoning.
VII. How would you speed up the decision-making process of your algorithm? Explain the
improvements you would make and why they will be helpful.
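The questions above refer to expectimax search, in which a die roll is modelled as a chance node whose value is the probability-weighted average of its children. The following is a minimal sketch on a toy tree, not a LUDO implementation: the nested-tuple tree encoding, the helper `die_node`, and the leaf values (moving one token is worth the roll, moving the other is worth 7 minus the roll) are made up for illustration.

```python
from fractions import Fraction


def expectimax(node):
    """Evaluate a tree of ('leaf', v), ('max', children), ('chance', [(p, child)])."""
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectimax(child) for child in node[1])
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in node[1])
    raise ValueError(f"unknown node kind: {kind}")


def die_node(child_for_roll):
    """Chance node for a fair six-sided die; child_for_roll maps a roll to a subtree."""
    return ("chance", [(Fraction(1, 6), child_for_roll(roll)) for roll in range(1, 7)])


def choose_token(roll):
    """Toy decision after a roll: two tokens with made-up leaf values."""
    return ("max", [("leaf", roll), ("leaf", 7 - roll)])


root = die_node(choose_token)
print(expectimax(root))
```

A full four-player turn would alternate a chance node for each player's roll with that player's decision node, which is why the tree grows by roughly six times the number of movable tokens per ply.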