F29AI Exam 2020/2021
F29AI
Semester 1, 2020/21
(a) Consider the following grid representing the states and transitions in
a search problem. States are labelled with letters. An agent can
move between states provided the two states are adjacent and not
blocked by a wall (the black squares). Diagonal movement is not
permitted. Each move into an adjacent white square costs 1 resource.
A move into a grey square (H, P, R) costs 3 resources. For example,
a move from D to H costs 3 resources, but a move from H to J costs
1 resource. The grid also includes a tunnel that allows the agent to
travel from one side of the grid to the other: an agent in state E can
use the tunnel to travel to state P at a cost of 3 resources.
If S is the start state and G is the goal state, determine the order
in which states are expanded, as well as the goal path returned, for
each of the following search methods. For best-first (greedy) search,
include the h-value of each expanded state. For A* search, include
the f-value of each expanded state. To estimate the distance from
any state to the goal, the Manhattan distance heuristic should be
used. Assume that when states are added to the fringe, ties are
resolved so that states appearing earlier in alphabetical order are
expanded first, and that no state is expanded more than once.
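The expansion mechanics these sub-questions rely on can be sketched in Python. Everything below (the 2×2 grid, its coordinates, and the uniform costs) is a made-up toy example, not the exam grid, which is given in the figure:

```python
import heapq

def manhattan(p, q):
    """Manhattan distance heuristic: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def a_star(start, goal, neighbours, cost, coords):
    """A* search: always expand the fringe entry with the lowest f = g + h.
    Ties on f are broken alphabetically by state name (the heap compares
    the name after f), and no state is expanded more than once."""
    fringe = [(manhattan(coords[start], coords[goal]), start, [start], 0)]
    closed = set()
    order = []                      # (state, f-value) in expansion order
    while fringe:
        f, s, path, g = heapq.heappop(fringe)
        if s in closed:
            continue
        closed.add(s)
        order.append((s, f))
        if s == goal:
            return order, path
        for n in neighbours[s]:
            if n not in closed:
                g2 = g + cost(s, n)
                heapq.heappush(
                    fringe,
                    (g2 + manhattan(coords[n], coords[goal]), n, path + [n], g2))
    return order, None

# Toy 2x2 grid (hypothetical): S top-left, G bottom-right, every move costs 1.
coords = {'S': (0, 0), 'A': (0, 1), 'B': (1, 0), 'G': (1, 1)}
neighbours = {'S': ['A', 'B'], 'A': ['S', 'G'], 'B': ['S', 'G'], 'G': ['A', 'B']}
order, path = a_star('S', 'G', neighbours, lambda s, n: 1, coords)
print(order)  # [('S', 2), ('A', 2), ('B', 2), ('G', 2)]
print(path)   # ['S', 'A', 'G']
```

Note how the alphabetical tie-break decides that A is expanded before B even though both have f = 2, exactly the convention the question specifies.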
i) Depth-first search
States expanded:
Goal path:
iv) A* search
v) Is the Manhattan heuristic admissible for the given grid? Explain why
or why not with specific references to the grid, costs, and heuristic.
(2)
vi) Say the grid was modified so that R was no longer a grey square but
was an ordinary white square. Is the Manhattan heuristic admissible
for the resulting grid? Explain why or why not with specific references
to the grid, costs, and heuristic. (2)
3 of 11
Q1–Q2 RP, Q3–Q4 AE F29AI
(b) Consider the following game tree. Upward-facing triangles (A, B, D,
F, I) represent the game moves for the maximising player. Downward-
facing triangles (C, G, H) represent the game moves for the minimising
player. Squares represent terminal states.
i) Assuming both players play optimally, what is the minimax value for
each of the states A, B, C, D, E, F, G, H, I in the game tree? According
to minimax, which move should the maximising player make? (3)
ii) What states are pruned by alpha-beta pruning when applied to the
above tree? (3)
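The pruning mechanism itself can be sketched as follows. The tiny tree here is hypothetical (the exam tree is in the figure): a maximising root A with minimising children B and C, and named leaves t1–t4, so that pruned nodes show up by their absence from the evaluation record:

```python
def alphabeta(node, maximising, alpha=float('-inf'), beta=float('inf'), seen=None):
    """Minimax with alpha-beta pruning. A node is a (name, body) pair,
    where body is either a number (leaf value) or a list of children.
    `seen` records every node actually evaluated."""
    if seen is None:
        seen = []
    name, body = node
    seen.append(name)
    if isinstance(body, (int, float)):   # leaf
        return body, seen
    best = float('-inf') if maximising else float('inf')
    for child in body:
        val, _ = alphabeta(child, not maximising, alpha, beta, seen)
        if maximising:
            best, alpha = max(best, val), max(alpha, val)
        else:
            best, beta = min(best, val), min(beta, val)
        if alpha >= beta:
            break                        # prune the remaining children
    return best, seen

# Hypothetical tree: A (max) over B (min: 3, 5) and C (min: 2, 9).
tree = ('A', [('B', [('t1', 3), ('t2', 5)]), ('C', [('t3', 2), ('t4', 9)])])
value, seen = alphabeta(tree, True)
print(value)         # 3
print('t4' in seen)  # False: once C sees 2 < alpha = 3, t4 is pruned
```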
iii) Assume that each of the minimising states (C, G, H) is now a chance
state, and that each outcome from a chance state is equally likely.
What is the expectimax value for each of the states A, B, C, D, E, F,
G, H, I in the tree? According to expectimax, which move should the
maximising player make? (3)
iv) Consider a game where a minimising player can make one of two
possible moves, M1 or M2. After each move, a 6-sided die is rolled.
If move M1 was made, the resulting value of the game will be the
value shown on the die roll. If move M2 was made, the resulting
value will be twice the value shown on the die roll. Describe the
game tree that represents this game. (2)
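The question asks for a description of the tree (a min node whose two moves M1 and M2 each lead to a chance node over the six equally likely die outcomes), but the expected values at the two chance nodes can be checked with a short computation:

```python
from fractions import Fraction

p = Fraction(1, 6)                   # each face of a fair 6-sided die is equally likely
die = range(1, 7)

# A chance node's value is the probability-weighted average of its outcomes.
ev_m1 = sum(p * v for v in die)      # after M1 the game value is the die value
ev_m2 = sum(p * 2 * v for v in die)  # after M2 it is twice the die value

print(ev_m1, ev_m2)  # 7/2 7
# The minimising player prefers the smaller expected value, i.e. M1.
```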
The robot starts in the kitchen and has three actions available to it:
(b) Say the robot in the above scenario would like to go outside while
holding the security pass. Define the initial state and goal for a plan-
ning problem that models this problem using PDDL notation. (5)
(c) State a plan that achieves the goal of going outside with the security
pass from the initial state described in the problem description. Use
PDDL notation for each action in the plan and its instantiated param-
eters. Also provide a short explanation of the plan in plain English.
(3)
(d) Say that the robot house scenario describes a real-world scenario.
How would the agent and environment types of a real-world scenario
compare to the assumptions of modelling the problem as a classical
planning problem? Make reference to specific agent and environ-
ment types in your answer. (4)
(e) Say that oneMove(R,X,Y) is the head of a Prolog rule that en-
codes that robot R can travel from X to Y by making one move, and
hasPower(R,P) is the head of a Prolog rule that encodes that robot
R is carrying a power cell P that lasts for one move. Write a Prolog
rule called twoMove(R,X,Y) that encodes the idea that robot R can
make two consecutive moves between two different locations X and
Y, with sufficient power. (2)
        S1    S2
Vπ0      0     0
Vπ1
Vπ2
(6)
(b) You are given the Gridworld shown below. Assume that the Markov
Decision Process is unknown. However, you can observe the fol-
lowing four episodes generated. Each training episode specifies a
sequence of transitions, with each transition comprised of the current
state s, the chosen action a, the new state s′, and the received
immediate reward, in that order. Assume that there is no discount
(γ = 1.0).
T (A, south, C) =
T (B, east, C) =
T (C, south, E) =
T (C, south, D) =
(4)
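The transition probabilities above are estimated by counting. The sketch below uses made-up episodes (the exam's episodes are given in the figure) purely to show the maximum-likelihood estimate: T(s, a, s′) is the fraction of the times action a was taken in s that led to s′:

```python
from collections import defaultdict, Counter

# Hypothetical episodes; each transition is (state, action, next_state, reward).
episodes = [
    [('A', 'south', 'C', -1), ('C', 'south', 'D', -1)],
    [('B', 'east', 'C', -1), ('C', 'south', 'E', -1)],
    [('A', 'south', 'C', -1), ('C', 'south', 'D', -1)],
]

# Count how often each (s, a) pair led to each successor s'.
counts = defaultdict(Counter)
for ep in episodes:
    for s, a, s2, _ in ep:
        counts[(s, a)][s2] += 1

def T(s, a, s2):
    """Maximum-likelihood transition estimate from the observed episodes."""
    return counts[(s, a)][s2] / sum(counts[(s, a)].values())

print(T('C', 'south', 'D'))  # 2 of the 3 observed (C, south, .) transitions went to D
```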
(c) Using Direct Evaluation, evaluate the policy π specified by the arrows
in the same grid figure in part (b) above. Provide values for the
following quantities:
V̂π(A) =
V̂π(B) =
V̂π(C) =
V̂π(D) =
V̂π(E) =
(5)
(d) Briefly explain in plain English what the Direct Evaluation method
tries to estimate, and how. Also, in what sense is Direct Evaluation
a model-free method? Use the MDP and Episodes in (b) above to
illustrate your answers. (2)
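Direct evaluation averages the observed returns from each state. The sketch below uses made-up episodes (the exam's episodes are in the figure) to show the computation: for every visit to a state, record the total reward collected from that point to the end of the episode (undiscounted, since γ = 1), then average:

```python
from collections import defaultdict

# Hypothetical episodes; each step is (state, action, next_state, reward).
episodes = [
    [('A', 'Down', 'C', -1), ('C', 'Down', 'D', -1), ('D', 'Exit', None, 10)],
    [('B', 'Left', 'C', -1), ('C', 'Down', 'E', -1), ('E', 'Exit', None, -10)],
    [('A', 'Down', 'C', -1), ('C', 'Down', 'D', -1), ('D', 'Exit', None, 10)],
]

# For each visit to a state, the return is the sum of rewards from there on.
returns = defaultdict(list)
for ep in episodes:
    for i, (s, _, _, _) in enumerate(ep):
        returns[s].append(sum(step[3] for step in ep[i:]))

# V-hat(s) is the average observed return: no transition model is ever built,
# which is what makes direct evaluation model-free.
V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V['A'])  # (8 + 8) / 2 = 8.0 under these made-up episodes
```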
(e) Using Q-Learning (model-free), and assuming that all Q-values for
every <state, action> pair have been initialised to 0 and that the
learning rate is 0.6 (α = 0.6), fill in the Q-Table below after your
agent has experienced the first two episodes above. The red cells
show that the <state, action> pair is not available; you can ignore
these.
A B C D E
Up
Down
Right
Left
Exit
(8)
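Filling in such a table means applying the Q-learning update once per observed transition. The two transitions below are hypothetical (the exam's episodes are in the figure); they only illustrate the update rule with α = 0.6 and γ = 1:

```python
from collections import defaultdict

alpha, gamma = 0.6, 1.0   # learning rate from the question; no discount
Q = defaultdict(float)    # every <state, action> Q-value starts at 0

def q_update(s, a, s2, r, next_actions):
    """One Q-learning update:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max((Q[(s2, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

# Two hypothetical transitions, processed in order:
q_update('D', 'Exit', None, 10, [])       # Q(D, Exit): 0.4 * 0 + 0.6 * 10 = 6.0
q_update('C', 'Down', 'D', -1, ['Exit'])  # sample = -1 + 6.0 = 5.0, so Q ~ 3.0
```

Note the order matters: the second update only "sees" the reward from D because the first update has already raised Q(D, Exit) above 0.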
(7)
Grammar:
S → NP VP | VP
S → Aux NP VP
NP → Det Nom | NPN
Nom → Nom N | Adj Nom | Nom PP | N
VP → V NP | V
VP → V PP | VP PP | V Adv
PP → Prep NP

Lexicon:
Det → a | the
N → run | mile
V → run
NPN → London
Aux → does
Adv → fast
Adj → fast
Prep → in
(c) Using the original grammar, G, given above, parse the sentence ‘Run
a mile in London’, clearly showing how you applied the rules of the
grammar by showing the sequence of resulting rewrites. The order
in which you apply the rewrite rules does not matter for this question,
as long as you apply them correctly. (4)
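A derivation of this kind can be checked mechanically. The sketch below replays one leftmost derivation of the sentence under the grammar G above, printing each rewritten sentential form; the rule order chosen here is just one valid possibility:

```python
# Rules applied in order, copied from grammar G and its lexicon above.
rules = [
    ('S', 'VP'),
    ('VP', 'VP PP'),
    ('VP', 'V NP'),
    ('V', 'run'),
    ('NP', 'Det Nom'),
    ('Det', 'a'),
    ('Nom', 'N'),
    ('N', 'mile'),
    ('PP', 'Prep NP'),
    ('Prep', 'in'),
    ('NP', 'NPN'),
    ('NPN', 'London'),
]

form = ['S']
for lhs, rhs in rules:
    i = form.index(lhs)          # rewrite the leftmost occurrence of the LHS
    form[i:i + 1] = rhs.split()
    print(' '.join(form))        # show the resulting sentential form
```

The final printed form is `run a mile in London`, confirming that each rule used is licensed by G.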
(d) Now assume that we add the following rule to the original grammar
G:
VP → V NP PP
Explain how this extra rule gives rise to structural ambiguity in parsing
the same sentence, ‘Run a mile in London’. Using the new expanded
grammar, produce two parse trees for the same sentence: one
corresponding to your derivation in (c), and another (you don’t need
to give the derivation for this).
(7)
END OF PAPER