
SCHOOL OF MATHEMATICAL AND COMPUTER SCIENCES

Department of Computer Science

F29AI

ARTIFICIAL INTELLIGENCE AND INTELLIGENT AGENTS

Semester 1— 2020/21

Duration: 24 Hour Take-Home Exam

ANSWER ALL FOUR QUESTIONS

Candidates may use a University-approved calculator



1. Uninformed, Informed, and Game Tree Search

(a) Consider the following grid representing the states and transitions in a search problem. States are labelled with letters. An agent can move between states provided the two states are adjacent and not blocked by a wall (the black squares). Diagonal movement is not permitted. Each move into an adjacent white square costs 1 resource. A move into a grey square (H, P, R) costs 3 resources. E.g., a move from D to H costs 3 resources but a move from H to J costs 1 resource. The grid also includes a tunnel that allows the agent to travel from one side of the grid to the other. An agent in state E can use the tunnel to travel to state P at a cost of 3 resources.

If S is the start state and G is the goal state, determine the order in which states are expanded, as well as the goal path returned, for each of the following search methods. For best-first (greedy) search, include the h-value of each expanded state. For A* search, include the f-value of each expanded state. To estimate the distance from any state to the goal, use the Manhattan distance heuristic. Assume that, when states are added to the fringe, ties are resolved so that states appearing earlier in alphabetical order are expanded first, and that no state is expanded more than once.
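The grid figure referred to above is not reproduced here. Purely as an illustration of the mechanics assumed in parts iii) and iv) below, the following Python sketch shows how A* search with a Manhattan-distance heuristic and alphabetical tie-breaking can be implemented; the function and parameter names (a_star, neighbours, step_cost, coords) and any coordinates supplied to them are hypothetical, and the sketch is not the expected exam answer.

    import heapq

    def manhattan(p, q):
        # Manhattan distance between two (row, col) grid positions
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    def a_star(start, goal, neighbours, step_cost, coords):
        # neighbours(s): states adjacent to s (walls already excluded)
        # step_cost(s, t): 1 for a white square, 3 for a grey square or the tunnel
        # coords[s]: hypothetical (row, col) position of state s, used only by the heuristic
        fringe = [(manhattan(coords[start], coords[goal]), start, 0, [start])]
        closed = set()
        while fringe:
            f, state, g, path = heapq.heappop(fringe)  # lowest f first; ties break
            if state in closed:                        # alphabetically on the state label
                continue                               # no state is expanded twice
            closed.add(state)
            print("expanded", state, "with f =", f)
            if state == goal:
                return path
            for nxt in neighbours(state):
                g2 = g + step_cost(state, nxt)
                f2 = g2 + manhattan(coords[nxt], coords[goal])
                heapq.heappush(fringe, (f2, nxt, g2, path + [nxt]))
        return None

Ranking the fringe by the heuristic value alone (instead of f = g + h) gives best-first (greedy) search, while dropping the heuristic term entirely gives uniform-cost behaviour.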

i) Depth-first search

States expanded:

Goal path: (2)

ii) Breadth-first search

States expanded:

Goal path: (2)



iii) Best-first (greedy) search

States expanded (with h values):

Goal path: (3)

iv) A* search

States expanded (with f values):

Goal path: (3)

v) Is the Manhattan heuristic admissible for the given grid? Explain why
or why not with specific references to the grid, costs, and heuristic.
(2)

vi) Say the grid was modified so that R was no longer a grey square but
was an ordinary white square. Is the Manhattan heuristic admissible
for the resulting grid? Explain why or why not with specific references
to the grid, costs, and heuristic. (2)


(b) Consider the following game tree. Upward-facing triangles (A, B, D, F, I) represent the game moves for the maximising player. Downward-facing triangles (C, G, H) represent the game moves for the minimising player. Squares represent terminal states.

i) Assuming both players play optimally, what is the minimax value for
each of the states A, B, C, D, E, F, G, H, I in the game tree? According
to minimax, which move should the maximising player make? (3)

ii) What states are pruned by alpha-beta pruning when applied to the
above tree? (3)

iii) Assume that each of the minimising states (C, G, H) is now a chance
state, and that each outcome from a chance state is equally likely.
What is the expectimax value for each of the states A, B, C, D, E, F,
G, H, I in the tree? According to expectimax, which move should the
maximising player make? (3)

iv) Consider a game where a minimising player can make one of two
possible moves, M1 or M2. After each move, a 6-sided die is rolled.
If move M1 was made, the resulting value of the game will be the
value shown on the die roll. If move M2 was made, the resulting
value will be twice the value shown on the die roll. Describe the
game tree that represents this game. (2)
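The game-tree figure itself is not reproduced here, so the concrete values come from that figure. The short sketch below, using an assumed hand-built Node class, only illustrates the recursions behind parts i) and iii): minimax backs up the max or min of the children, while expectimax replaces each minimising node with a chance node that averages equally likely outcomes. Alpha-beta pruning in part ii) computes the same minimax values while skipping children that cannot affect the result.

    class Node:
        # hand-built game-tree node: kind is 'max', 'min' or 'leaf' (terminal)
        def __init__(self, kind, children=None, value=None):
            self.kind = kind
            self.children = children or []
            self.value = value

    def minimax(n):
        if n.kind == 'leaf':
            return n.value
        vals = [minimax(c) for c in n.children]
        return max(vals) if n.kind == 'max' else min(vals)

    def expectimax(n):
        # same tree, but every 'min' node is re-read as a chance node
        # whose outcomes are equally likely, as in part iii)
        if n.kind == 'leaf':
            return n.value
        vals = [expectimax(c) for c in n.children]
        return max(vals) if n.kind == 'max' else sum(vals) / len(vals)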


2. Knowledge Representation and Automated Planning

Consider the following description of a planning domain. A house contains four rooms: a kitchen, a bedroom, an office, and a hall. A mobile robot can pass between adjacent rooms provided it is holding a power cell. Moving to an adjacent room uses up the power cell and the robot cannot move again until it has another power cell. A power cell is contained in each room. A special locked and alarmed door (D) prevents the robot from going outside. Picking up a security pass turns off the alarm. Pressing a special button unlocks the door, provided the alarm is off. The robot can hold multiple objects at one time. Assume that the domain contains five locations (kitchen, bedroom, office, hall, outside), four power cells (c1, c2, c3, c4), a security pass (p), and a button (b). A floorplan showing the room locations, locked door, and object locations is given below:

The robot starts in the kitchen and has three actions available to it:

• move-robot(?x ?y ?z): move from location ?x to location ?y using power cell ?z
• pickup-object(?x ?y): pick up object ?x at location ?y
• press-button(?b ?x ?y ?z): press button ?b in location ?x to unlock the door between ?y and ?z


Answer the following questions using PDDL notation where necessary.

(a) Write PDDL actions for move-robot, pickup-object, and press-button. Use descriptive names for the domain properties but do not change the action parameters or use typing. For each action, also include a short English description of the action’s preconditions and effects. (11)

(b) Say the robot in the above scenario would like to go outside while
holding the security pass. Define the initial state and goal for a plan-
ning problem that models this problem using PDDL notation. (5)

(c) State a plan that achieves the goal of going outside with the security
pass from the initial state described in the problem description. Use
PDDL notation for each action in the plan and its instantiated param-
eters. Also provide a short explanation of the plan in plain English.
(3)

(d) Say that the robot house scenario describes a real-world scenario.
How would the agent and environment types of a real-world scenario
compare to the assumptions of modelling the problem as a classical
planning problem? Make reference to specific agent and environ-
ment types in your answer. (4)

(e) Say that oneMove(R,X,Y) is the head of a Prolog rule that en-
codes that robot R can travel from X to Y by making one move, and
hasPower(R,P) is the head of a Prolog rule that encodes that robot
R is carrying a power cell P that lasts for one move. Write a Prolog
rule called twoMove(R,X,Y) that encodes the idea that robot R can
make two consecutive moves between two different locations X and
Y, with sufficient power. (2)


3. Markov Decision Processes (MDPs) and Reinforcement Learning (RL)

(a) Consider the following Markov Decision Process (MDP) of a robot running with an ice-cream:

• The actions are either to run or walk.
• The three states are: having one scoop of ice-cream (1S), having two scoops (2S), or having none (0S).
• Walking will always give the robot a reward of +1. Running with one scoop will give a reward of +2, and it might, with a probability of 0.5, be rewarded with another scoop of ice cream.
• However, running with 2 scoops is very risky: it will either make the robot drop both scoops, with a probability of 0.8 and a reward of -10; or else the robot will not drop any scoop, receiving a reward of +2.
• Assume no discount of future actions (γ = 1.0).

Using Policy Evaluation, evaluate the policy π for 2 iterations, where π(1S) = Run and π(2S) = Run. Present the results in tabular format as shown below. The first row (V^π_0) has been initialised to 0.

            1S    2S
V^π_0        0     0
V^π_1
V^π_2
(6)
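For reference, Policy Evaluation iterates the standard fixed-policy Bellman update below (written in the notation of the table above, with the sum over successor states s′ and γ = 1.0 in this question):

    V^π_{k+1}(s) = Σ_{s′} T(s, π(s), s′) · [ R(s, π(s), s′) + γ · V^π_k(s′) ]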

(b) You are given the Gridworld shown below. Assume that the Markov Decision Process is unknown. However, you can observe the following four episodes generated. Each training episode specifies a sequence of transitions, with each transition comprised of the current state s, the chosen action a, the new state s′, and the received immediate reward, in that order. Assume that there is no discount (γ = 1.0).

Use model-based Reinforcement Learning to estimate the transition function for the following state-action pairs:

T(A, south, C) =
T(B, east, C) =
T(C, south, E) =
T(C, south, D) =
(4)
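For reference, the model-based (maximum-likelihood) estimate of the transition function is built by counting transitions in the observed episodes:

    T̂(s, a, s′) = count(s, a, s′) / count(s, a)

where count(s, a, s′) is the number of observed transitions from s to s′ under action a, and count(s, a) is the number of times action a was taken in state s.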

(c) Using Direct Evaluation, evaluate the policy π specified by the arrows in the same grid figure in part (b) above. Provide values for the following quantities:

V̂π(A) =
V̂π(B) =
V̂π(C) =
V̂π(D) =
V̂π(E) =
(5)

(d) Briefly explain in plain English what the Direct Evaluation method
tries to estimate, and how. Also, in what sense is Direct Evaluation
a model-free method? Use the MDP and Episodes in (b) above to
illustrate your answers. (2)

(e) Using Q-Learning (model-free), and assuming that all Q-values for every <state, action> pair have been initialised to 0 and a learning rate of 0.6 (α = 0.6), fill in the Q-table below after your agent has experienced the first two episodes above. The red cells show that the <state, action> pair is not available. You can ignore these.

            A     B     C     D     E
Up
Down
Right
Left
Exit
(8)
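For reference, the standard Q-Learning update applied to each observed transition (s, a, r, s′), with α = 0.6 and γ = 1.0 in this question, is:

    Q(s, a) ← (1 − α) · Q(s, a) + α · [ r + γ · max_{a′} Q(s′, a′) ]

where the max term is taken to be 0 when s′ is a terminal state (e.g. after an Exit action).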


4. Language Models & Syntactic Parsing of Natural Language

(a) You are given the following corpus, C, of sentences:

<s> It is not red </s>              <s> What is the shape ? </s>
<s> It is green </s>                <s> What is the colour ? </s>
<s> The colour is not red </s>      <s> What colour is it not ? </s>
<s> The shape is square ? </s>      <s> It is what ? </s>
<s> Red is not the colour </s>      <s> Is it not red ? </s>
<s> Square is the shape </s>        <s> What is it ? </s>
<s> The colour is green </s>        <s> What shape is it ? </s>

Assuming a Bigram Language Model trained on C using Maximum Likelihood Estimation (MLE), and without any smoothing, answer the following questions. Note that the question mark ‘?’ is also a token of the language. Don’t forget to show how you arrived at your answers.

• What is the probability of generating the following sentences:
i. The colour is square ?
ii. What is not red
iii. Given the sequence of words “It is ...”, what is the most likely next word, and what is its probability?

(7)
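For reference, the maximum-likelihood bigram estimate and the resulting sentence probability (with <s> and </s> as the start and end tokens, as in the corpus above) are:

    P(w_i | w_{i−1}) = count(w_{i−1} w_i) / count(w_{i−1})
    P(<s> w_1 ... w_n </s>) = Π_i P(w_i | w_{i−1})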

(b) You are given the following Context-Free Grammar, G:

Grammar                                  Lexicon
S → NP VP | VP                           Det → a | the
S → Aux NP VP                            N → run | mile
NP → Det Nom | NPN                       V → run
Nom → Nom N | Adj Nom | Nom PP | N       NPN → London
VP → V NP | V                            Aux → does
VP → V PP | VP PP | V Adv                Adv → fast
PP → Prep NP                             Adj → fast
                                         Prep → in

Convert it to Chomsky Normal Form (CNF). Note that you do not have to deal with the rules under the Lexicon column. (7)
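As a reminder of the target form (every rule rewrites to exactly two non-terminals or to a single terminal), two generic transformations are illustrated below on rules that are not part of G:

    X → A B C                        becomes   X → A X1  and  X1 → B C    (right-hand side longer than two symbols)
    X → Y   together with   Y → A B  becomes   X → A B                    (unit production removed by copying Y’s rules up to X)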

(c) Using the original grammar, G, in part (b), parse the sentence ‘Run a mile in London’, clearly showing how you applied the rules of the grammar by showing the sequence of resulting rewrites. The order in which you apply the rewrite rules does not matter for this question, as long as you have applied them correctly. (4)


(d) Now assume that we add the following rule to the original grammar G:

VP → V NP PP

Explain how this extra rule gives rise to structural ambiguity in parsing the same sentence, "Run a mile in London". Using the new expanded grammar, produce two parse trees for the same sentence, one corresponding to your derivation in (c), and another (you don’t need to give the derivation for this). (7)

END OF PAPER
