
Natural Language Processing

Assignment 4
Type of Questions: MCQ

Number of Questions: 8    Total Marks: 8 × 1 = 8

Question 1: What are the advantages of Maximum Entropy Markov Models (MEMMs)
over Hidden Markov Models (HMMs) for sequence tagging tasks?

1. Greater freedom in choosing features to represent observations in MEMMs than in HMMs

2. In MEMMs, domain knowledge can be incorporated through feature design

3. MEMMs make the very simplistic assumption that the tag at the current position
(y_t) in a sequence depends only on the previous tag (y_{t−1}) and the word at the
current position (x_t).

4. None of these

Answer: 1, 2
Solution: Statement 3 describes a property of HMMs, not MEMMs.

Question 2: HMMs and Maximum Entropy Markov Models are machine learning models
most suitable for ______.

1. Classifying images between dogs and cats.

2. Identifying POS tag for each word in a sentence.

3. Finding the most relevant document given a query.

4. All of the above

Answer: 2
Solution: HMMs and MEMMs are suitable for sequence classification or tagging tasks.

Question 3: You have been asked to design a 17-sided die with the maximum possible
entropy for the probability distribution over the sides obtained in a single throw,
subject to the additional constraint that P(1) = 0.5. What is the maximum possible
entropy you can achieve under these design constraints?

1. 10.0
2. 5.0
3. 3.0
4. 4.09

Answer: 3
Solution: Let S be the random variable for the side obtained in a single throw of the
die. It is given that P(S = 1) = 0.5, which implies

P(S = 2) + P(S = 3) + · · · + P(S = 17) = 1 − P(S = 1) = 1/2

The entropy of the distribution is maximized when this remaining probability mass is
distributed uniformly among s ∈ {2, 3, . . . , 17}:

P(S = 2) = P(S = 3) = · · · = P(S = 17) = 1/(2 × 16) = 1/32

The entropy is therefore

H(S) = −0.5 log₂(0.5) − 16 × (1/32) log₂(1/32) = 0.5 + 2.5 = 3
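
As a quick numerical check, the maximum-entropy distribution derived above can be
plugged into the entropy formula directly (a minimal Python sketch; the list of
probabilities is just the distribution from the solution):

```python
import math

# Maximum-entropy distribution under the constraint P(S = 1) = 0.5:
# the remaining 0.5 of probability mass is spread uniformly over sides 2..17.
probs = [0.5] + [0.5 / 16] * 16

# Shannon entropy in bits: H(S) = -sum_s P(s) * log2 P(s)
entropy = -sum(p * math.log2(p) for p in probs)
print(entropy)  # 3.0
```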

Question 4: Assuming that POS tagging follows a first-order Hidden Markov Model
with the following emission and transition matrices for its states, calculate
P(x1 = "ki", x2 = "fin", x3 = "yeni", y1 = "T2", y2 = "T3", y3 = "T3").
[Hint: all POS tags are equally likely at the start of a sequence.]

ki fin yeni
T1 0.1 0.1 0.8
T2 0.8 0.1 0.1
T3 0.2 0.2 0.6
T4 0.8 0.1 0.1

Table 1: Output symbol (emission) probabilities; rows are tags, columns are words

The transition matrix is:

T1 T2 T3 T4
T1 0.18 0.01 0.8 0.01
T2 0.9 0.0 0.05 0.05
T3 0.4 0.5 0.05 0.05
T4 0.4 0.5 0.05 0.05

Table 2: Hidden state transition matrix; rows are the previous tag, columns are the next tag

1. 0
2. 8 × 10⁻⁵
3. 2.4 × 10⁻⁴
4. 6 × 10⁻⁵

Answer: 4
Solution:

P(x1 = "ki", x2 = "fin", x3 = "yeni", y1 = "T2", y2 = "T3", y3 = "T3")
= P(y1 = T2) P(x1 = ki | y1 = T2)
  × P(y2 = T3 | y1 = T2) P(x2 = fin | y2 = T3)
  × P(y3 = T3 | y2 = T3) P(x3 = yeni | y3 = T3)
= (1/4) × 0.8 × 0.05 × 0.2 × 0.05 × 0.6
= 6 × 10⁻⁵
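
The same product can be reproduced with a few lines of Python (a minimal sketch;
the dictionaries simply encode Tables 1 and 2, and the 1/4 start probability comes
from the hint that all tags are equally likely at the start):

```python
# Emission probabilities (Table 1): P(word | tag).
emission = {
    "T1": {"ki": 0.1, "fin": 0.1, "yeni": 0.8},
    "T2": {"ki": 0.8, "fin": 0.1, "yeni": 0.1},
    "T3": {"ki": 0.2, "fin": 0.2, "yeni": 0.6},
    "T4": {"ki": 0.8, "fin": 0.1, "yeni": 0.1},
}
# Transition probabilities (Table 2): P(next tag | previous tag).
transition = {
    "T1": {"T1": 0.18, "T2": 0.01, "T3": 0.8, "T4": 0.01},
    "T2": {"T1": 0.9, "T2": 0.0, "T3": 0.05, "T4": 0.05},
    "T3": {"T1": 0.4, "T2": 0.5, "T3": 0.05, "T4": 0.05},
    "T4": {"T1": 0.4, "T2": 0.5, "T3": 0.05, "T4": 0.05},
}

words = ["ki", "fin", "yeni"]
tags = ["T2", "T3", "T3"]

# All four tags are equally likely at the start of the sequence.
prob = 0.25 * emission[tags[0]][words[0]]
for i in range(1, len(words)):
    prob *= transition[tags[i - 1]][tags[i]] * emission[tags[i]][words[i]]
print(prob)  # ≈ 6e-05
```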

Question 5: In the question above, what is the probability of the following transition?

P(y3 = T3 | y1 = T4, y2 = T1, x1 = ki, x2 = fin)

1. 0
2. 0.32
3. 0.8
4. 0.0064

Answer: 3
Solution: Since this is a Hidden Markov Model, the hidden state y3 depends only on y2:

P(y3 = T3 | y1 = T4, y2 = T1, x1 = ki, x2 = fin) = P(y3 = T3 | y2 = T1) = 0.8

Question 6: Consider the Markov Chain:

Figure 1: Example Markov Chain (at t = 0, P(START) = 1.0)

Which among the following sequences can NOT be generated by this Markov Chain?

1. START w1 w1 w2 w3 w3

2. START w1 w1 w1 w1 w1

3. START w1 w2 w3 w1

4. START w1 w1 w2 w3 w2 w1

Answer: 3, 4
Solution: A sequence can be generated only if every transition in it has non-zero probability in the chain; sequences 3 and 4 each contain a transition that is not present in Figure 1.

Question 7: Assign a POS tag sequence to the sentence "ki fin yeni" using a Maximum
Entropy Markov Model with beam search (beam size = 2). The features are given
below; all features have a weight of 1.0.

f1 = (t_i = T1 & t_{i−1} = T1 or T2)
f2 = (t_i = T2 & t_{i−1} = NULL)
f3 = (t_i = T3 or T4 & t_{i−1} = T2)
f4 = (w_i = yeni & t_i = T1 & t_{i−1} = T3)
f5 = (w_i = yeni & t_i = T4 & t_{i−1} = T4)
f6 = (w_i = fin & t_i = T4 & t_{i−1} = T2)

Data for Q7 and Q8: The POS tags allowed for the words used here are listed below:

ki : {T1, T2}
fin : {T2, T3, T4}
yeni : {T1, T4}

1. T2, T3, T1

2. T2, T2, T4

3. T1, T4, T4

4. T2, T4, T4

Answer: 4
Solution: At the first position (ki), only f2 fires (for T2), so P(T2) = e/(1 + e) ≈ 0.73 and P(T1) = 1/(1 + e) ≈ 0.27; the beam keeps both. At the second position (fin) with previous tag T2, f3 fires for T3 and T4 and f6 additionally fires for T4, giving P(T4) = e²/(1 + e + e²) ≈ 0.67, P(T3) ≈ 0.24 and P(T2) ≈ 0.09; with previous tag T1 no feature fires, so each of the three allowed tags gets 1/3. The two best partial sequences are (T2, T4) ≈ 0.49 and (T2, T3) ≈ 0.18. At the third position (yeni), f5 fires for T4 after T4 and f4 fires for T1 after T3, each giving e/(1 + e) ≈ 0.73 to that tag. The highest-scoring complete sequence is therefore (T2, T4, T4).

Question 8: In the previous question, what is the probability of the POS tag se-
quence found using beam-search?

1. 0.355

2. 0.486

3. 0.179

4. e/(1 + e)

Answer: 1
Solution: Following the beam search above, P(T2, T4, T4 | ki fin yeni) = (e/(1 + e)) × (e²/(1 + e + e²)) × (e/(1 + e)) ≈ 0.731 × 0.665 × 0.731 ≈ 0.355. A short script that reproduces these numbers is sketched below.
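
A sanity check for Questions 7 and 8 (a minimal Python sketch; the feature functions
below are hand-coded versions of f1 to f6, each with weight 1.0, and the local
distributions are normalized only over the tags allowed for each word):

```python
import math

# Tags allowed for each word (from the data for Q7 and Q8).
allowed = {"ki": ["T1", "T2"], "fin": ["T2", "T3", "T4"], "yeni": ["T1", "T4"]}

def score(word, tag, prev):
    """Sum of the weights of the features that fire (all weights are 1.0)."""
    s = 0.0
    if tag == "T1" and prev in ("T1", "T2"):
        s += 1.0  # f1
    if tag == "T2" and prev is None:
        s += 1.0  # f2
    if tag in ("T3", "T4") and prev == "T2":
        s += 1.0  # f3
    if word == "yeni" and tag == "T1" and prev == "T3":
        s += 1.0  # f4
    if word == "yeni" and tag == "T4" and prev == "T4":
        s += 1.0  # f5
    if word == "fin" and tag == "T4" and prev == "T2":
        s += 1.0  # f6
    return s

def local_probs(word, prev):
    """MEMM local distribution P(tag | previous tag, word) over the allowed tags."""
    weights = {t: math.exp(score(word, t, prev)) for t in allowed[word]}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

def beam_search(words, beam_size=2):
    beam = [((), 1.0)]  # (partial tag sequence, probability)
    for word in words:
        candidates = []
        for seq, p in beam:
            prev = seq[-1] if seq else None
            for tag, q in local_probs(word, prev).items():
                candidates.append((seq + (tag,), p * q))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beam

print(beam_search(["ki", "fin", "yeni"]))
# Best hypothesis: ('T2', 'T4', 'T4') with probability ≈ 0.355
```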
