Ucs655 Test Paper
Instructions: Attempt any FIVE questions. Answer all parts of a question in the same place.
Note: Only the first five answers will be considered.
Q1 (a) Given the following short movie reviews, each labeled with a genre (comedy, action, or documentary): (5 Marks)
Sl. No.   Review                         Genre
1.        fast, furious, shoot           action
2.        fun, couple, love, love        comedy
3.        couple, fly, fast, fun, fun    comedy
4.        fly, fast, shoot, love         action
5.        furious, shoot, shoot, fun     action
Design a naive Bayes document classifier with add-1 smoothing to assign a
genre to a movie based on its review.
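One way to check a hand-worked answer is a minimal sketch like the one below. It assumes a multinomial naive Bayes model, ignores words outside the training vocabulary, and scores only the genres that occur in the training table (documentary has no training reviews, so its prior would be zero). The test review "fast couple shoot fly" is a hypothetical input chosen for illustration; the question as given does not state one. The names `train_nb` and `log_scores` are illustrative.

```python
import math
from collections import Counter, defaultdict

# Training reviews from the table, with each review flattened to tokens.
TRAIN = [
    ("fast furious shoot", "action"),
    ("fun couple love love", "comedy"),
    ("couple fly fast fun fun", "comedy"),
    ("fly fast shoot love", "action"),
    ("furious shoot shoot fun", "action"),
]

def train_nb(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def log_scores(text, class_docs, word_counts, vocab):
    """Return log P(c) + sum of log P(w|c), with add-1 smoothing, for each seen class."""
    n_docs = sum(class_docs.values())
    scores = {}
    for label, n_label in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)
        for word in text.split():
            if word not in vocab:
                continue  # assumption: out-of-vocabulary words are ignored
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores

class_docs, word_counts, vocab = train_nb(TRAIN)
# Hypothetical test review, not part of the question as given.
scores = log_scores("fast couple shoot fly", class_docs, word_counts, vocab)
prediction = max(scores, key=scores.get)
print(prediction)
```

With a vocabulary of 7 words, 11 action tokens, and 9 comedy tokens, the smoothed denominators are 18 for action and 16 for comedy.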
Q2 (a) Train two models, multinomial naive Bayes and binarized naive Bayes,
both with add-1 smoothing, on the following document counts for key
sentiment words, with positive (1) or negative (0) class assigned as noted. (7 Marks)
Doc# "good" "poor" "great" Class
1 3 0 3 0
2 0 1 2 0
3 1 3 0 1
4 1 5 2 1
5 0 2 0 1
Use both naive Bayes models to assign a class (0 or 1) to this sentence:
"A good, good plot and great characters, but poor acting."
Do the two models agree or disagree?
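The comparison can be sketched as follows. This is a minimal illustration, assuming the vocabulary is just the three listed words, that binarization clips every per-document count to at most 1, and that the test sentence is binarized in the same way for the binarized model. The function name `nb_scores` is illustrative.

```python
import math

# Rows: (count of "good", "poor", "great"), class label, as in the table.
DOCS = [
    ((3, 0, 3), 0),
    ((0, 1, 2), 0),
    ((1, 3, 0), 1),
    ((1, 5, 2), 1),
    ((0, 2, 0), 1),
]
# Test sentence counts: "good" x2, "poor" x1, "great" x1.
TEST = (2, 1, 1)

def nb_scores(docs, test, binarize=False):
    """Return log posterior scores (up to a constant) per class with add-1 smoothing."""
    if binarize:
        # Assumption: clip counts to 1 in every training document and in the test document.
        docs = [(tuple(min(c, 1) for c in counts), label) for counts, label in docs]
        test = tuple(min(c, 1) for c in test)
    vocab_size = len(test)
    scores = {}
    for label in sorted({lab for _, lab in docs}):
        rows = [counts for counts, lab in docs if lab == label]
        word_totals = [sum(col) for col in zip(*rows)]
        denom = sum(word_totals) + vocab_size
        score = math.log(len(rows) / len(docs))  # log prior
        for total, n in zip(word_totals, test):
            score += n * math.log((total + 1) / denom)
        scores[label] = score
    return scores

multinomial = nb_scores(DOCS, TEST)
binarized = nb_scores(DOCS, TEST, binarize=True)
print(max(multinomial, key=multinomial.get), max(binarized, key=binarized.get))
```

Under these assumptions the multinomial model scores class 0 higher and the binarized model scores class 1 higher, so the two models disagree on this sentence.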
Q3 (a) Consider the following training corpus and estimate the bigram and trigram
probabilities of the test sentence <s> students are from Thapar </s>. Include
<s> and </s> in your counts just like any other token. (4 Marks)
Training corpus:
<s> I am from Thapar </s>
<s> I am a teacher </s>
<s> students are good and are from various cities </s>
<s> students from Thapar do engineering </s>
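A minimal sketch of the estimation, assuming unsmoothed maximum-likelihood estimates (count of the n-gram divided by count of its context) and a single <s> per sentence, so the trigram product starts at the first full trigram. The helper names `ngram_counts` and `mle_sentence_prob` are illustrative.

```python
from collections import Counter

CORPUS = [
    "<s> I am from Thapar </s>",
    "<s> I am a teacher </s>",
    "<s> students are good and are from various cities </s>",
    "<s> students from Thapar do engineering </s>",
]
TEST = "<s> students are from Thapar </s>"

def ngram_counts(sentences, n):
    """Count every n-gram (as a tuple of tokens) in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def mle_sentence_prob(sentence, sentences, n):
    """Product of unsmoothed MLE n-gram probabilities; 0.0 if any n-gram is unseen."""
    grams = ngram_counts(sentences, n)
    contexts = ngram_counts(sentences, n - 1)
    tokens = sentence.split()
    prob = 1.0
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if grams[gram] == 0 or contexts[gram[:-1]] == 0:
            return 0.0
        prob *= grams[gram] / contexts[gram[:-1]]
    return prob

bigram_prob = mle_sentence_prob(TEST, CORPUS, 2)
trigram_prob = mle_sentence_prob(TEST, CORPUS, 3)
print(bigram_prob, trigram_prob)
```

The bigram product is (2/4)(1/2)(1/2)(2/3)(1/2) = 1/24. The trigram estimate is 0 because "students are from" never occurs in the training corpus.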
Q3 (b) Compute the perplexity of the bigram model for the above test sentence. Also
state which n-gram model (2-gram or 3-gram) is better for the above
corpus, and why. (2 Marks)
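The perplexity computation can be sketched as below. The sketch assumes unsmoothed bigram estimates read directly off the training corpus and the common convention that N counts every predicted token, including </s> but not <s>, giving N = 5. The function name `perplexity` is illustrative.

```python
import math

# Unsmoothed bigram estimates for the test sentence: count(bigram) / count(context).
BIGRAM_PROBS = {
    ("<s>", "students"): 2 / 4,
    ("students", "are"): 1 / 2,
    ("are", "from"): 1 / 2,
    ("from", "Thapar"): 2 / 3,
    ("Thapar", "</s>"): 1 / 2,
}

def perplexity(probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)), i.e. P(W) ** (-1/N)."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

pp = perplexity(list(BIGRAM_PROBS.values()))
print(round(pp, 3))
```

Here P(W) = 1/24, so the perplexity is 24^(1/5), roughly 1.89. An unsmoothed trigram model assigns this sentence probability 0, which makes its perplexity unbounded on this corpus.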
Q5 (a) Consider a corpus that has the words old, older, finest, and lowest. The
frequency of these words is { "old": 7, "older": 3, "finest": 9, "lowest": 4 }. The
</w> token is added at the end of each word to mark the word boundary. Apply the
Byte Pair Encoding algorithm for text tokenization for k = 5 merges and
generate the merge list. (5 Marks)
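A minimal sketch of the merge loop, using the frequency dictionary from the question. It assumes that ties between equally frequent pairs are broken by taking the pair encountered first when scanning the words in the order listed; other tie-breaking rules give a different but equally valid merge list. The names `pair_counts`, `merge_pair`, and `learn_bpe` are illustrative.

```python
from collections import Counter

WORD_FREQS = {"old": 7, "older": 3, "finest": 9, "lowest": 4}

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, k):
    """Run k merges and return the ordered merge list."""
    # Start from single characters plus the </w> boundary token.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(k):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        # Assumption: max() keeps the first pair seen among ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges

merges = learn_bpe(WORD_FREQS, 5)
print(merges)
```

With this tie-breaking rule the five merges are (e, s), (es, t), (est, </w>), (o, l), and (ol, d).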