
Silicon Institute of Technology

Silicon Hills, Bhubaneswar


| An Autonomous Institute |

6th Semester B.Tech. Mid Term Examination 2021-2022


NATURAL LANGUAGE PROCESSING (18CS2T50)
Duration: 01:30 Full Marks: 25

1 Answer All
a What is the data sparseness problem in an n-gram model? Present some solutions to address it.
1
b Explain why ambiguity is an issue in processing natural languages.
1
c For a training corpus of 15,000 words, find the unigram probability of each term given the frequency counts below:
1
Term: Frequency:
student 225
teacher 335
d Differentiate between rule-based and stochastic POS tagging methods.
1
e Discuss some possible solutions to address unknown words in POS tagging.
1
f Explain the role of NER tagging in NLP application development.
1
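
For Question 1(c), the unigram probability of a term is its frequency divided by the total number of words in the corpus. A minimal sketch of that arithmetic follows; the names CORPUS_SIZE, counts and unigram_probability are illustrative and not part of the paper, and no smoothing is assumed.

```python
from fractions import Fraction

# Figures taken from Question 1(c); variable names are illustrative only.
CORPUS_SIZE = 15_000
counts = {"student": 225, "teacher": 335}


def unigram_probability(count, total):
    """Maximum-likelihood unigram estimate: count(term) / total words."""
    return Fraction(count, total)


probs = {term: unigram_probability(c, CORPUS_SIZE) for term, c in counts.items()}

for term, p in probs.items():
    # Show both the exact fraction and a rounded decimal.
    print(f"P({term}) = {p} = {float(p):.4f}")
```

Under these assumptions P(student) = 225/15000 = 0.015 and P(teacher) = 335/15000, approximately 0.0223.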
2 Answer All
a Discuss the importance of language modeling and the types of available language models.
3
b Discuss any 3 NLP applications used over the internet.
3
c Write regular expressions for the following text-search operations in an NLP application:
3
(i) Any positive integer with an optional decimal point, followed by Gigahertz/gigahertz/GHz/Ghz/ghz
(ii) Any email ID of the form silicon_2021@ymail.ac.in
(iii) All occurrences of the term 'students' at a word boundary, with the final 's' optional
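
One possible set of patterns for Question 2(c) is sketched below using Python's re module. These are not the only acceptable answers. The sketch assumes optional whitespace between the number and the unit, a simplified email shape (not the full RFC grammar), and case-sensitive matching of 'student(s)'. The pattern names and the sample text are made up for illustration.

```python
import re

# (i) Digits with an optional decimal part, then one of the listed unit spellings.
#     Assumption: whitespace between the number and the unit is optional.
FREQ_GHZ = re.compile(r"\b\d+(?:\.\d+)?\s*(?:[Gg]igahertz|GHz|Ghz|ghz)\b")

# (ii) Simplified email shape: local part, '@', then dot-separated domain labels.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b")

# (iii) 'student' or 'students' as a whole word (case-sensitive).
STUDENT = re.compile(r"\bstudents?\b")

# Made-up sample text, used only to exercise the patterns.
text = (
    "A 3.2 GHz core and a 2 gigahertz core; "
    "mail silicon_2021@ymail.ac.in; one student, two students."
)

ghz_matches = FREQ_GHZ.findall(text)
email_matches = EMAIL.findall(text)
student_matches = STUDENT.findall(text)

print(ghz_matches)
print(email_matches)
print(student_matches)
```

The non-capturing groups `(?:...)` keep `findall` returning whole matches rather than sub-groups.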
3 Answer any One
a Find the probability of all trigram sequences in the training set. Also, find the probability of the test sentence S5.
5
Training set:
S1: The section of all the intelligent students
S2: Students of the college
S3: The students of this college
S4: The intelligent students of this college
Test Sentence:
S5: The section of all the intelligent students of this college
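
A hedged sketch of how Question 3(a) could be worked with maximum-likelihood trigram estimates, P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2). The question does not fix a convention, so the sketch assumes lowercased tokens, two start markers and one end marker per sentence, and no smoothing. Other conventions give different values. All function and variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

TRAIN = [
    "The section of all the intelligent students",
    "Students of the college",
    "The students of this college",
    "The intelligent students of this college",
]
TEST = "The section of all the intelligent students of this college"


def trigrams(sentence):
    # Assumption: lowercase, pad with two <s> markers and one </s> marker.
    tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:], tokens[2:]))


tri_counts = Counter(t for s in TRAIN for t in trigrams(s))

# Context count C(w1 w2) = number of trigrams that start with (w1, w2).
ctx_counts = Counter()
for (w1, w2, _), c in tri_counts.items():
    ctx_counts[(w1, w2)] += c


def trigram_prob(w1, w2, w3):
    ctx = ctx_counts[(w1, w2)]
    if ctx == 0:
        return Fraction(0)  # unseen context, no smoothing applied
    return Fraction(tri_counts[(w1, w2, w3)], ctx)


def sentence_prob(sentence):
    p = Fraction(1)
    for t in trigrams(sentence):
        p *= trigram_prob(*t)
    return p


# Probability of every trigram seen in the training set.
for (w1, w2, w3) in sorted(tri_counts):
    print(f"P({w3} | {w1} {w2}) = {trigram_prob(w1, w2, w3)}")

p_s5 = sentence_prob(TEST)
print("P(S5) =", p_s5)
```

Under these assumptions the only factors below 1 in S5 are 3/4, 1/3, 1/2 and 2/3, so P(S5) = 1/12.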

b Find the probability of the test sentence S5 if a bigram model is trained on the following training sentences (S1, S2, S3, S4).
5
Training set:
S1: The major applications of information retrieval
S2: information retrieval and extraction
S3: The advanced applications of the information retrieval domain
S4: Applications of machine learning in information retrieval
Test Sentence:
S5: The advanced applications of machine learning in information retrieval
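
Question 3(b) can be worked along the following lines, using maximum-likelihood bigram estimates, P(w2 | w1) = C(w1 w2) / C(w1). The sketch assumes lowercased tokens, one start marker and one end marker per sentence, and no smoothing. The paper does not state a convention, so the result holds only under these assumptions. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction

TRAIN = [
    "The major applications of information retrieval",
    "information retrieval and extraction",
    "The advanced applications of the information retrieval domain",
    "Applications of machine learning in information retrieval",
]
TEST = "The advanced applications of machine learning in information retrieval"


def tokenize(sentence):
    # Assumption: lowercase and wrap each sentence in <s> ... </s>.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


context_counts = Counter()
bigram_counts = Counter()
for sentence in TRAIN:
    tokens = tokenize(sentence)
    context_counts.update(tokens[:-1])  # </s> is never a context
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    if context_counts[prev] == 0:
        return Fraction(0)  # unseen context, no smoothing applied
    return Fraction(bigram_counts[(prev, word)], context_counts[prev])


def sentence_prob(sentence):
    tokens = tokenize(sentence)
    p = Fraction(1)
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p


p_s5 = sentence_prob(TEST)
print("P(S5) =", p_s5, "=", float(p_s5))
```

With these conventions the factors below 1 are P(the | <s>) = 1/2, P(advanced | the) = 1/3, P(machine | of) = 1/3 and P(</s> | retrieval) = 1/2, giving P(S5) = 1/36.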
4 Answer any One
a Construct an FSA which will accept all possible strings of the form: "Last Monday on 2nd Feb 2010"; "next Tuesday on 26th July 2006"; "previous Sunday on 13th Dec 2005"; etc. in the year range 2010-2020.
5
b Construct an FSA which will accept all possible pronunciations of numbers in the range 1 to 999 in string format ("one", "two", ..., "nine hundred ninety nine") followed by the string "Rupees" | "rupees" | "RS".
5
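
One way to check a design for Question 4(b) is to simulate it as a deterministic FSA over word tokens. The sketch below is only one possible construction. It assumes lowercase number words separated by spaces, no "and" after "hundred", and exactly one currency token at the end. The state names (START, UNIT, HUNDRED, TENS, DONE, ACCEPT) are made up for illustration.

```python
UNITS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}
TEENS = {"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"}
TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}
CURRENCY = {"Rupees", "rupees", "RS"}


def symbol_class(token):
    """Map a word token to the input symbol class used by the FSA."""
    if token in UNITS:
        return "unit"
    if token in TEENS:
        return "teen"
    if token in TENS:
        return "tens"
    if token == "hundred":
        return "hundred"
    if token in CURRENCY:
        return "currency"
    return None


# (state, symbol class) -> next state; a missing entry means reject.
TRANSITIONS = {
    ("START", "unit"): "UNIT",        # 1-9, may still be followed by "hundred"
    ("START", "teen"): "DONE",        # 10-19
    ("START", "tens"): "TENS",        # 20, 30, ..., 90
    ("UNIT", "hundred"): "HUNDRED",   # 100, 200, ..., 900
    ("UNIT", "currency"): "ACCEPT",
    ("HUNDRED", "unit"): "DONE",
    ("HUNDRED", "teen"): "DONE",
    ("HUNDRED", "tens"): "TENS",
    ("HUNDRED", "currency"): "ACCEPT",
    ("TENS", "unit"): "DONE",         # 21-29, ..., 91-99
    ("TENS", "currency"): "ACCEPT",
    ("DONE", "currency"): "ACCEPT",
}


def accepts(text):
    """Return True if the FSA ends in ACCEPT after reading every token."""
    state = "START"
    for token in text.split():
        state = TRANSITIONS.get((state, symbol_class(token)))
        if state is None:
            return False
    return state == "ACCEPT"


for sample in ["one Rupees", "nine hundred ninety nine RS", "hundred rupees"]:
    print(sample, "->", accepts(sample))
```

ACCEPT has no outgoing transitions, so anything after the currency token is rejected.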
