CT2 Set B

SRM Institute of Science and Technology

College of Engineering and Technology SET B


School of Computing
DEPARTMENT OF COMPUTING TECHNOLOGIES
SRM Nagar, Kattankulathur – 603203, Chengalpattu District, Tamilnadu
Academic Year: 2023 (ODD)
Test: CLAT-2 (ANSWER KEY) Date: 02/11/2023
Course Code & Title: 18CSE359T & NATURAL LANGUAGE PROCESSING
Duration: 2 periods
Year & Sem: IV Year & VII Semester Max. Marks: 50 Marks

Course Articulation Matrix:


S.No.  Course Outcome  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PO13  PO14  PO15
1      CO1             3    3    -    -    -    -    -    -    -    -     -     -     -     -     -
2      CO2             3    2    -    3    -    -    -    -    -    -     -     -     -     -     -
3      CO3             3    2    2    3    -    -    -    -    -    -     -     -     -     -     -
4      CO4             3    3    2    2    -    -    -    -    -    -     -     -     -     -     -
5      CO5             3    2    2    2    -    -    -    -    -    -     -     -     -     2     -

Part - A
(10*1 = 10 Marks) Answer all Questions.
Q. No Questions Marks BL CO PO PI Code
1 Which of the following includes major tasks in NLP? 1 2 5 1 2.1.2
a. Automatic summarization
b. Discourse Analysis
c. Machine Translation
d. All the Above
2 Tf-idf is used in 1 2 5 1 2.1.3
a. Page ranking by search engines
b. Processing text for ML model
c. Both A and B
d. None
3 Which of the following is a kind of text summarization 1 1 5 1 2.2.2
a. Topic-based summarization
b. Extraction-based summarization
c. History-based summarization
d. All the above
4 Word2vec is used to 1 1 5 4 2.2.3
a. Generate vectors out of words
b. Represent a document numerically
c. Make a set of vocabularies
d. None
5 The bag of words approach _________ 1 2 5 4 2.2.3
a. Keeps word order, keeps word multiplicity
b. Keeps word order, disregards word multiplicity
c. Disregards word order, disregards word multiplicity
d. Disregards word order, keeps word multiplicity
6 Which is a model for measuring the incidence of known words? 1 1 6 4 1.3.1
a. A low weight in TF-IDF
b. A high weight in TF-IDF
c. A corpus
d. A bag of words
7 Which has a high term frequency and a low document frequency? 1 2 6 4 2.1.3
a. A low weight in TF-IDF
b. A high weight in TF-IDF
c. A corpus
d. A bag of words
8 Which is the main Python package we use for NLP? 1 1 6 1 3.4.2
a. NLTK
b. NLP-LIB
c. Scikit learn
d. pyNLP
9 Which are included in named entity recognition? 1 2 6 1 2.2.2
a. Currency
b. Time and dates
c. Nouns
d. All the above
10 Which are multiword sequences? 1 1 6 4 2.2.3
a. Corpus
b. N-grams
c. Stop words
d. Tokenization
Part - B
(4*5 = 20 Marks) Answer all Questions.
11 Draw the transfer architecture for Machine Translation 5 1 2 1 1.6.1
Sol:
Machine Translation (MT) is a domain of computational
linguistics that uses computer programs to translate text
or speech from one language to another with no human
involvement, aiming for relatively high accuracy, low
error rates, and cost-effectiveness.

Machine Translation is a critical yet complex process
because there are a huge number of nuanced natural
languages in the world.

12 Describe the selection restriction in semantic interpretation 5 1 2 4 2.2.3
Sol:
Selection restrictions are necessary for understanding
when (1) a single lexical item in a syntactic context maps
to several meanings, so that the intended meaning is
ambiguous given just the lexical item in a syntactic
configuration, and (2) there are semantic restrictions that it
is possible to state on other
13 Why is semantic interpretation assumed to be a compositional process? 5 2 3 4 2.2.3
Sol:
Compositionality means that the meaning of the
whole is a systematic function of the meanings of the
parts; a semantic theory with compositionality accounts
for the relationship between the meaning of a sentence
and the meanings of its components.
14 Define the following 5 1 3 1 2.2.2
a. Vector space model
b. Term frequency
c. Inverse document frequency

Sol:

Vector space model:
The vector space model is an algebraic model that
represents objects (like text) as vectors. This makes it
easy to determine the similarity between words or the
relevance between a search query and a document.
Cosine similarity is often used to determine the
similarity between vectors.
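
The cosine similarity mentioned above can be sketched in a few lines. This is a minimal illustration, not part of the original answer key: it assumes raw word counts as vector components and whitespace tokenization, and the helper names `to_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a sparse vector of raw word counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dict-like counters."""
    dot = sum(a[term] * b.get(term, 0) for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


query = to_vector("machine translation")
doc_a = to_vector("statistical machine translation uses a translation model")
doc_b = to_vector("term frequency counts words")

# doc_a shares terms with the query, doc_b shares none.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))
```

A document that shares no terms with the query scores 0, and a vector compared with itself scores 1, which is why the measure is convenient for ranking documents against a query.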

Term frequency:
Term frequency (TF) means how often a term occurs in
a document. In the context of natural language, terms
correspond to words or phrases.

Inverse document frequency:
Inverse Document Frequency (IDF) is a weight
indicating how commonly a word is used. The more
frequent its usage across documents, the lower its score.
The lower the score, the less important the word
becomes.
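
The two definitions above combine into the TF-IDF weight. The sketch below is one common variant, assumed here for illustration: TF is the term count divided by document length, and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the term. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: share of the document's tokens equal to `term`."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency: log(N / df); 0.0 if the term never occurs."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of `term` in one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus (made up for illustration), already tokenized.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

for term in ("the", "cat", "mat"):
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

A word such as "the" that appears in every document gets an IDF of 0 and therefore a TF-IDF weight of 0, while a word confined to one document gets the highest weight. This matches the high term frequency and low document frequency case asked about in Part A.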

Part - C
(2*10 = 20 Marks) Answer any two Questions.
15 Can statistical techniques be used to perform the task of machine translation? If so, explain in brief. 10 1 2 1 1.6.1
Sol: Statistical machine translation (SMT) deals with
automatically mapping sentences in one human language
(for example, French) into another human language
(such as English). The first language is called the source
and the second language is called the target. This
process can be thought of as a stochastic process.
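
One standard way to make the stochastic view concrete is the noisy-channel formulation, which the answer key does not spell out. The symbols are assumptions introduced here: $f$ is the source sentence, $e$ a candidate target sentence, and $\hat{e}$ the chosen translation.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The second step is Bayes' rule, and $P(f)$ can be dropped because it does not depend on $e$. In this formulation $P(f \mid e)$ plays the role of the translation model and $P(e)$ the role of the language model.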
16 Analyze how statistical methods can be used in machine translation 10 3 3 1 1.7.1
Sol: Statistical machine translation systems learn to
translate by analyzing the statistical relationships
between original texts and their existing human
translations. The most important components in
statistical machine translation are the translation model
and the language model.
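
How the two components interact can be shown with a toy example. This is only a sketch under strong assumptions: a real system estimates both models from parallel and monolingual corpora, whereas the probabilities below are hypothetical values picked by hand, and the names `translation_model`, `language_model`, `score`, and `best_translation` are illustrative.

```python
import math

# Hypothetical probabilities, hand-picked for illustration only.
# translation_model[e] stands for P(source sentence | candidate e).
translation_model = {
    "the house is small": 0.30,
    "small the house is": 0.30,
    "the home is little": 0.20,
}
# language_model[e] stands for P(e), the fluency of the candidate in the target language.
language_model = {
    "the house is small": 0.050,
    "small the house is": 0.001,
    "the home is little": 0.010,
}


def score(candidate):
    """Log of translation-model probability times language-model probability."""
    return math.log(translation_model[candidate]) + math.log(language_model[candidate])


def best_translation(candidates):
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=score)


best = best_translation(list(translation_model))
print(best)
```

The first two candidates are equally faithful according to the toy translation model, so the language model decides between them by preferring the fluent word order.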
17 Explain the process of multi-document summarization 10 4 3 4 1.7.1
Sol:
Multi-document summarization is a process of
representing a set of documents with a short piece of text
by capturing the relevant information and filtering out
the redundant information.
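
The two goals named above, capturing relevant information and filtering redundancy, can be sketched with a simple greedy extractive approach. This is an assumption chosen for illustration, not the method the answer key prescribes: relevance is approximated by average word frequency across all documents, redundancy by Jaccard word overlap, and documents are given as pre-split lists of sentences. The names `summarize`, `overlap`, and `tokens` and the sample documents are hypothetical.

```python
from collections import Counter


def tokens(sentence):
    """Lowercased words with trailing punctuation stripped."""
    return [w.strip(".,").lower() for w in sentence.split()]


def overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def summarize(documents, max_sentences=2, max_overlap=0.5):
    """Greedily pick relevant sentences, skipping near-duplicates of chosen ones."""
    sentences = [s for doc in documents for s in doc]
    freq = Counter(w for s in sentences for w in tokens(s))

    def relevance(s):
        # Average frequency (over all documents) of the words in the sentence.
        ws = tokens(s)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    summary = []
    for s in sorted(sentences, key=relevance, reverse=True):
        if all(overlap(s, chosen) <= max_overlap for chosen in summary):
            summary.append(s)
        if len(summary) == max_sentences:
            break
    return summary


# Made-up documents, each a list of sentences.
docs = [
    ["The storm hit the coast on Monday.", "Schools were closed."],
    ["The storm hit the coast Monday.", "Power was restored by Friday."],
]
summary = summarize(docs)
print(summary)
```

Both documents report the storm in nearly the same words, so only one of those sentences is kept and the remaining slot goes to new information.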

Course Outcome (CO) and Bloom’s Level (BL) Coverage in Questions

Question Paper Setter Approved by the Audit Professor/Course Coordinator