Delhi Public School Bangalore North
a. Data Privacy
b. AI access
c. AI Bias
d. Data Exploration
3. Which block of the 4Ws canvas considers the benefits that the stakeholders would get
from the solution?
a. What
b. Where
c. Why
d. Who
5. To make an Artificial Intelligence system which can predict the salary of an employee
based on his previous salaries, one has to feed the data of his previous salaries into
the machine. This is called
a. Training Data
b. Testing Data
c. Ethical Data
d. Raw Data
6. Choose the correct option
a. Unsupervised learning -> labelled dataset, Regression
b. Supervised learning -> labelled dataset, Regression
c. Unsupervised learning -> unlabelled dataset, Classification
d. Supervised learning -> unlabelled dataset, Regression
7. Sagar is collecting data from social media platforms. He has collected a large amount of
data, but he wants specific information from it. Which NLP application would help
him?
a. Automatic Text Summarization
b. Text Classification
c. Sentiment Analysis
d. None of these
8. Which of the following is used for finding the frequency of words in some given text
sample?
a. Term frequency
b. Bag of words
c. Lemmatisation
d. Stemming
9. The term stop words means:
a. the whole corpus having many words
b. to undergo several steps to normalise the text to a lower level
c. in which each sentence is then further divided into tokens
d. the words which occur very frequently in the corpus but do not add any value
to it.
10. _______ is a process that involves removing the ends of the words irrespective of
the resultant word making sense.
a. Term frequency
b. Bag of words
c. Lemmatisation
d. Stemming
11. The entire text in all documents in a collection for NLP processing is called _________
a. Segment
b. Library
c. Corpus
d. Data Sets
a. Lemmatization
b. Stemming
Ans: a. Lemmatization: city
b. Stemming: citi
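The difference between the two answers can be illustrated with a toy sketch, assuming the word in question is "cities" (the question text is not shown here). The suffix list and the small lemma dictionary are hypothetical and chosen only for this example; real stemmers and lemmatizers use much larger rule sets and vocabularies.

```python
# Toy illustration only: SUFFIXES and LEMMAS are assumptions made for this
# example, not a real stemming or lemmatization algorithm.

SUFFIXES = ["es", "ing", "ed", "s"]  # checked in this order

LEMMAS = {"cities": "city", "studies": "study", "caring": "care"}


def toy_stem(word):
    """Chop off the first matching suffix, even if the result is not a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look up the dictionary form; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


print(toy_stem("cities"))       # citi
print(toy_lemmatize("cities"))  # city
```

Stemming only removes the ending, so the result ("citi") need not be a meaningful word, while lemmatization returns the meaningful root form ("city").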
2. How many tokens are there in the sentence given below?
Traffic Jams have become a common part of our lives nowadays. Living in an urban area
means you have to face traffic each and every time you get out on the road. Mostly, school
students opt for buses to go to school.
Ans: There are 46 tokens in the given text (42 words and 4 punctuation marks).
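The count can be checked with a short sketch. It assumes that every word and every punctuation mark counts as one token, which is the convention the answer of 46 relies on; the regular expression is one simple way to apply that rule, not the only possible tokenizer.

```python
import re

text = (
    "Traffic Jams have become a common part of our lives nowadays. "
    "Living in an urban area means you have to face traffic each and every "
    "time you get out on the road. Mostly, school students opt for buses "
    "to go to school."
)

# Assumption: a token is either a run of word characters or a single
# punctuation mark, so "nowadays." yields two tokens.
tokens = re.findall(r"\w+|[^\w\s]", text)

words = [t for t in tokens if t[0].isalnum()]
punctuation = [t for t in tokens if not t[0].isalnum()]

print(len(tokens), len(words), len(punctuation))  # 46 42 4
```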
Ans:
Regression: These models work on continuous data and predict a continuous output based on
patterns in that data. For example, to predict your next salary, you would
feed in your previous salaries, any increments, etc., and train the
model on them. Here, the data fed to the machine is continuous. OR
Regression is the process of finding a model that maps the data to
continuous real values instead of discrete values. It can also identify how a
distribution moves based on historical data.
Classification: A classification model works on labelled data. For example, if we have 3
coins of different denominations that are labelled according to their weight,
the model would use the labelled features to predict the output.
This model works on a discrete dataset, which means the data need not be
continuous. OR In classification, data is categorized under different labels
according to some parameters given as input, and then the labels are predicted
for the data.
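The two examples above can be sketched in a few lines. All numbers below (salaries, coin weights, denominations) are hypothetical and invented for illustration; the nearest-weight rule is just one simple stand-in for a classification model.

```python
# Hypothetical data, invented for illustration only.

# --- Regression: predict a continuous value (next salary) ---
years = [1, 2, 3, 4, 5]
salaries = [30000, 32000, 34000, 36000, 38000]  # rises by 2000 each year


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(years, salaries)
predicted_salary = slope * 6 + intercept  # a continuous number

# --- Classification: predict a discrete label (coin denomination) ---
# Labelled training data: (weight in grams, label)
coins = [(3.0, "1 rupee"), (4.9, "2 rupees"), (6.0, "5 rupees")]


def classify(weight):
    """Return the label of the training coin closest in weight."""
    return min(coins, key=lambda coin: abs(coin[0] - weight))[1]


print(predicted_salary)  # 40000.0
print(classify(5.1))     # 2 rupees
```

The regression output can be any real number, while the classifier can only return one of the three labels it was trained on.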
7. Create a document vector table for the given corpus and also mention document frequency
of the given words in the dictionary:
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
ANS:
Step 1: Text Normalisation
Document 3: [ ]
Document 4: [I, am, in]
Step 2 : Create Dictionary
we, are, going, to, mumbai, is, a, famous, place, i, am, in
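The document vector table and the document frequencies asked for in the question can be worked out with a short sketch. It assumes that normalisation means lowercasing and dropping the full stop, and that no stop words are removed; the variable names are illustrative.

```python
documents = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Assumption: normalisation here means lowercasing and dropping the full stop.
tokenised = [doc.lower().replace(".", "").split() for doc in documents]

# Dictionary: unique words in order of first appearance.
dictionary = []
for tokens in tokenised:
    for word in tokens:
        if word not in dictionary:
            dictionary.append(word)

# Document vector table: one row per document, one count per dictionary word.
vectors = [[tokens.count(word) for word in dictionary] for tokens in tokenised]

# Document frequency: number of documents that contain each word.
document_frequency = {
    word: sum(1 for tokens in tokenised if word in tokens) for word in dictionary
}

print(dictionary)
for row in vectors:
    print(row)
print(document_frequency)
```

Each printed row is the vector for one document, with columns in dictionary order. Under these assumptions "mumbai" and "famous" each have a document frequency of 3, the highest in the corpus.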
TFIDF for any word W becomes:
TFIDF(W) = TF(W) * log( IDF(W) )
where IDF(W) = total number of documents / document frequency of W
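As a rough sketch, the formula can be applied to the four documents of this question. Two assumptions are made here: IDF(W) is taken as the total number of documents divided by the document frequency of W, and the logarithm is base 10. The function name `tfidf` is illustrative.

```python
import math

documents = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Assumption: normalisation means lowercasing and dropping the full stop.
tokenised = [doc.lower().replace(".", "").split() for doc in documents]
n_docs = len(tokenised)


def tfidf(word, tokens):
    """TFIDF(W) = TF(W) * log10(N / DF(W)) for one document's token list.

    Assumes the word occurs in at least one document, so DF(W) > 0.
    """
    tf = tokens.count(word)
    df = sum(1 for doc in tokenised if word in doc)
    return tf * math.log10(n_docs / df)


print(round(tfidf("mumbai", tokenised[0]), 3))  # 0.125
print(round(tfidf("is", tokenised[1]), 3))      # 0.602
```

A word that appears in many documents, such as "mumbai", gets a low value, while a rare word such as "is" in this small corpus gets a higher one.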
The most frequently occurring words are: Mumbai, famous (each appears in 3 of the 4 documents)
Step 4: IDF
Word  Sita  And  Gita  Are  Twin  Sisters  Lives  In   Australia  With  Her  Aunt  India  Parents
IDF   3/2   3/1  3/2   3/1  3/1   3/1      3/2    3/2  3/1        3/2   3/2  3/1   3/1    3/1
II. Answer the following questions: