
DELHI PUBLIC SCHOOL BANGALORE NORTH

QUARTERLY EXAMINATION WORKSHEET ANSWER KEY 2023-24


SUBJECT: ARTIFICIAL INTELLIGENCE
CLASS: X

1. Amazon had been working on a secret AI recruiting tool. The machine-learning


specialists uncovered a big problem: their new recruiting engine did not like women.
The system taught itself that male candidates were preferable. It penalized resumes
that included the word “women”. This led to the failure of the tool. This is an example
of

a. Data Privacy
b. AI access
c. AI Bias
d. Data Exploration

2. ________ is the process of understanding the reliability of any AI model by feeding a
test dataset into the model and comparing its outputs with the actual answers.
a. Exploration
b. Acquisition
c. Evaluation
d. Problem Scoping

3. Which block of the 4Ws canvas considers the benefits which the stakeholders would get
from the solution?
a. What
b. Where
c. Why
d. Who

4. __________ is not a part of the learning-based approach.


a. Supervised Learning
b. Unsupervised Learning
c. Rule-based
d. Reinforcement Learning

5. To make an Artificial Intelligence system which can predict the salary of an employee
based on his previous salaries, one has to feed the data of his previous salaries into
the machine. This is called
a. Training Data
b. Testing Data
c. Ethical Data
d. Raw Data
6. Choose the correct option
a. Unsupervised learning -> labelled dataset, Regression
b. Supervised learning -> labelled dataset, Regression
c. Unsupervised learning -> unlabelled dataset, Classification
d. Supervised learning -> unlabelled dataset, Regression

7. Sagar is collecting data from social media platforms. He has collected a large amount of
data, but he wants specific information from it. Which NLP application would help
him?
a. Automatic Text Summarization
b. Text Classification
c. Sentiment Analysis
d. None of these
8. Which of the following is used for finding the frequency of words in some given text
sample?
a. Term frequency
b. Bag of words
c. Lemmatisation
d. Stemming
9. The term stop words means:
a. the whole corpus having many words
b. to undergo several steps to normalise the text to a lower level
c. in which each sentence is then further divided into tokens
d. the words which occur very frequently in the corpus but do not add any value
to it.

10. _______ is a process that involves removing the ends of words irrespective of
whether the resultant word makes sense.
a. Term frequency
b. Bag of words
c. Lemmatisation
d. Stemming
11. The entire text from all the documents in a collection, taken together for NLP processing, is called a _________

a. Segment
b. Library
c. Corpus
d. Data Sets

I. Answer the following questions

1. What will be the output of the word “cities” if we do the following:

a. Lemmatization
b. Stemming
Ans: a. Lemmatization – city
b. Stemming – citi
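
A small Python sketch of this difference, assuming the NLTK library (and its WordNet data) is installed:

```python
# Requires: pip install nltk, plus nltk.download('wordnet') for the lemmatizer data.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

word = "cities"
print("Stemming:     ", stemmer.stem(word))          # citi  (crude suffix removal)
print("Lemmatization:", lemmatizer.lemmatize(word))  # city  (a valid dictionary word)
```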
2. How many tokens are there in the sentence given below?

Traffic Jams have become a common part of our lives nowadays. Living in an urban area
means you have to face traffic each and every time you get out on the road. Mostly, school
students opt for buses to go to school.
Ans: There are 46 tokens in the given text (the 42 words plus the 3 full stops and 1 comma, since punctuation marks are also counted as tokens).
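
The count can be checked with NLTK's word_tokenize, which treats punctuation marks as separate tokens (a sketch, assuming NLTK and its 'punkt' tokenizer data are installed):

```python
# Requires: pip install nltk, plus nltk.download('punkt').
from nltk.tokenize import word_tokenize

text = ("Traffic Jams have become a common part of our lives nowadays. "
        "Living in an urban area means you have to face traffic each and every time "
        "you get out on the road. Mostly, school students opt for buses to go to school.")

tokens = word_tokenize(text)
print(len(tokens))  # 46 - the 42 words plus 3 full stops and 1 comma
```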

3. Identify any 2 stopwords in the given sentence:


Pollution is the introduction of contaminants into the natural environment that cause
adverse change. The three types of pollution are air pollution, water pollution and land
pollution.
Ans: Stopwords in the given sentence are: is, the, of, that, into, are, and
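
As a sketch (assuming NLTK and its stopwords corpus are installed), the stopwords in the sentence can also be picked out programmatically:

```python
# Requires: pip install nltk, plus nltk.download('stopwords') and nltk.download('punkt').
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

sentence = ("Pollution is the introduction of contaminants into the natural environment "
            "that cause adverse change. The three types of pollution are air pollution, "
            "water pollution and land pollution.")

stop_set = set(stopwords.words("english"))
found = [w for w in word_tokenize(sentence.lower()) if w in stop_set]
print(sorted(set(found)))  # e.g. ['and', 'are', 'into', 'is', 'of', 'that', 'the']
```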

4. Define Text Normalisation

Ans: Text Normalisation helps in cleaning up the textual data in such a way that it comes
down to a level where its complexity is lower than that of the actual data.
5. Explain with reference to TFIDF:
a) Keyword Extraction b) Stopword Filtering
Ans: a) Keyword extraction is one of the most common text-mining tasks: given a
document, the extraction algorithm should identify a set of terms that best
describe its subject matter.
b) There are words in a document that occur many times but may not be important; in
English, these are probably words like “the”, “is”, “of”, and so forth. We might take the
approach of adding words like these to a list of stop words and removing them before
analysis, but it is possible that some of these words might be more important in some
documents than others. A list of stop words is not a very sophisticated approach to
adjusting term frequency for commonly used words.
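
A minimal sketch using scikit-learn's TfidfVectorizer (an assumption; any TF-IDF implementation would do, and a recent scikit-learn version is assumed) showing how document-specific terms surface as keywords while words common to every document score low:

```python
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the dog sat on the log",
]

vectorizer = TfidfVectorizer()
scores = vectorizer.fit_transform(docs)

# For each document, rank terms by TF-IDF score: words that appear in every
# document (like "the") get relatively low weights, document-specific words rank high.
terms = vectorizer.get_feature_names_out()
for i, doc in enumerate(docs):
    row = scores[i].toarray().ravel()
    ranked = sorted(zip(terms, row), key=lambda t: -t[1])
    print(doc, "->", [t for t, s in ranked[:3]])
```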

6. What are the different models in supervised learning?

Ans:

Regression: These models work on continuous data to predict the output based on
patterns. For example, if you wish to predict your next salary, then you would
put in the data of your previous salary, any increments, etc., and would train the
model. Here, the data which has been fed to the machine is continuous. OR
Regression is the process of finding a model for distinguishing the data into
continuous real values instead of using discrete values. It can also identify the
distribution movement depending on the historical data.

Classification: The classification model works on labelled data. For example, if we have 3
coins of different denominations which are labelled according to their weight,
then the model would look for the labelled features to predict the output.
This model works on a discrete dataset, which means the data need not be
continuous. OR In classification, data is categorised under different labels
according to some parameters given in input, and then the labels are predicted
for the data.
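
A hedged sketch with scikit-learn (assumed to be available; the tiny salary and coin-weight datasets are made up purely for illustration) contrasting the two supervised models:

```python
# Requires: pip install scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Regression: continuous output (e.g. predicting the next salary from experience).
years = [[1], [2], [3], [4]]
salary = [30000, 35000, 40000, 45000]
reg = LinearRegression().fit(years, salary)
print(reg.predict([[5]]))          # roughly 50000

# Classification: discrete labelled output (e.g. coin denomination from its weight).
weights = [[3.0], [5.0], [7.0], [3.1], [5.1], [7.1]]
coins = ["1-rupee", "2-rupee", "5-rupee", "1-rupee", "2-rupee", "5-rupee"]
clf = DecisionTreeClassifier().fit(weights, coins)
print(clf.predict([[5.05]]))       # ['2-rupee']
```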
7. Create a document vector table for the given corpus and also mention document frequency
of the given words in the dictionary:
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
ANS:
Step 1: Text Normalisation

Document 1: [we, are, going, to, mumbai]

Document 2: [mumbai, is, a, famous, place]

Document 3: [we, are, going, to, a, famous, place]

Document 4: [i, am, famous, in, mumbai]

Step 2: Create Dictionary

[we, are, going, to, mumbai, is, a, famous, place, i, am, in]

Step 3: Create Document Vector Table

      we  are  going  to  mumbai  is   a  famous  place   i  am  in
D1:    1    1      1   1       1   0   0       0      0   0   0   0
D2:    0    0      0   0       1   1   1       1      1   0   0   0
D3:    1    1      1   1       0   0   1       1      1   0   0   0
D4:    0    0      0   0       1   0   0       1      0   1   1   1

Step 4: Document Frequency (number of documents in which each word occurs)

      we  are  going  to  mumbai  is   a  famous  place   i  am  in
       2    2      2   2       3   1   2       3      2   1   1   1

Step 5: Inverse Document Frequency (total number of documents / document frequency)

      we   are  going  to   mumbai  is   a    famous  place  i    am   in
     4/2   4/2    4/2  4/2     4/3  4/1  4/2     4/3    4/2  4/1  4/1  4/1

TFIDF for any word W becomes: TFIDF(W) = TF(W) * log(IDF(W))
The most frequently occurring words in the corpus are: mumbai, famous
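
The same tables can be built programmatically. Below is a plain-Python sketch (no external libraries, written only to mirror the steps above) that produces the document vectors, document frequencies and the docs/df ratios for this corpus:

```python
import math

docs = [
    "We are going to Mumbai",
    "Mumbai is a famous place",
    "We are going to a famous place",
    "I am famous in Mumbai",
]

# Steps 1-2: normalise (lower-case tokens) and build the dictionary of unique words.
tokenised = [d.lower().split() for d in docs]
dictionary = []
for tokens in tokenised:
    for w in tokens:
        if w not in dictionary:
            dictionary.append(w)

# Step 3: document vector table (frequency of each dictionary word per document).
vectors = [[tokens.count(w) for w in dictionary] for tokens in tokenised]

# Steps 4-5: document frequency and the docs/df ratio that goes inside log() for TFIDF.
df = [sum(1 for tokens in tokenised if w in tokens) for w in dictionary]
idf_ratio = [len(docs) / f for f in df]

print(dictionary)
for row in vectors:
    print(row)
print(df)                                             # [2, 2, 2, 2, 3, 1, 2, 3, 2, 1, 1, 1]
print([round(math.log10(r), 2) for r in idf_ratio])   # log of the docs/df ratio
```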

8. Create a document vector table for the given corpus:

Document 1 : Sita and Gita are twin sisters.


Document 2 : Sita lives in Australia with her aunt.
Document 3 : Gita lives in India with her parents.
Ans: Step 1: Text Normalisation

Document 1: [sita, and, gita, are, twin, sisters]

Document 2: [sita, lives, in, australia, with, her, aunt]

Document 3: [gita, lives, in, india, with, her, parents]

Step 2: Create Dictionary

[sita, and, gita, are, twin, sisters, lives, in, australia, with, her, aunt, india, parents]

Step 3: Create a Document Vector Table

      sita  and  gita  are  twin  sisters  lives  in  australia  with  her  aunt  india  parents
D1:      1    1     1    1     1        1      0   0          0     0    0     0      0        0
D2:      1    0     0    0     0        0      1   1          1     1    1     1      0        0
D3:      0    0     1    0     0        0      1   1          0     1    1     0      1        1

Step 4: Create a Document Frequency Table

      sita  and  gita  are  twin  sisters  lives  in  australia  with  her  aunt  india  parents
         2    1     2    1     1        1      2   2          1     2    2     1      1        1

Step 5: Inverse Document Frequency (total number of documents / document frequency)

      sita  and  gita  are  twin  sisters  lives  in  australia  with  her  aunt  india  parents
       3/2  3/1   3/2  3/1   3/1      3/1    3/2  3/2        3/1   3/2  3/2   3/1    3/1      3/1
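
As a cross-check, scikit-learn's CountVectorizer (assumed to be available) can reproduce this document vector table; note that its vocabulary is sorted alphabetically rather than in order of first appearance:

```python
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Sita and Gita are twin sisters",
    "Sita lives in Australia with her aunt",
    "Gita lives in India with her parents",
]

# The custom token_pattern keeps single-letter words too (the default drops them),
# so the result matches the manual table above word for word.
vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
table = vec.fit_transform(docs).toarray()

print(vec.get_feature_names_out())
for row in table:
    print(row)
```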
II. Answer the following questions:

1. Consider the following corpus:


This is my first AI project. I used the most advanced techniques.
Perform the following on the sentence and write the output
a. Sentence Segmentation
1. This is my first AI project.
2. I used the most advanced techniques.
b. Tokenization
This, is, my, first, AI, project, I, used, the, most, advanced, techniques
c. Remove stop words, numbers and special characters
My, first, AI, project, used, most, advanced, techniques
d. Convert text to common case
my, first, ai, project, used, most, advanced, techniques
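
A plain-Python sketch of these four steps on the given corpus (the stop-word list here is a small hand-picked set, used only for illustration):

```python
corpus = "This is my first AI project. I used the most advanced techniques."

# a. Sentence segmentation: split the corpus on the full stop.
sentences = [s.strip() for s in corpus.split(".") if s.strip()]

# b. Tokenisation: split each sentence into word tokens.
tokens = [w for s in sentences for w in s.split()]

# c. Remove stop words (illustrative list), numbers and special characters.
stop_words = {"this", "is", "i", "the"}
filtered = [w for w in tokens if w.lower() not in stop_words and w.isalpha()]

# d. Convert the remaining tokens to a common (lower) case.
print([w.lower() for w in filtered])
# ['my', 'first', 'ai', 'project', 'used', 'most', 'advanced', 'techniques']
```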

2. Explain all 4Ws of the 4Ws canvas in detail.


a. Who
i. Under this W, the stakeholders and things related to them are explored.
ii. Stakeholders are the people who are facing the problem and who would
benefit from the solution.
iii. Here two questions are very important:
1. Who are the stakeholders?
2. What do you know about them?
b. What
i. Here you need to look into the problem and understand what the
problem is.
ii. How do you know that it is a problem?
c. Where
i. This question focuses on the context/situation/location of the problem.
d. Why
i. The Why block focuses on the solution and the benefits the stakeholders
would get from it.
3. Create a 4W Project Canvas for the following:
Most senior citizens are not as tech-savvy as the younger generations. And so, for
them searching for information on the internet through conventional means like
PCs or smartphones might seem like a difficult task. Nevertheless, information or
knowledge should be available for their consumption as well, wherein they can
simply speak what they want to know about and relevant information is provided
instantly.
Ans:
Who: Senior citizens who are not comfortable with conventional devices like PCs or smartphones.
What: They find it difficult to search for information on the internet through conventional means, even though information and knowledge should be available for their consumption as well.
Where: Wherever senior citizens need information, for example at home, where operating a PC or smartphone may be difficult for them.
Why: A voice-based solution would let them simply speak what they want to know about and get the relevant information instantly, making knowledge easily accessible to them.
4. What are the steps of text Normalization? Explain them in brief.

Ans: In Text Normalization, we undergo several steps to normalise the text to a lower
level.
Sentence Segmentation - Under sentence segmentation, the whole corpus is divided
into sentences. Each sentence is taken as a different data so now the whole corpus
gets reduced to sentences.
Tokenisation - After segmenting the sentences, each sentence is then further divided
into tokens. A token is a term used for any word, number or special character
occurring in a sentence. Under tokenisation, every word, number and special
character is considered separately, and each of them is now a separate token.
Removing Stop words, Special Characters and Numbers - In this step, the tokens
which are not necessary are removed from the token list.
Converting text to a common case - After the stop words removal, we convert the
whole text into a similar case, preferably lower case. This ensures that the machine
does not treat the same words as different just because they appear in different
cases.
Stemming - In this step, the remaining words are reduced to their root words. In
other words, stemming is the process in which the affixes of words are removed and
the words are converted to their base form.
Lemmatization - In lemmatization, the word we get after affix removal (also known
as the lemma) is a meaningful one.
With this, we have normalised our text down to tokens, which are the simplest form of
words present in the corpus. Now it is time to convert the tokens into numbers. For
this, we would use the Bag of Words algorithm.
