
Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division


First Semester 2023-2024

Comprehensive Examination
(EC-3 Regular)

Course No. : MBA ZG536


Course Title : Foundations of Data Science
Pattern of Exam : SCAN AND UPLOAD
Nature of Exam : Open Book
Weightage : 45%            No. of Pages = 6
Duration : 2 ½ Hours       No. of Questions = 6
Date of Exam : 26/11/2023 (FN)
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

Q1. Set A

import pandas as pd

features = ['AGE', 'COUNTRY', 'PLAYING ROLE']

data_encoded = pd.get_dummies(data_df, columns=features, drop_first=True)

With reference to the above piece of code, what kind of encoding is being carried out by the
function “get_dummies”? Explain this type of encoding in detail and why it is
required. (5 marks)
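`get_dummies` performs one-hot (dummy) encoding: each category of a listed column becomes its own 0/1 indicator column, so categorical features can be fed to models that need numeric input without implying any order between categories. A minimal sketch on a hypothetical `toy_df` (the exam's `data_df` is not shown, so the values are made up); `dtype=int` is passed only so the output prints as 0/1 rather than True/False.

```python
import pandas as pd

# Hypothetical toy data: the exam's data_df is not shown, so these values are made up.
toy_df = pd.DataFrame({"PLAYING ROLE": ["Batsman", "Bowler", "Allrounder", "Bowler"]})

# One-hot encoding: one 0/1 indicator column per category, named "<column>_<category>".
# Categories are sorted, and each row has exactly one 1.
one_hot = pd.get_dummies(toy_df, columns=["PLAYING ROLE"], dtype=int)

print(one_hot.columns.tolist())
# ['PLAYING ROLE_Allrounder', 'PLAYING ROLE_Batsman', 'PLAYING ROLE_Bowler']
print(one_hot.values.tolist())
# [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
```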

Q1. Set B

import pandas as pd

features = ['AGE', 'COUNTRY', 'PLAYING ROLE']

data_encoded = pd.get_dummies(data_df, columns=features, drop_first=True)

What is the significance of the “drop_first = True” parameter in the above code, and why is it
required? Explain in detail. (5 marks)
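With `drop_first=True`, pandas omits the indicator for the first category (in sorted order) of each encoded column. A full set of dummies always sums to 1, so any one of them is a linear function of the others (perfect multicollinearity, often called the dummy variable trap); keeping k-1 columns removes that redundancy without losing information, because the dropped category is the row where all remaining dummies are 0. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data (values made up for illustration).
toy_df = pd.DataFrame({"COUNTRY": ["IND", "AUS", "SL", "IND"]})

full = pd.get_dummies(toy_df, columns=["COUNTRY"], dtype=int)
reduced = pd.get_dummies(toy_df, columns=["COUNTRY"], drop_first=True, dtype=int)

print(full.columns.tolist())     # ['COUNTRY_AUS', 'COUNTRY_IND', 'COUNTRY_SL']
print(reduced.columns.tolist())  # ['COUNTRY_IND', 'COUNTRY_SL']

# Full encoding: every row sums to 1, so the columns are perfectly collinear.
print((full.sum(axis=1) == 1).all())  # True

# The dropped category (AUS) is still recoverable: all remaining dummies are 0.
is_aus = reduced.sum(axis=1) == 0
print(is_aus.tolist())  # [False, True, False, False]
```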

Q1. Set C

import pandas as pd

features = ['AGE', 'COUNTRY', 'PLAYING ROLE']

data_encoded = pd.get_dummies(data_df, columns=features, drop_first=True)


What is the significance of the “drop_first = True” parameter in the above code, and why is it
required? Explain in detail. (5 marks)

Q2. Set A
How are unknown data points classified using the K Nearest Neighbour (KNN) algorithm? What
is the rule of thumb for fixing the value of K? (5 Marks)
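A minimal sketch of KNN classification: compute the distance from the unknown point to every training point, take the K closest, and assign the majority label. The data is hypothetical, and the choice of K uses a commonly quoted heuristic (K near the square root of the number of training points, made odd for two classes so votes cannot tie), not a fixed rule.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6), (7, 7)]
y = ["A", "A", "A", "B", "B", "B", "B"]

# Heuristic: k close to sqrt(n), bumped to an odd number for a two-class problem.
k = round(math.sqrt(len(X)))
if k % 2 == 0:
    k += 1

print(k)                               # 3
print(knn_classify(X, y, (2, 2), k))   # A
print(knn_classify(X, y, (6, 5), k))   # B
```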

Q2. Set B
What technique does the K Nearest Neighbour (KNN) algorithm use to classify unknown
data points? What is the rule of thumb for fixing the value of K? (5 Marks)

Q2. Set C
Why is the KNN algorithm a type of lazy learning? Explain. What does “distance” refer to in
the KNN algorithm, and what are the different metrics that can be used? (5 marks)
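KNN is "lazy" because it builds no model in advance: it only stores the training data and defers all computation to prediction time, when it measures how far the query lies from each stored point. A short sketch of common distance metrics, written from their standard definitions (the function names are illustrative):

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)


def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)


def chebyshev(a, b):
    """Largest single-coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))


u, v = (0, 0), (3, 4)
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0
print(chebyshev(u, v))  # 4
```

Because the metric decides which points count as "nearest", features are usually scaled to comparable ranges before using any of these.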

Q3. Set A
Why is the K-Means algorithm an unsupervised learning algorithm? How is the value of K
determined? Explain. (5 Marks)

Q3. Set B
What is the problem with using accuracy as a metric to evaluate a model built on a highly
imbalanced dataset? Explain. (5 Marks)
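A worked illustration with hypothetical labels: when 95% of the samples belong to one class, a "model" that always predicts that class scores 95% accuracy while catching none of the minority cases, which is usually the class of interest.

```python
# Hypothetical labels: 95 negatives (0) and 5 positives (1), e.g. fraud cases.
y_true = [0] * 95 + [1] * 5

# A useless "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)  # share of actual positives that were caught

print(accuracy)  # 0.95
print(recall)    # 0.0
```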

Q3. Set C
What is hyperparameter tuning in Machine Learning? Explain how hyperparameter tuning is
carried out using Python. (5 marks)
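A hyperparameter is a setting chosen before training (such as K in KNN) rather than learned from the data; tuning means trying candidate values and keeping the one that scores best on data not used for fitting. In scikit-learn this is usually done with `GridSearchCV` or `RandomizedSearchCV` from `sklearn.model_selection`, which add cross-validation. The sketch below shows the same grid-search idea in plain Python with a single hold-out validation set; the data and helper names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(train, valid, k):
    """Fraction of validation points that KNN with this k labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in valid)
    return hits / len(valid)


# Hypothetical training data, including one noisy "B" point inside the A region.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"), ((2, 3), "B"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B")]
valid = [((2, 3.2), "A"), ((6.5, 6.5), "B"), ((1.5, 1.5), "A")]

# Grid search: score every candidate k on the validation set, keep the best.
param_grid = [1, 3, 5, 7, 9]
scores = {k: validation_accuracy(train, valid, k) for k in param_grid}
best_k = max(scores, key=scores.get)  # first k with the highest score

print(scores)
print(best_k)  # 3
```

Here k = 1 is fooled by the noisy point (overfitting) and k = 9 lets the larger class outvote everything (underfitting), which is the trade-off tuning is meant to resolve.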

Q4. Set A.
Answer the following questions with respect to the following figure. (10 Marks)

a) What function is denoted by the following figure? [1 Mark]


b) In which Machine Learning algorithm is it used? [1 Mark]
c) Explain why this function is used to fit the data rather than a straight line/hyperplane.
[3 Marks]
d) Explain the concept of a threshold, and a way of determining an optimal threshold. [5 Marks]
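Assuming the figure shows the S-shaped sigmoid (logistic) curve used by logistic regression (the image is not reproduced here), the model outputs a probability and a threshold turns it into a class label. One common way to pick the threshold is to sweep candidate values and maximise Youden's J = TPR - FPR, the ROC point farthest above the diagonal. A minimal sketch on hypothetical scores; `sklearn.metrics.roc_curve` computes the exact ROC thresholds in practice.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


def tpr_fpr(y_true, probs, threshold):
    """True- and false-positive rates when predicting 1 if prob >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, preds))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, preds))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


# Hypothetical linear scores z = w.x + b for eight points, with true labels.
scores = [-3.0, -2.0, -1.0, -0.5, 0.2, 0.8, 1.5, 3.0]
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [sigmoid(z) for z in scores]

# Sweep thresholds and keep the first one with the largest J = TPR - FPR.
best_threshold, best_j = None, -1.0
for threshold in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    tpr, fpr = tpr_fpr(y_true, probs, threshold)
    if tpr - fpr > best_j:
        best_threshold, best_j = threshold, tpr - fpr

print(best_threshold, best_j)  # 0.3 0.75
```

On this data the default 0.5 gives J = 0.5, so the best threshold need not be 0.5; other criteria (cost-weighted errors, F1) may be preferred depending on the application.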

Q4. Set B.
Answer the following questions with respect to the following figure. (10 Marks)

a) Suppose the above figures show the decision boundaries of KNN and Logistic
Regression models applied to a 2D dataset. State which decision boundary (A or
B) belongs to which algorithm (KNN or Logistic Regression), and explain why. [4 Marks]

b) What function is denoted by the following equation? In which machine learning
algorithm (KNN or Logistic Regression) is it used? [1 Mark]

c) What would be the equation of the decision boundary for the ML algorithm referred
to in question b)? Explain. [2 Marks]

d) With reference to question b), explain why this function is used to fit the data rather
than a straight line/hyperplane. [3 Marks]
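Assuming the equation in question b) is the logistic (sigmoid) function of logistic regression with the usual 0.5 threshold (the equation image is not reproduced here), the decision boundary follows in a few steps:

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^{\top}\mathbf{x} + b \\
\hat{y} = 1 \iff \sigma(z) \ge 0.5 &\iff e^{-z} \le 1 \iff z \ge 0 \\
\text{decision boundary: } \mathbf{w}^{\top}\mathbf{x} + b &= 0
  \qquad (\text{in 2-D: } w_1 x_1 + w_2 x_2 + b = 0)
\end{aligned}
```

So although the fitted curve is S-shaped, the boundary it induces is a straight line (a hyperplane in higher dimensions), whereas KNN's boundary follows the local arrangement of training points and is typically irregular.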

Q4. Set C.
Answer the following questions with respect to the following figure. (10 Marks)
a) What function is denoted by the following figure? [1 Mark]
b) In which Machine Learning algorithm is it used? [1 Mark]
c) Explain why this function is used to fit the data rather than a straight line/hyperplane.
[3 Marks]
d) Explain the concept of a threshold, and a way of determining an optimal threshold. [5 Marks]

Q5. Set A.
Explain what entropy is with respect to a decision tree. Which of the following states has the
highest impurity? Calculate the entropy for each of the states in the figure. Note that the colours
represent the classes of the data points in the figure. (10 Marks)
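Entropy measures the impurity of a node: H = -Σ p_i log2(p_i), where p_i is the fraction of points in class i. It is 0 for a pure node and largest when the classes are evenly mixed (1 bit for two classes). A minimal sketch; the exam figure is not reproduced, so the class counts below are hypothetical.

```python
import math


def entropy(class_counts):
    """Shannon entropy (base 2) of a node, given the number of points per class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:  # classes with no points contribute 0 (0 * log 0 is taken as 0)
            p = count / total
            result -= p * math.log2(p)
    return result


# Hypothetical states: counts of points of each colour.
print(entropy([8, 0]))  # 0.0  -> pure node, no impurity
print(entropy([4, 4]))  # 1.0  -> even split, maximum impurity for two classes
print(entropy([6, 2]))  # approximately 0.811
```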

Q5. Set B
Explain what entropy is with respect to a decision tree. Which of the following states has the
highest impurity? Calculate the entropy for each of the states in the figure. Note that the colours
represent the classes of the data points in the figure. (10 Marks)

Q5. Set C
Explain what entropy is with respect to a decision tree. Which of the following states has the
highest impurity? Calculate the entropy for each of the states in the figure. Note that the colours
represent the classes of the data points in the figure. (10 Marks)

Q6. Set A.
Give examples of use cases in which each of the following metrics should be preferred over the
other for model evaluation. (10 Marks)
1. Precision
2. Recall

Q6. Set B.
Describe the steps of the k-means clustering algorithm. How is the value of k determined? (10
Marks)
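A minimal sketch of k-means (Lloyd's algorithm): pick k initial centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. No class labels are used anywhere, which is what makes it unsupervised. For choosing k, the elbow method runs the algorithm for several k and plots the within-cluster sum of squares (WCSS, called `inertia_` in scikit-learn's `KMeans`), picking the k where the curve bends. The data, random-point initialisation and function name are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on a list of 2-D points; returns (centroids, WCSS)."""
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (here: k distinct random data points).
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # WCSS: squared distance of each point to its nearest final centroid.
    wcss = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, wcss


# Hypothetical data with three well-separated groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 8), (1, 9)]

# Elbow method: WCSS for increasing k; look for where the drop levels off.
for k in range(1, 6):
    _, wcss = kmeans(data, k)
    print(k, round(wcss, 2))
```

Because the result depends on the initial centroids, practical implementations usually run several initialisations (or k-means++ seeding) and keep the lowest WCSS.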
Q6. Set C.
Give examples of use cases in which each of the following metrics should be preferred over the
other for model evaluation. (10 Marks)
1. Precision
2. Recall
