
Essentials of Machine Learning

Digital Assignment 1
Submission Date: 04-05-2020 07:00 to 11:00 AM

Instructions

1. The DA will be opened in quiz format for three hours between 7:00 and 11:00 AM.
2. A total of 10 questions (from the attached problems) will appear in VITOL for evaluation.

1. The data on wheat production in tons (X) and the price of a kilo of flour in
pesetas (Y) in Spain during the 1980s were:
Wheat Production (X)   32  30  34  27  27  27  24  26  37  42
Flour Price (Y)        27  32  29  42  44  42  52  47  32  27

Given the linear regression equation Y = α + βX, what is the value of β?
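
The least-squares estimates for a line Y = α + βX can be sketched as follows. This is a minimal illustration, not a prescribed solution method; the function name fit_least_squares is a hypothetical helper, and the data lists are copied from the table in Question 1.

```python
def fit_least_squares(xs, ys):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta = s_xy / s_xx                # slope
    alpha = y_mean - beta * x_mean    # intercept
    return alpha, beta


# Data from Question 1 (wheat production X, flour price Y).
wheat = [32, 30, 34, 27, 27, 27, 24, 26, 37, 42]
flour = [27, 32, 29, 42, 44, 42, 52, 47, 32, 27]

alpha, beta = fit_least_squares(wheat, flour)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

The same helper can be reused for any other paired data in this sheet by passing the two columns as lists.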

2. Use linear regression to derive the linear expression that predicts the price of a
house from its size.
House Size    800   600   300   850      750   550   350   700    400   900
House Price   10L   7L    2L    10.75L   9L    6L    4L    8.5L   5L    11L

Fit the regression line using the method of least squares.

3. A study was done to compare two commercials for energy drink ‘X’. Each participant was shown
the commercials, A and B, in random order and asked to select the better one. There were
110 women and 100 men who participated in the study. Commercial A was selected by 55
women and by 57 men. Find the odds of selecting Commercial A for the men. Do the
same for the women.
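
Odds are the ratio of the number of outcomes in favour to the number against, or equivalently p / (1 - p) for a proportion p. The sketch below illustrates both forms; the helper names odds_from_counts and odds_from_probability are hypothetical, and the counts are the ones stated in Question 3.

```python
def odds_from_counts(selected, total):
    """Odds = (number who selected) / (number who did not select)."""
    return selected / (total - selected)


def odds_from_probability(p):
    """Odds = p / (1 - p) for a proportion p strictly between 0 and 1."""
    return p / (1 - p)


# Counts from Question 3: Commercial A chosen by 57 of 100 men, 55 of 110 women.
men_odds = odds_from_counts(57, 100)
women_odds = odds_from_counts(55, 110)
print(f"men: {men_odds:.4f}, women: {women_odds:.4f}")
```

The second helper covers the case where only a sample proportion is given rather than raw counts.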

4. Suppose we had a logistic regression model with three features that learned the following
bias and weights:
b = 1, w1 = 3, w2 = -2, w3 = 4
Further suppose the following feature values for a given example:
x1 = 1, x2 = 8, x3 = 4
What are the log odds and the logistic prediction values?
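
In logistic regression the log odds are the linear score z = b + w1*x1 + w2*x2 + w3*x3, and the prediction is the logistic (sigmoid) function applied to z. A minimal sketch with the values from Question 4 follows; the function names log_odds and sigmoid are illustrative choices.

```python
import math


def log_odds(bias, weights, features):
    """Linear score z = b + sum(w_i * x_i)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Logistic function, mapping any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Values from Question 4.
z = log_odds(1, [3, -2, 4], [1, 8, 4])
p = sigmoid(z)
print(f"log odds = {z}, prediction = {p:.4f}")
```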

5. Consider the following data set, which is to be used to build a decision tree with ID3,
and answer the following questions:

Not Heavy   Smelly   Spotted   Smooth   Edible (Class)
1           0        0         1        1
1           1        0         0        1
0           1        0         1        1
1           0        0         0        0
1           0        1         0        0
0           1        0         0        0
0           1        1         1        0
1           1        0         1        0

What is Gain(Not Heavy)?
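
ID3 scores an attribute by its information gain: the entropy of the class labels minus the weighted entropy of the labels within each attribute value. The following is a small sketch of that calculation, assuming base-2 logarithms; the helper names entropy and information_gain are hypothetical, and the two lists are the Not Heavy and Edible columns from the table in Question 5.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Columns from Question 5, in row order.
not_heavy = [1, 1, 0, 1, 1, 0, 0, 1]
edible = [1, 1, 1, 0, 0, 0, 0, 0]

gain = information_gain(not_heavy, edible)
print(f"Gain(Not Heavy) = {gain:.4f}")
```

Repeating the call for the other attribute columns gives the values needed to compare candidate root attributes.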

6. For the dataset in Q5, which attribute should you choose as the root of a decision tree?

7. For the dataset in Q5, when the Spotted value is 1, which class does the example belong to?

8. Consider the following training examples for binary classification using CART and answer the
following questions:
Customer ID   Gender   Car Type   Shirt Size   Class
1             M        Family     Small        0
2             M        Sports     Medium       0
3             M        Sports     Medium       0
4             F        Sports     Small        0
5             F        Sports     Medium       0
6             M        Family     Large        1
7             M        Family     Medium       1
8             F        Luxury     Medium       1
9             F        Luxury     Large        1
10            F        Luxury     Small        1

Which attribute is better: Gender, Car Type, or Shirt Size?

9. For the dataset in Q8, why should Customer ID, which has the lowest Gini index, not be used
as the attribute test condition?
10. For the dataset in Q8, compute the Gini index of the Shirt Size attribute.
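
The Gini index of a split is the weighted average of the Gini impurity, 1 minus the sum of squared class proportions, over the partitions the attribute creates. The sketch below assumes a multiway split with one partition per attribute value (CART implementations commonly use binary splits instead, so this is an assumption about the intended exercise); the helper names gini and gini_split are hypothetical, and the columns are copied from the table in Question 8.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(values, labels):
    """Weighted Gini impurity of a multiway split on an attribute."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        total += len(subset) / n * gini(subset)
    return total


# Columns from Question 8, in Customer ID order.
gender = ["M", "M", "M", "F", "F", "M", "M", "F", "F", "F"]
car_type = ["Family", "Sports", "Sports", "Sports", "Sports",
            "Family", "Family", "Luxury", "Luxury", "Luxury"]
shirt_size = ["Small", "Medium", "Medium", "Small", "Medium",
              "Large", "Medium", "Medium", "Large", "Small"]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, column in [("Gender", gender), ("Car Type", car_type),
                     ("Shirt Size", shirt_size)]:
    print(f"{name}: {gini_split(column, labels):.4f}")
```

A lower weighted Gini index indicates purer partitions, which is the basis for comparing the attributes.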

11. Consider the confusion matrix given below and answer the following questions:

                   Predicted
Actual             Negative   Positive
Negative (CAT)     15         35
Positive (DOG)     40         10

a. Calculate the F-measure score
b. Calculate Precision
c. Calculate Recall
d. Calculate Accuracy
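
Precision, recall, F-measure and accuracy all follow from the four cells of a binary confusion matrix. The sketch below assumes that rows are actual classes, columns are predicted classes, and DOG is the positive class, so the first matrix above reads TN = 15, FP = 35, FN = 40, TP = 10; the function name classification_metrics is a hypothetical helper, and the F-measure is taken to be the balanced F1 score.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Assumed reading of the matrix: TN = 15, FP = 35, FN = 40, TP = 10.
metrics = classification_metrics(tp=10, fp=35, fn=40, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The same function applies to any other binary confusion matrix once its four cells are identified.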

12. Consider the confusion matrix given below and answer the following questions:
                   Predicted
Actual             Negative   Positive
Negative (CAT)     69         6
Positive (DOG)     5          20

a. Calculate the F-measure score
b. Calculate Precision
c. Calculate Recall
d. Calculate Accuracy

13. The following results were observed when clustering 6000 data points into 3 clusters,
A, B and C:

What is the F1-Score with respect to cluster B?


14. The following table shows the midterm and final exam grades obtained by students in a
database course. What are the values of α and β?
Mid Term Exam   Final Exam
65              69
80              78
89              80
88              77
50              24
73              72
55              76
85              98

15. Consider the training dataset which is given below for the decision tree problem.

Genre    Critics-reviews   Rating   IMAX    Likes
Comedy   Thumbs-Up         R        False   No
Comedy   Thumbs-Up         PG-13    True    No
Drama    Neutral           R        False   No
Action   Thumbs-Down       PG-13    True    No
Action   Neutral           R        True    No
Comedy   Thumbs-Down       PG-13    False   Yes
Comedy   Neutral           PG-13    True    Yes
Comedy   Thumbs-Up         R        False   Yes
Drama    Thumbs-Down       PG-13    True    Yes
Drama    Neutral           R        True    No
Drama    Thumbs-Up         PG-13    False   Yes
Action   Neutral           PG-13    False   Yes
Action   Thumbs-Down       R        False   Yes
Action   Neutral           PG-13    False   Yes

16. For the dataset in Q15, what is the information gain of the overall dataset?

17. For the dataset in Q15, which two attributes have the same gain ratio?
18. For the dataset in Q15, what is the splitting node of the dataset?
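
Gain ratio normalizes information gain by the split information, which is the entropy of the attribute's own value distribution. The following is a rough sketch, assuming base-2 logarithms and treating a zero split information as a gain ratio of 0; the helper names are hypothetical, and the Genre and Likes columns are copied from the table in Question 15.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of the split divided by its split information."""
    n = len(labels)
    remainder = sum(
        count / n * entropy([lab for val, lab in zip(values, labels) if val == v])
        for v, count in Counter(values).items()
    )
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's value distribution
    return gain / split_info if split_info > 0 else 0.0


# Genre and Likes columns from Question 15, in row order.
genre = ["Comedy", "Comedy", "Drama", "Action", "Action", "Comedy", "Comedy",
         "Comedy", "Drama", "Drama", "Drama", "Action", "Action", "Action"]
likes = ["No", "No", "No", "No", "No", "Yes", "Yes",
         "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes"]

print(f"GainRatio(Genre) = {gain_ratio(genre, likes):.4f}")
```

Calling gain_ratio on each remaining attribute column allows the attributes to be compared on the same scale.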

19. For the customer service data, the proportion of customers in the sample who would recommend
the service is 0.84. What are the odds of recommending the service department?

20. Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x), where g(z) is
the logistic function. Consider P(y = 1 | x, w), viewed as a function of x, that we can obtain by
changing the parameters w. What would be the range of P in that case?

21. Consider the confusion matrix given below and calculate lift:
                      Predicted
Actual                Negative   Positive
Negative (Reptile)    30         20
Positive (Mammals)    60         40
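
Lift compares the precision of the positive predictions with the base rate of positives in the whole data set. The sketch below assumes rows are actual classes, columns are predicted classes, and Mammals is the positive class, giving TN = 30, FP = 20, FN = 60, TP = 40; the function name lift is a hypothetical helper.

```python
def lift(tp, fp, fn, tn):
    """Lift = precision of positive predictions / overall rate of positives."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    base_rate = (tp + fn) / total
    return precision / base_rate


# Assumed reading of the matrix: TN = 30, FP = 20, FN = 60, TP = 40.
value = lift(tp=40, fp=20, fn=60, tn=30)
print(f"lift = {value:.4f}")
```

A lift above 1 means the classifier's positive predictions are richer in true positives than a random selection would be; a lift of 1 means no improvement over the base rate.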

22. Consider the following training examples for binary classification using CART and answer the
following two questions:
Past       Open   Trading   Return
Positive   Low    High      Up
Negative   High   Low       Down
Positive   Low    High      Up
Negative   High   High      Up
Positive   High   High      Up
Negative   Low    High      Down
Positive   Low    Low       Down

What is the Gini Index of Past?

23. For the dataset in Q22, what is the splitting node?
