Essentials of ML, VITOL Course DA1
Digital Assignment 1
Submission Date: 04-05-2020 07:00 to 11:00 AM
Instructions
1. The DA will be open in quiz format for three hours between 7:00 and 11:00 AM.
2. A total of 10 questions (from the attached problems) will appear in VITOL for evaluation.
1. The data on wheat production in tons (X) and the price of a kilo of flour in
pesetas (Y) in Spain during the 1980s were:
Wheat production (X): 32 30 34 27 27 27 24 26 37 42
Flour price (Y):      27 32 29 42 44 42 52 47 32 27
Given the linear regression equation Y = α + βX, what is the value of β?
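One way to check a hand calculation for Q1 is the ordinary least-squares formula β = S_xy / S_xx, with α = mean(Y) − β · mean(X). The sketch below is only an illustration under that assumption; the variable names are hypothetical and not part of the assignment.

```python
# Least-squares fit of Y = alpha + beta * X for the wheat/flour data in Q1.
wheat = [32, 30, 34, 27, 27, 27, 24, 26, 37, 42]  # X, tons
flour = [27, 32, 29, 42, 44, 42, 52, 47, 32, 27]  # Y, pesetas per kilo

n = len(wheat)
mean_x = sum(wheat) / n
mean_y = sum(flour) / n

# S_xy and S_xx are the centered cross-product and squared-deviation sums.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wheat, flour))
s_xx = sum((x - mean_x) ** 2 for x in wheat)

beta = s_xy / s_xx
alpha = mean_y - beta * mean_x

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
```

A negative β here would mean that higher wheat production goes with a lower flour price.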
2. Use linear regression to derive the linear expression that predicts the price of a
house from its size.
House size:  800 600 300 850 750 550 350 700 400 900
House price: 10L 7L 2L 10.75L 9L 6L 4L 8.5L 5L 11L
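For Q2, the fitted line can be sketched with the raw-sum form of the least-squares formulas. This example assumes that "L" denotes lakhs and treats the prices as plain numbers in that unit; `fit_line` and `predict_price` are hypothetical helper names used only for illustration.

```python
# Simple linear regression: price = intercept + slope * size.
# Assumption: prices are expressed in lakhs ("L" in the table).
sizes = [800, 600, 300, 850, 750, 550, 350, 700, 400, 900]
prices = [10, 7, 2, 10.75, 9, 6, 4, 8.5, 5, 11]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predicted price (in lakhs) for a house of the given size."""
    return intercept + slope * size


print(f"price = {intercept:.4f} + {slope:.6f} * size")
print(f"predicted price for size 500: {predict_price(500):.2f} L")
```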
3. A study was done to compare ‘X’ energy drink commercials. Each participant was shown
the commercials, P and Q, in random order and asked to select the better one. There were
110 women and 100 men who participated in the study. Commercial A was selected by 55
women and by 57 men. Find the odds of selecting Commercial A for the men. Do the
same for the women.
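The odds of an event are the number of times it occurs divided by the number of times it does not. A minimal sketch for Q3, using the counts stated in the question (the function name `odds` is an illustrative choice):

```python
def odds(selected, total):
    """Odds = P(selected) / P(not selected) = selected / (total - selected)."""
    return selected / (total - selected)


odds_men = odds(57, 100)     # 57 of 100 men chose Commercial A
odds_women = odds(55, 110)   # 55 of 110 women chose Commercial A

print(f"men: {odds_men:.4f}, women: {odds_women:.4f}")
```

Odds of 1 mean that selecting and not selecting the commercial are equally likely.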
4. Suppose we had a logistic regression model with three features that learned the following
bias and weights:
b = 1, w1 = 3, w2 = -2, w3 = 4
Further suppose the following feature values for a given example:
x1 = 1, x2 = 8, x3 = 4
What are the log odds and the logistic prediction values?
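For Q4, the log odds are the linear score z = b + w1·x1 + w2·x2 + w3·x3, and the logistic prediction is the sigmoid of z. The following sketch simply evaluates both for the values given in the question:

```python
import math

bias = 1
weights = [3, -2, 4]    # w1, w2, w3
features = [1, 8, 4]    # x1, x2, x3

# Log odds: bias plus the weighted sum of the features.
log_odds = bias + sum(w * x for w, x in zip(weights, features))

# Logistic prediction: sigmoid of the log odds.
prediction = 1 / (1 + math.exp(-log_odds))

print(f"log odds = {log_odds}, prediction = {prediction:.4f}")
```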
5. Consider the following dataset, draw the decision tree using ID3, and answer the
following questions:
6. For the dataset in Q5, which attribute should you choose as the root of the decision tree?
7. For the dataset in Q5, when the spotted value is 1, which class does it belong to?
8. Consider the training examples for binary classification using CART and answer the
following questions:
Customer ID Gender Car Type Shirt Size Class
1 M Family Small 0
2 M Sports Medium 0
3 M Sports Medium 0
4 F Sports Small 0
5 F Sports Medium 0
6 M Family Large 1
7 M Family Medium 1
8 F Luxury Medium 1
9 F Luxury Large 1
10 F Luxury Small 1
9. For the dataset in Q8, why should Customer ID, which has the lowest Gini index, not be used
as the attribute test condition?
10. For the dataset in Q8, compute the Gini index of the Shirt Size attribute.
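The Gini index of a multiway split is the size-weighted average of 1 − Σ p_k² over the partitions it creates. The sketch below applies that definition to the Q8 table; the helper names `gini` and `weighted_gini` are hypothetical, and the multiway split (one branch per attribute value) is an assumption about how the question intends the attribute to be tested.

```python
from collections import Counter, defaultdict

# (customer_id, gender, car_type, shirt_size, class) rows from the Q8 table
rows = [
    (1, "M", "Family", "Small", 0),
    (2, "M", "Sports", "Medium", 0),
    (3, "M", "Sports", "Medium", 0),
    (4, "F", "Sports", "Small", 0),
    (5, "F", "Sports", "Medium", 0),
    (6, "M", "Family", "Large", 1),
    (7, "M", "Family", "Medium", 1),
    (8, "F", "Luxury", "Medium", 1),
    (9, "F", "Luxury", "Large", 1),
    (10, "F", "Luxury", "Small", 1),
]


def gini(labels):
    """Gini impurity of one partition: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, col):
    """Size-weighted Gini of a multiway split on column index `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])  # last column is the class label
    n = len(rows)
    return sum(len(labels) / n * gini(labels) for labels in groups.values())


gini_shirt = weighted_gini(rows, col=3)
gini_customer_id = weighted_gini(rows, col=0)

print(f"Shirt Size: {gini_shirt:.4f}, Customer ID: {gini_customer_id:.4f}")
```

Customer ID scores zero only because every partition holds a single record, which is why a unique identifier carries no predictive value for new customers.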
11. Consider the confusion matrix given below and answer the following questions:
                        Predicted Negative   Predicted Positive
Actual Negative (CAT)          15                   35
Actual Positive (DOG)          40                   10
12. Consider the confusion matrix given below and answer the following questions:
                        Predicted Negative   Predicted Positive
Actual Negative (CAT)          69                    6
Actual Positive (DOG)           5                   20
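The specific sub-questions for Q11 and Q12 are not listed here, so the sketch below only illustrates the usual metrics read off a confusion matrix (accuracy, precision, recall). It assumes rows are actual classes and columns are predicted classes, each in the order Negative, Positive; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tn, fp, fn, tp):
    """Common metrics from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),   # of predicted positives, how many are right
        "recall": tp / (tp + fn),      # of actual positives, how many are found
    }


q11 = classification_metrics(tn=15, fp=35, fn=40, tp=10)
q12 = classification_metrics(tn=69, fp=6, fn=5, tp=20)

for name, metrics in (("Q11", q11), ("Q12", q12)):
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```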
13. The following results were observed when clustering 6000 data points into 3 clusters,
A, B and C:
15. Consider the training dataset given below for the decision tree problem.
16. For the dataset in Q15, what is the information gain of the overall dataset?
17. For the dataset in Q15, which two attributes have the same gain ratio?
18. For the dataset in Q15, what is the splitting node of the dataset?
19. For the customer service data, the proportion of customers in the sample who would
recommend the service is 0.84. What are the odds of recommending the service department?
20. Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x), where g(z) is
the logistic function. Viewing P(y = 1 | x; w) as a function of x that we can vary by
changing the parameters w, what is the range of P?
21. Consider the confusion matrix given below and calculate lift:
                             Predicted Negative   Predicted Positive
Actual Negative (Reptile)           30                   20
Actual Positive (Mammals)           60                   40
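A common definition of lift is the precision of the positive predictions divided by the overall proportion of actual positives. The sketch below applies that definition to the Q21 matrix, assuming rows are actual classes and columns are predicted classes (Negative, then Positive); other course materials may define lift slightly differently.

```python
# Cells of the Q21 confusion matrix (Positive = Mammals).
tn, fp = 30, 20   # actual Negative row
fn, tp = 60, 40   # actual Positive row

total = tn + fp + fn + tp
precision = tp / (tp + fp)        # share of predicted positives that are correct
base_rate = (tp + fn) / total     # share of actual positives in the data

lift = precision / base_rate

print(f"precision = {precision:.4f}, base rate = {base_rate:.4f}, lift = {lift:.4f}")
```

A lift of 1 indicates that the classifier's positive predictions are no better than picking records at random.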
22. Consider the training examples for binary classification using CART and answer the following
two questions:
Past Open Trading Return
Positive Low High Up
Negative High Low Down
Positive Low High Up
Negative High High Up
Positive High High Up
Negative Low High Down
Positive Low Low Down
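The two questions for Q22 are not listed here. As an illustration of how CART would rank candidate splits on this table, the sketch below computes the weighted Gini index of each attribute and picks the lowest. It assumes Return is the class label and that each attribute is split on its two values; the helper names are hypothetical.

```python
from collections import Counter, defaultdict

attributes = ["Past", "Open", "Trading"]

# (Past, Open, Trading, Return) rows from the Q22 table
rows = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
]


def split_gini(rows, col):
    """Size-weighted Gini impurity after splitting on column index `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])  # last column is the class label
    score = 0.0
    for labels in groups.values():
        n = len(labels)
        impurity = 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
        score += n / len(rows) * impurity
    return score


scores = {name: split_gini(rows, i) for i, name in enumerate(attributes)}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("lowest Gini:", best)
```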