
Indian Institute of Technology Patna

CS244: Data Science

ENDSEM, 26th April 2022

TIME: 3 HOURS                                        Full Marks: 50

Figure 1: Standard Normal Table (provided with the paper)

Figure 2: Chi-square Table (provided with the paper)

1. A lawyer commutes daily from his suburban home to his midtown
office. The average time for a one-way trip is 24 minutes, with a
standard deviation of 3.8 minutes. Assume the distribution of trip
times to be normally distributed. If the office opens at 9:00 A.M.
and the lawyer leaves his house at 8:40 A.M. daily, what percentage
of the time is he late for work? [3]
He is late whenever the one-way trip exceeds the 20 minutes
available. P(X > 20) = P(Z > (20 - 24)/3.8) = P(Z > -1.05). For
Z = 1.05 the table area is 0.8531, so P(Z > 1.05) = 1 - 0.8531 =
0.1469. Hence P(Z < -1.05) = 0.1469 and P(Z > -1.05) = 1 - 0.1469 =
0.8531, i.e., he is late about 85.31% of the time.
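A quick numerical check of this answer (assuming SciPy is available):

    from scipy.stats import norm

    # Trip time X ~ N(24, 3.8); he is late whenever X > 20 minutes.
    p_late = 1 - norm.cdf(20, loc=24, scale=3.8)
    print(round(p_late, 4))   # 0.8537; the table's z = -1.05 gives 0.8531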
2. A certain machine makes electrical resistors having a mean
resistance of 40 ohms and a standard deviation of 2 ohms. Assuming
that the resistance follows a normal distribution, find the
percentage of resistances exceeding 43 ohms if resistance is
measured to the nearest ohm. [3]
We assign a measurement of 43 ohms to all resistors whose
resistances are greater than 42.5 and less than 43.5; we are
actually approximating a discrete distribution by means of a
continuous normal distribution. z = (43.5 - 40)/2 = 1.75, so
P(X > 43.5) = P(Z > 1.75) = 1 - P(Z < 1.75) = 1 - 0.9599 = 0.0401,
i.e., about 4.01% of the resistances exceed 43 ohms.
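The continuity-corrected probability can be checked the same way
(SciPy assumed):

    from scipy.stats import norm

    # "Exceeds 43 ohms to the nearest ohm" means the true value is > 43.5.
    p = 1 - norm.cdf(43.5, loc=40, scale=2)   # z = 1.75
    print(round(p, 4))                        # 0.0401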
3. The average height of females in the freshman class of a certain
college has historically been 162.5 centimeters with a standard
deviation of 6.9 centimeters. Is there reason to believe that there
has been a change in the average height if a random sample of 50
females in the present freshman class has an average height of
165.2 centimeters? Take α = 0.05. State the null and alternative
hypotheses, and find the critical value and the test statistic. [3]
The hypotheses are
H0 : µ = 162.5 centimeters,
H1 : µ != 162.5 centimeters.
Now Z = (165.2 - 162.5)/(6.9/√50) = 2.77. For α = 0.05 and a
two-tailed test the critical value is 1.96. Since 2.77 > 1.96, we
reject the null hypothesis in favour of the alternative.
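The z-statistic and two-tailed critical value, reproduced
numerically (SciPy assumed):

    from math import sqrt
    from scipy.stats import norm

    z = (165.2 - 162.5) / (6.9 / sqrt(50))   # test statistic
    crit = norm.ppf(1 - 0.05 / 2)            # two-tailed critical value
    print(round(z, 2), round(crit, 2))       # 2.77 1.96 -> reject H0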
4. A manufacturer of car batteries claims that the life of the
company's batteries is approximately normally distributed with a
standard deviation equal to 0.9 year. If a random sample of 10 of
these batteries has a standard deviation of 1.2 years, do you think
that σ > 0.9 year? Use a 0.05 level of significance. [3]
H0 : σ² = 0.81,
H1 : σ² > 0.81,
α = 0.05. Critical region (chi-square table, 9 degrees of freedom):
χ² > 16.919.
Computations: s² = 1.44 (since s = 1.2 is given), n = 10, and
χ² = (9)(1.44)/0.81 = 16.0.
Decision: the χ²-statistic is not significant at the 0.05 level, so
we cannot conclude that σ > 0.9 year.
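The chi-square cutoff and statistic, checked numerically (SciPy
assumed):

    from scipy.stats import chi2

    n, s2, sigma0_sq = 10, 1.2**2, 0.9**2
    stat = (n - 1) * s2 / sigma0_sq            # 16.0
    crit = chi2.ppf(0.95, df=n - 1)            # 16.919 for 9 d.o.f.
    print(stat, round(crit, 3), stat > crit)   # not significant at 0.05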
5. The average zinc concentration recovered from a sample of zinc
measurements at 36 locations of a river is found to be 2.6 grams
per milliliter. Find the 95% confidence interval for the mean zinc
concentration in the river. Assume that the population standard
deviation is 0.3. [3]
The point estimate of µ is x̄ = 2.6. The Z value leaving an area of
0.025 in the upper tail is Z0.025 = 1.96. Hence the 95% confidence
interval is
2.6 - 1.96(0.3/√36) < µ < 2.6 + 1.96(0.3/√36),
i.e., 2.5 < µ < 2.7.
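And the interval itself (SciPy assumed):

    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n = 2.6, 0.3, 36
    half = norm.ppf(0.975) * sigma / sqrt(n)              # margin, ~0.098
    print(round(xbar - half, 2), round(xbar + half, 2))   # 2.5 2.7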
6. Naive Bayes: Suppose we are given the following dataset (table
provided as a figure in the original paper), where A, B, C are
input binary random variables and y is a binary output whose value
we want to predict. How would a naive Bayes classifier predict y
given this input: A = 0, B = 0, C = 1? Assume that in case of a tie
the classifier always prefers to predict 0 for y. [4]
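The dataset table survives only as a figure, so the rows below are
stand-ins; the sketch shows the tie-breaking naive Bayes computation
the question asks for (prior times per-feature conditionals, ties
going to y = 0):

    # Stand-in rows (A, B, C, y); the exam's actual table is in the figure.
    data = [(0, 0, 1, 0), (0, 1, 1, 0), (1, 0, 0, 1),
            (1, 1, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0)]
    query = (0, 0, 1)                     # A = 0, B = 0, C = 1

    def nb_score(label):
        rows = [r for r in data if r[3] == label]
        score = len(rows) / len(data)     # prior P(y = label)
        for j, v in enumerate(query):     # naive (independence) likelihoods
            score *= sum(r[j] == v for r in rows) / len(rows)
        return score

    print(0 if nb_score(0) >= nb_score(1) else 1)   # ">=" breaks ties to 0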
7. Deep Learning: Suppose you are given the predictions of n
different experts (or automated learners) as to whether a given
email message is SPAM (1) or EMAIL (0). Your goal is to output a
single prediction per message that is as accurate as possible. For
this purpose, you would like to implement a majority-voting
mechanism: if more than half of the experts predict SPAM, then your
final prediction for that instance should be SPAM; otherwise, the
final prediction should be EMAIL. (a) Suggest a neural network that
implements majority voting when there are 4 experts overall (named
A, B, C, D). Specify the network structure and weights. (b) Explain
briefly how to adapt the network structure and weights to the
general case of n experts. [3+1]
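One sketch of an answer to (a), a choice rather than the only valid
design: a single threshold unit over the four expert votes, every
weight equal to 1 and bias -2.5, which fires exactly when at least 3
of the 4 experts vote SPAM; (b) follows by keeping unit weights and
moving the bias just below -(n/2):

    import numpy as np

    def majority_unit(votes, bias=-2.5):
        # One threshold neuron: inputs are the 0/1 votes of experts A-D,
        # all weights are 1; output 1 (SPAM) iff more than half vote 1.
        return int(votes.sum() + bias > 0)

    for v in ([1, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]):
        print(v, "->", majority_unit(np.array(v)))
    # General n: same unit weights with bias -(n/2 + 0.25), which fires
    # exactly when the vote count strictly exceeds n/2.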
8. Linear regression: We are interested here in a particular
1-dimensional linear regression problem. The dataset corresponding
to this problem has n examples (x1, y1), ..., (xn, yn), where xi
and yi are real numbers for all i. Let w* = [w0*, w1*]ᵀ be the
least-squares solution; in other words, w* minimizes
J(w) = (1/n) Σ_{i=1..n} (yi - w0 - w1·xi)².
You can assume for our purposes here that the solution is unique.
Find the value of each of the following expressions, with
justification [x̄ = (Σ_i xi)/n]:
(a) (1/n) Σ_{i=1..n} (yi - w0* - w1*·xi)(xi - x̄)
(b) (1/n) Σ_{i=1..n} (yi - w0* - w1*·xi)(w0* + w1*·xi)
[2+2]
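Both (a) and (b) come out to 0: at the least-squares optimum the
residuals satisfy Σ ri = 0 and Σ ri·xi = 0 (the normal equations),
and each expression is a combination of those two sums. A numerical
check on synthetic data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 2.0 * x + 1.0 + rng.normal(size=50)   # synthetic data
    w1, w0 = np.polyfit(x, y, 1)              # least-squares w1*, w0*
    r = y - w0 - w1 * x                       # residuals
    print(np.mean(r * (x - x.mean())))        # (a): ~0 up to rounding
    print(np.mean(r * (w0 + w1 * x)))         # (b): ~0 up to rounding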
9. Decision Tree: You are given a dataset (table provided as a
figure in the original paper) for training a decision tree. The
goal is to predict the label ('+' or '-') given the features A, B,
and C.
(A) First, consider building a decision tree by greedily splitting
according to information gain. (a) Which features could be at the
root of the resulting tree? (b) How many edges are there in the
longest path of the resulting tree?
(B) Now, consider building a decision tree with the smallest
possible height. (a) Which features could be at the root of the
resulting tree? (b) How many edges are there in the longest path of
the resulting tree? [2+2+2+1]
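Since the table is only in the original figure, here is the
information-gain computation part (A) relies on, demonstrated on
stand-in rows:

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, feature):
        labels = [r["y"] for r in rows]
        gain = entropy(labels)
        for v in {r[feature] for r in rows}:
            subset = [r["y"] for r in rows if r[feature] == v]
            gain -= len(subset) / len(rows) * entropy(subset)
        return gain

    # Stand-in rows, not the exam's table (that is in the figure).
    rows = [{"A": 0, "B": 0, "C": 1, "y": "+"},
            {"A": 1, "B": 0, "C": 1, "y": "+"},
            {"A": 0, "B": 1, "C": 0, "y": "-"},
            {"A": 1, "B": 1, "C": 0, "y": "-"}]
    print({f: round(info_gain(rows, f), 3) for f in "ABC"})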

10. PageRank: Consider the following diagram (edge arrows as in the
original figure) that depicts the connectivity among 4 web pages
(nodes 1-4). You need to compute the PageRank of each node. Assume
the damping factor is 1. [Hint: try to avoid the iterative method]
[4]
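The edges live only in the original diagram, so the links dictionary
below is a stand-in; what carries over is the hint's non-iterative
route: with damping factor 1, PageRank is the stationary
distribution of the link matrix, found by solving πP = π with
Σπ = 1 as a linear system:

    import numpy as np

    # Stand-in link structure; the exam's edges are in the diagram.
    links = {1: [2, 3], 2: [4], 3: [2, 4], 4: [1]}
    n = 4
    P = np.zeros((n, n))                  # row-stochastic transition matrix
    for i, outs in links.items():
        for j in outs:
            P[i - 1, j - 1] = 1.0 / len(outs)

    # Solve pi P = pi together with sum(pi) = 1 (stacked, via lstsq).
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.array([0.0] * n + [1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(pi.round(3))                    # [0.308 0.231 0.154 0.308] here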
11. SVM: Suppose we only have four training examples in two
dimensions, as follows: P1 = (0, 0), P2 = (2, 2), P3 = (h, 1),
P4 = (0, 3), where 0 ≤ h ≤ 3. The positive examples are P1 and P2,
and the other two points are negative examples. (a) How large can
h ≥ 0 be so that the training points are still linearly separable?
(b) What is the margin achieved by the maximum-margin boundary, as
a function of h? (c) Assume that we can only observe the second
component of the input vectors. Without the other component, the
labeled training points reduce to (0,+), (2,+), (1,-), and (3,-).
What is the lowest order p of a polynomial kernel that would allow
us to correctly classify these points? [2+2+1]
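For part (c), the reduced labels (0,+), (1,-), (2,+), (3,-)
alternate three times along the line, and a degree-p polynomial can
produce at most p sign changes, which points to p = 3. A toy check
with scikit-learn (assumed installed); the large C approximates a
hard margin:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0], [2.0], [1.0], [3.0]])
    y = np.array([1, 1, -1, -1])          # (0,+), (2,+), (1,-), (3,-)

    for p in (1, 2, 3):
        clf = SVC(kernel="poly", degree=p, coef0=1, C=1e6).fit(X, y)
        ok = (clf.predict(X) == y).all()
        print("degree", p, "separates:", ok)   # expect True first at p = 3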

12. K-Means clustering: Consider performing K-Means clustering on a
one-dimensional dataset containing four data points: 5, 7, 10, 12,
using k = 2, Euclidean distance, and the initial cluster centers
c1 = 3.0 and c2 = 13.0. (a) What are the initial cluster
assignments? (That is, which examples are in cluster c1 and which
examples are in cluster c2?) (b) What are the new cluster centers
after making the assignments in (a)? (c) State true or false:
K-Means clustering is guaranteed to converge. [1+1+1]
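Parts (a) and (b) follow in a couple of lines; for (c), "True" is
the standard answer, since the objective strictly decreases at each
step and there are finitely many partitions, so K-Means always
terminates (at a local optimum):

    import numpy as np

    X = np.array([5.0, 7.0, 10.0, 12.0])
    c = np.array([3.0, 13.0])                       # initial centers

    assign = np.argmin(np.abs(X[:, None] - c), axis=1)
    print(assign)          # (a) [0 0 1 1]: {5, 7} -> c1, {10, 12} -> c2
    new_c = np.array([X[assign == k].mean() for k in (0, 1)])
    print(new_c)           # (b) [ 6. 11.]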

13. Spectral clustering: Write the Laplacian matrix for the
following graph (Fig: SC) for spectral clustering. All the edges
have weight 1 (the similarity measure). [2]
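The edge set of Fig: SC is only in the figure, so the edges list
below is a stand-in (a 4-cycle); the part that transfers is the
construction L = D - A with unit weights:

    import numpy as np

    # Stand-in edge list; read the actual edges from Fig: SC.
    edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
    n = 4
    A = np.zeros((n, n))
    for i, j in edges:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0   # weight 1, undirected
    L = np.diag(A.sum(axis=1)) - A                # unnormalized Laplacian
    print(L)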

14. Linear algebra: Consider the following set of points x (the
rectangular region in Fig: LA), transformed using the matrix A,
which has one eigenvalue equal to 2 with corresponding eigenvector
e1 = [0.707, -0.707]ᵀ. The other eigenvalue is 0. Draw the plot for
Ax. [2]

Fig: SC (graph on nodes 1, 2, 3, 4)      Fig: LA (rectangular region)
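A sketch of what the plot looks like, under the extra assumption
that A is symmetric so that A = 2·e1·e1ᵀ (the question only fixes
the eigenpairs): the zero eigenvalue kills one direction, so the
rectangle collapses onto the line y = -x spanned by e1:

    import numpy as np

    e1 = np.array([0.707, -0.707])
    A = 2.0 * np.outer(e1, e1)    # assumes symmetric A; eigenvalues 2, 0

    # Corners of a stand-in rectangle for the region in Fig: LA.
    x = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=float).T
    print(A @ x)   # all images lie on the line y = -x (the span of e1)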
