NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA

END - SEM EXAMINATION, 2022


SESSION: 2021 – 2022 (Spring)
M.Tech (2nd Semester), Ph.D and Executive Ph.D

Subject code: CS6218 Subject Name: Machine Learning Dept. Code: CS


No. of pages: 2 Full Marks: 50 Duration: 3 Hrs

Answer ALL questions; all parts of a question should be answered in one place.
Unnecessary writing will attract negative marks.
1. a). Consider the mean of a cluster of objects from a binary transaction data set. What are the
minimum and maximum values of the components of the mean? What is the interpretation of the
components of the cluster mean? Which components most accurately characterize the objects in
the cluster?
b). Many partitional clustering algorithms that automatically determine the number of clusters
claim that this is an advantage. List two situations in which this is not the case.
c) Consider the following proximity matrix: Apply the single-link algorithm to P and comment
on the resulting dendrogram. [4+2+4]
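As an illustrative sketch only: single-link (MIN) agglomerative clustering repeatedly merges the two clusters whose closest pair of points is nearest. The proximity matrix P from the question is not reproduced in this text, so the 4×4 matrix below is an assumed stand-in.

```python
# Illustrative single-link (MIN) clustering on a hypothetical 4x4
# proximity matrix; the values below are assumptions, not the P from
# the question.
P = [
    [0.0, 0.1, 0.4, 0.7],
    [0.1, 0.0, 0.3, 0.6],
    [0.4, 0.3, 0.0, 0.2],
    [0.7, 0.6, 0.2, 0.0],
]

def single_link(D):
    """Agglomerative clustering with single-link distance; returns merge history."""
    clusters = [{i} for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        # inter-cluster distance = distance of the closest pair of points
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return merges

# Each merge happens at the height of the closest inter-cluster pair,
# which is exactly what the single-link dendrogram records.
for left, right, d in single_link(P):
    print(left, "+", right, "at height", d)
```

The printed merge heights are the levels at which the dendrogram joins branches; single link tends to produce "chaining" because one close pair suffices to merge two clusters.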

2. a). Following is a data set that contains two attributes, X and Y, and two class labels, “+” and
“−”. Each attribute can take three different values: 0, 1, or 2. The concept for the “+” class is Y
= 1 and the concept for the “−” class is X = 0 ∨ X = 2. A decision tree is built from the data set.
Does the tree capture the “+” and/or “−” concepts?

What are the accuracy, precision, recall, and F1-measure of the decision tree? (Note that
precision, recall, and F1-measure are defined with respect to the “+” class.)
b). You are given a set of m objects that is divided into K groups, where the i-th group is of size
m_i. If the goal is to obtain a sample of size n < m, what is the difference between the following
two sampling schemes? (Assume sampling with replacement.)
(i) We randomly select n × m_i/m elements from each group.
(ii) We randomly select n elements from the data set, without regard for the group to which an
object belongs.
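A small sketch contrasting the two schemes: scheme (i) is stratified sampling, where the per-group counts are fixed at n × m_i/m, while scheme (ii) is simple random sampling, where the per-group counts are themselves random. The group sizes, n, and seed below are illustrative assumptions.

```python
import random
from collections import Counter

# Three hypothetical groups with m_1=50, m_2=30, m_3=20 (m=100), n=20.
groups = {"g1": list(range(0, 50)), "g2": list(range(50, 80)), "g3": list(range(80, 100))}
m = sum(len(v) for v in groups.values())
n = 20
rng = random.Random(0)

# (i) stratified, with replacement: per-group counts are deterministic
stratified = {g: [rng.choice(v) for _ in range(n * len(v) // m)]
              for g, v in groups.items()}

# (ii) simple random, with replacement: per-group counts vary by run
pool = [x for v in groups.values() for x in v]
simple = [rng.choice(pool) for _ in range(n)]
by_group = Counter("g1" if x < 50 else "g2" if x < 80 else "g3" for x in simple)

print({g: len(s) for g, s in stratified.items()})  # always {'g1': 10, 'g2': 6, 'g3': 4}
print(by_group)                                    # counts vary with the seed
```

The first print is the same every run; the second follows a multinomial distribution, which is the essential difference between the two schemes.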
c) Write short notes on feature selection algorithms: i) Sequential Forward Selection, ii)
Sequential Backward Selection. [4+3+3]
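The greedy loop behind Sequential Forward Selection can be sketched as below; the scoring function here is a toy stand-in for, e.g., cross-validated classifier accuracy. Sequential Backward Selection runs the same loop in reverse, starting from all features and greedily removing the least useful one.

```python
# Sketch of Sequential Forward Selection (SFS): start from the empty
# set and greedily add the feature that most improves the score.
def sfs(all_features, score, k):
    selected = []
    while len(selected) < k:
        # evaluate every candidate extension of the current subset
        candidates = [f for f in all_features if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Toy additive score (an assumption): 'a' and 'c' are informative.
def toy_score(feats):
    return sum({"a": 2.0, "b": 0.1, "c": 1.5, "d": 0.2}[f] for f in feats)

print(sfs(["a", "b", "c", "d"], toy_score, 2))  # picks 'a' first, then 'c'
```

Note that SFS is greedy: once a feature is added it is never reconsidered, so it can miss subsets whose features are only useful jointly.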
3). a). Suppose three states s1, s2, s3 are the hidden states of an HMM. Find the optimal path for
generating the observation sequence < v0 v1 v3 v2 > given the model θ. The state-transition
probability matrix A and the observation (emission) probability matrix B of the visible states are
given as follows:
A (transitions, rows = from, columns = to):
         s0    s1    s2    s3
    s0   1     0     0     0
    s1   0.2   0.3   0.1   0.4
    s2   0.2   0.5   0.2   0.1
    s3   0.8   0.1   0.0   0.1

B (emissions, rows = state, columns = symbol):
         v0    v1    v2    v3
    s0   1     0     0     0
    s1   0     0.3   0.4   0.3
    s2   0     0.2   0.1   0.7
    s3   0     0.5   0.4   0.1
Initial probabilities: say P(s1) = 0.3, P(s2) = 0.2, P(s3) = 0.1. Prior probability: [1 0 0]. s0 is the
initial state at t = 0 and v0 is the initial observation at t = 0.
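The question's conventions are ambiguous, so the Viterbi sketch below makes explicit assumptions: the chain starts in s0 at t = 0 and emits v0 with probability 1; the first transition out of s0 uses the stated initial probabilities P(s1)=0.3, P(s2)=0.2, P(s3)=0.1; subsequent steps use rows s1–s3 of A and B; and the path ends at the most probable final state.

```python
# Viterbi decoding for the observation sequence v1, v3, v2 (v0 is
# emitted by the initial state s0 under the assumptions stated above).
A = {1: {1: 0.3, 2: 0.1, 3: 0.4},
     2: {1: 0.5, 2: 0.2, 3: 0.1},
     3: {1: 0.1, 2: 0.0, 3: 0.1}}      # transitions among s1..s3
B = {1: [0, 0.3, 0.4, 0.3],
     2: [0, 0.2, 0.1, 0.7],
     3: [0, 0.5, 0.4, 0.1]}            # emission of v0..v3 by s1..s3
init = {1: 0.3, 2: 0.2, 3: 0.1}        # assumed first transition from s0
obs = [1, 3, 2]                        # indices of v1, v3, v2

def viterbi(obs):
    # initialise with the step s0 -> s_j emitting the first symbol
    delta = {j: init[j] * B[j][obs[0]] for j in (1, 2, 3)}
    back = []
    for o in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for j in (1, 2, 3):
            i_best = max((1, 2, 3), key=lambda i: prev[i] * A[i][j])
            delta[j] = prev[i_best] * A[i_best][j] * B[j][o]
            ptr[j] = i_best
        back.append(ptr)
    # backtrack from the most probable final state
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

path, p = viterbi(obs)
print("optimal path: s0 ->", " -> ".join(f"s{j}" for j in path), "with p =", p)
```

Under these assumptions the sketch reports the path s0 → s1 → s1 → s3 with probability 0.001296; a different convention for the initial step would change the numbers.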
b). If the self-transition probability of a state is P(i|i), then the probability of the model staying in
state i for d successive stages is given by P_i(d) = (P(i|i))^(d-1) (1 − P(i|i)). Show that the average
duration of staying in state i is d̄ = 1 / (1 − P(i|i)).
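One standard route to this result: d is geometrically distributed, so the mean follows from the geometric series (writing p = P(i|i)):

```latex
\bar{d} = \sum_{d=1}^{\infty} d\, P_i(d)
        = (1-p) \sum_{d=1}^{\infty} d\, p^{\,d-1}
        = (1-p) \cdot \frac{1}{(1-p)^2}
        = \frac{1}{1-p}
        = \frac{1}{1 - P(i|i)},
\qquad\text{using}\quad
\sum_{d \ge 1} d\, p^{\,d-1} = \frac{d}{dp}\sum_{d \ge 0} p^{\,d}
                             = \frac{1}{(1-p)^2}.
```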

c). What are the probabilities/probability matrices used in an HMM model? Explain each of them
with their arithmetic expressions.
[4+3+3]
4. a). Explain the architecture of a generalized radial basis function network. Construct an RBF
classifier such that:
(0,0) and (1,1) are mapped to 0 (class C1), and (1,0) and (0,1) are mapped to 1 (class C2),
with centres t1 = (1,1) and t2 = (0,0).
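A minimal sketch of why the given centres work: mapping each input through Gaussian RBFs centred at t1 = (1,1) and t2 = (0,0) makes the four XOR-style points linearly separable in feature space. The unit kernel width and the output threshold below are assumptions.

```python
import math

# Gaussian RBF feature map with the given centres (unit width assumed).
t1, t2 = (1, 1), (0, 0)

def phi(x):
    d1 = sum((a - b) ** 2 for a, b in zip(x, t1))
    d2 = sum((a - b) ** 2 for a, b in zip(x, t2))
    return math.exp(-d1), math.exp(-d2)

def classify(x, threshold=0.95):
    # In phi-space, phi1 + phi2 is about 1.14 for the C1 points
    # ((0,0) and (1,1)) but only about 0.74 for the C2 points
    # ((1,0) and (0,1)), so one linear threshold separates them.
    p1, p2 = phi(x)
    return "C1" if p1 + p2 > threshold else "C2"

for x in [(0, 0), (1, 1), (1, 0), (0, 1)]:
    print(x, "->", phi(x), classify(x))
```

This is the classic demonstration that an RBF hidden layer can make a problem that is not linearly separable in input space (XOR) separable for a simple linear output unit.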
b). Write down the different layers of a Probabilistic Neural Network (PNN). Suppose we have a
2D dataset consisting of two classes represented by the patterns ω and ψ. The samples belonging
to class ω are (1,5) and (3,2); the samples belonging to class ψ are (7,9), (8,6), and (9,5); and the
smoothing parameter is σ = 0.5. Given a test sample (3,5), apply the PNN algorithm to classify it
into one of the two classes.
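The PNN computation on these numbers can be sketched as follows: the pattern layer holds one Gaussian kernel per training sample, the summation layer averages the kernel responses per class, and the output layer picks the larger score. The constant normaliser (2πσ²)^(-d/2) is the same for both classes and is omitted here.

```python
import math

# PNN scoring with the data from the question: sigma = 0.5,
# test point (3,5).
omega = [(1, 5), (3, 2)]            # class omega samples
psi = [(7, 9), (8, 6), (9, 5)]      # class psi samples
sigma = 0.5

def class_score(x, samples):
    """Average Gaussian kernel response of x over one class's samples."""
    s = 0.0
    for c in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(x, c))
        s += math.exp(-d2 / (2 * sigma ** 2))
    return s / len(samples)

x = (3, 5)
scores = {"omega": class_score(x, omega), "psi": class_score(x, psi)}
print(max(scores, key=scores.get), scores)
```

The test point lies far closer to the ω samples than to any ψ sample, so the ω score dominates and the PNN assigns (3,5) to class ω.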
c). State the advantages of using an RBNN over an MLP. [4+4+2]

5. a). Why can the cost function used for linear regression not be used for logistic regression?
b). You are given the logistic regression algorithm for binary classification. Explain the process
of learning the coefficients of a logistic regression model using stochastic gradient descent.
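The update rule can be sketched as below: for each training example in turn, compute the predicted probability p and move every coefficient by lr · (y − p) · x_i. The toy 1-D data, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(data, lr=0.3, epochs=200, seed=0):
    """data: list of (feature_list, label in {0,1}); returns [b0, b1, ...]."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * (len(data[0][0]) + 1)          # bias + one weight per feature
    for _ in range(epochs):
        rng.shuffle(data)                      # "stochastic": random visiting order
        for x, y in data:
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = y - p
            w[0] += lr * err                   # bias update
            for i, xi in enumerate(x):
                w[i + 1] += lr * err * xi      # per-example gradient step
    return w

# Toy separable data: label 1 whenever the single feature exceeds 2.
data = [([1.0], 0), ([1.5], 0), ([2.5], 1), ([3.0], 1)]
w = sgd_logistic(data)
print([round(sigmoid(w[0] + w[1] * x[0])) for x, _ in data])
```

Each per-example step is an unbiased estimate of the full cross-entropy gradient, which is why cycling through shuffled examples converges to the same coefficients as batch gradient descent on this convex objective.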
c). Given a set of five data points x1 = 2, x2 = 2.5, x3 = 3, x4 = 1 and x5 = 6, find the Parzen
probability density function (pdf) estimate at x = 3, using the Gaussian function with σ = 1 as the
window function. A graphical illustration is required. [2+3+5]
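The estimate above is a direct computation: with a Gaussian window, p(x) = (1/n) Σ_i N(x; x_i, σ²). A minimal sketch on the given five points:

```python
import math

# Parzen-window pdf estimate with a Gaussian kernel, sigma = 1,
# using the five data points from the question.
samples = [2, 2.5, 3, 1, 6]
sigma = 1.0

def parzen(x):
    """p(x) = (1/n) * sum_i N(x; x_i, sigma^2)."""
    n = len(samples)
    return sum(
        math.exp(-((x - xi) ** 2) / (2 * sigma ** 2))
        / (sigma * math.sqrt(2 * math.pi))
        for xi in samples
    ) / n

print(f"p(3) = {parzen(3.0):.4f}")  # = 0.2103
```

Evaluating parzen over a grid of x values (and plotting the five individual kernels alongside their average) gives the graphical illustration the question asks for.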

**************************** ALL THE BEST *************************************
