Professional Documents
Culture Documents
CST294 - Ktu Qbank
CST294 - Ktu Qbank
Preamble: This is the foundational course for awarding B. Tech. Honours in Computer
Science and Engineering with specialization in Machine Learning. The purpose of this course
is to introduce mathematical foundations of basic Machine Learning concepts among learners, on
which Machine Learning systems are built. This course covers Linear Algebra, Vector Calculus,
Probability and Distributions, Optimization and Machine Learning problems. Concepts in this course
help the learners to understand the mathematical principles in Machine Learning and aid in the
creation of new Machine Learning solutions, understand & debug existing ones, and learn about the
inherent assumptions & limitations of the current methodologies.
Course Outcomes: After the completion of the course the student will be able to
Make use of the concepts, rules and results about linear equations, matrix algebra,
CO 1 vector spaces, eigenvalues & eigenvectors and orthogonality & diagonalization to
solve computational problems (Cognitive Knowledge Level: Apply)
PO 1 PO 2 PO 3 PO 4 PO 5 PO 6 PO 7 PO 8 PO 9 PO 10 PO 11 PO 12
CO 1 √ √ √ √ √
CO 2 √ √ √ √
CO 3 √ √ √ √ √
CO 4 √ √ √ √ √ √
CO 5 √ √ √ √ √ √ √ √
COMPUTER SCIENCE AND ENGINEERING
Assessment Pattern
Analyse
Evaluate
Create
Mark Distribution
Attendance : 10 marks
First Internal Examination shall be preferably conducted after completing the first half of the
syllabus and the Second Internal Examination shall be preferably conducted after completing
remaining part of the syllabus.
There will be two parts: Part A and Part B. Part A contains 5 questions (preferably, 2
questions each from the completed modules and 1 question from the partly covered module),
having 3 marks for each question adding up to 15 marks for part A. Students should answer
all questions from Part A. Part B contains 7 questions (preferably, 3 questions each from the
completed modules and 1 question from the partly covered module), each with 7 marks. Out
of the 7 questions in Part B, a student should answer any 5.
End Semester Examination Pattern: There will be two parts; Part A and Part B. Part A
contains 10 questions with 2 questions from each module, having 3 marks for each question.
Students should answer all questions. Part B contains 2 questions from each module of which
student should answer anyone. Each question can have maximum 2 sub-divisions and carries
14 marks.
COMPUTER SCIENCE AND ENGINEERING
Syllabus
Module 1
Module 2
Module 3
Module 4
Module 5
Text book:
1.Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and
Cheng Soon Ong published by Cambridge University Press (freely available at https://
mml - book.github.io)
Reference books:
4. 4.A set
A set
of of n linearly
n linearly independent
independent vectors
vectors in R inn R n
formsforms a basis.
a basis. Does
Does thethe
set set of vectors
of vectors (2, (2, 4,−3) ,
n
4. 4.4.
A setA set
A(0,
set1,
of of n linearly
noflinearly
1)n, linearly independent
independent
independent
(0, 1,−1) vectors
vectors
vectors
form a basis 3 in R forms a basis. Does the set of vectors (2, 4,−3) ,
n n
forinRR?inExplain
R3 forms
forms a basis.
a your
basis. DoesDoes
reasons. the of
the set setvectors
of vectors (2, 4,−3)
(2, 4,−3) , ,
4,−3) , (0, 1, 1) , (0, 1,−1) form a basis for R ? Explain your reasons.
(0, 1, (0,
(0,1)1,1, 1),1,−1)
, 1)
(0, ,(0,
(0,1,−1)
1,−1) form
formform aabasis
a basisbasis
for R for
3
for?R R33??Explain
ExplainExplainyouryour your reasons.
reasons.
reasons.
5. 5.Consider the transformation T (x, y) = (x +
Consider the transformation T (x, y) = (x + y, x + 2y, 2x y, x + 2y, 2x + 3y). Obtain
+ 3y). kerker
Obtain T and useuse this to
T and
5. 5.5. this
Consider
Consider
Consider the
the
to calculate
calculate transformation
the transformation
thetransformation
the nullity.
nullity. Also (x,TTy)
TAlso
find (x,
(x,=y)
find
the y)
(x ==+(x
the (x
y, +x+ y,+y, x2y,
transformation
transformation x ++2x2y,+2x
2y,
matrix 2x
3y).+ 3y).
+for
matrix 3y). Obtain
Obtain
T. Tker
T. ker ker
forObtain TT and
and and use
use use this
this this to
to to
calculate
calculate
calculate thenullity.
nullity.
the nullity.
the Also
AlsoAlso find
findfind thetransformation
transformation
the transformation
the matrix matrix
for T.
matrix forT.
for T.
6. 6.Find
Find
the the characteristic
characteristic equation,
equation, eigenvalues,
eigenvalues, and eigenspaces
and eigenspaces corresponding
corresponding to each to each
6. 6.6. Find
Find Find the
the the
eigenvalue characteristic
characteristic equation,
equation,
characteristic eigenvalues,
eigenvalues,
equation,
of the following matrix and
and and
eigenvalues, eigenspaces
eigenspaces corresponding
corresponding
eigenspaces to each
to each
corresponding to each
eigenvalue of the following matrix
eigenvalue
eigenvalue offollowing
of the
eigenvalue of thefollowing
the following matrix
matrix
matrix
"
7. Diagonalize the following matrix, if possible
7. 7.7. Diagonalize
Diagonalize thefollowing
following
the following
Diagonalize the matrix,
matrix, ififpossible
possible
if possible
matrix,
COMPUTER SCIENCE AND ENGINEERING
7. Diagonalize the following matrix, if possible
"
1. 1. For
Fora scalar
a scalar function
function f(x,f(x, x2 +3y
y, zy,) z=) x=2 +3y
2
2 +2z+2z
2
, find
2, find the the gradient
gradient andand its magnitude
its magnitude at at the
1.thepoint
For
(1, 2, -1).
a scalar
point function f(x, y, z ) = x2 +3y2 +2z2, find the gradient and its magnitude at the
(1, 2, -1).
pointthe
2. 2. Find
Find (1, maximum
2, -1). andminimum
minimumvalues
values of
of the
the function f(x, y) 2 2
the maximum and y) == 4x
4x++4y4y- x- x-2 y- ysubject
2 to
2 2
the
Findcondition
2.subject to the
the x + y and
<=
condition
maximum x2 2.
+ y2 <= 2. values of the function f(x, y) = 4x + 4y - x 2 - y2 subject to
minimum
2 2
the condition
3. 3. Suppose x +trying
y <= 2. f(x, y) y)
= x2=+ x
2y2 + 2y2. Along
Supposeyou
youwere to to
were trying minimize
minimize f(x, + 2y + 2y2. what vector
Along what vector
3.should you
should
Suppose
travel
you from (5,(5,
travel
you werefrom
12)?
trying 12)?
to minimize f(x, y) = x2+ 2y + 2y2. Along what vector
4. should you travel from (5, 12)?
4. Find thethe
second order Taylor series expansion forfor
f(x, y) y)
= (x + y)
+ y)about (0 (0
, 0).
2 2
Find second order Taylor series expansion f(x, = (x about , 0).
5.
4.Find thethe
critical points
orderofTaylor
f(x, y)series
= x expansion
3xy+5x-2y+6y +8.y) = (x + y)2 about (0 , 0).
2– 2
Find second for f(x,
5. Find the critical points of f(x, y) = x2 – 3xy+5x-2y+6y2+8.
6. Compute the gradient of the Rectified Linear Unit (ReLU) function ReLU(z) =
5. Find the critical points of f(x, y) = x2 – 3xy+5x-2y+6y2+8.
6. Compute the gradient of the Rectified Linear Unit (ReLU) function ReLU(z) = max(0 , z).
max(0 , z).
7. 6.LetCompute
LL ==||Ax
the gradient of the Rectified Linear Unit (ReLU) function ReLU(z) = max(0 , z).
||Ax- b||
- b||2,22where
, whereAAisisa amatrix
matrixand
andx xand
andb bare
arevectors.
vectors.Derive
DerivedL
dLininterms
termsofof dx.
2
7. Let
7.dx.Let L = ||Ax - b||22, where A is a matrix and x and b are vectors. Derive dL in terms of dx.
Course Outcome 3 (CO3):
Course Outcome 3 (CO3):
1. Let J and T be independent events, where P(J)=0.4 and P(T)=0.7.
i. Find P(J∩T)
1. Let J and T be independent events, where P(J)=0.4 and P(T)=0.7.
ii. Find P(J∪T)
i. Find P(J∩T)
iii. Find P(J∩T′)
ii. Find P(J∪T)
iii. Find P(J∩T′)
COMPUTER SCIENCE AND ENGINEERING
Course Outcome 3 (CO3):
i. Find P(J∩T)
i.
i. Given that E(R)=2.85, find a and b.
i. Given
ii. that E(R)=2.85,
Find P(R>2). find a and b.
ii. Find P(R>2).
4. A biased coin (with probability of obtaining a head equal to p > 0) is tossed repeatedly and
4. A biasedindependently
coin (with probability
until the of obtaining
first head isaobserved.
head equalCompute
to p > 0)the
is tossed repeatedly
probability that the first head
and independently
appears at anuntil
eventhenumbered
first headtoss.
is observed. Compute the probability that the
first head appears at an even numbered toss.
5. Two players A and B are competing at a trivia quiz game involving a series of questions. On
5. Two players A and B question,
are competing at a triviathat
quiz game involving a series of are p and q
any individual the probabilities A and B give the correct answer
questions. On any individual
respectively, question,with
for all questions, the outcomes
probabilities
for that A andquestions
different B give the correct
being independent. The
answer gameare p finishes
and q respectively,
when a player for wins
all questions, with outcomes
by answering a questionforcorrectly.
differentCompute the
questions being independent.
probability that A winsThe if game finishes when a player wins by answering a
i. A answers
question correctly. Compute thethe
first question, that A wins if
probability
ii. B answers the first question.
i. A answers the first question,
6. A coin for which P(heads) = p is tossed until two successive tails are obtained. Find the
ii. Bprobability
answers the first
that thequestion.
experiment is completed on the nth toss.
6. A coin for which P(heads) = p is tossed until two successive tails are obtained. Find
7. You roll a fair dice twice. Let the random variable X be the product of the outcomes of the
the probability that the experiment is completed on the nth toss.
two rolls. What is the probability mass function of X? What are the expected value and the
7. You rollstandard
a fair dice twice. Let the random variable X be the product of the outcomes of
deviation of X?
the two rolls. What is the probability mass function of X? What are the expected value
and 8. While watching
the standard deviationaofgameX? of Cricket, you observe someone who is clearly supporting
Mumbai Indians. What is the probability that they were actually born within 25KM of
Mumbai? Assume that:
• the probability that a randomly selected person is born within 25KM of Mumbai is
1/20;
• the chance that a person born within 25KMs of Mumbai actually supports MI is
7/10 ;
• the probability that a person not born within 25KM of Mumbai supports MI with
probability 1/10.
5. Consider
5. Consider the
the update
update equation
equation for
for stochastic
stochastic gradient
gradient descent.
descent. Write
Write down
down the
the update
update when
when
we use
we use aa mini-batch
mini-batch size
size of
of one.
one.
COMPUTER SCIENCE AND ENGINEERING
6. Consider
6.
6. Consider the
Consider the function
the function
function
"
"
9. 9.
Solve the following
Solve LP problem
the following withwith
LP problem the simplex method.
the simplex method.
9. Solve the following LP problem with the simplex method.
"
subject
subject to toto constraints
the the constraints
subject the constraints
Course
Course Outcome
Outcome 5 (CO5):
5 (CO5):
Course Outcome 5 (CO5):
1. What is a loss function? Give examples.
1. What is a loss function? Give examples.
2.1. What is
area loss
training/validation/test sets? What is cross-validation? Name one or two
function? Give examples.
2. What areoftraining/validation/test
examples sets? What is cross-validation? Name one or two examples
cross-validation methods.
2. What are training/validation/test sets? What is cross-validation? Name one or two examples
of cross-validation methods.
3. of cross-validation
Explain methods.
generalization, overfitting, model selection, kernel trick, Bayesian learning
3. Explain generalization, overfitting, model selection, kernel trick, Bayesian learning
3. Explain generalization, overfitting, model selection, kernel trick, Bayesian learning
4. Distinguish between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori
4. Distinguish between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori
Estimation (MAP)?
Estimation (MAP)?
5. What is the link between structural risk minimization and regularization?
5. What is the link between structural risk minimization and regularization?
6. What is a kernel? What is a dot product? Give examples of kernels that are valid dot
6. What is a kernel? What is a dot product? Give examples of kernels that are valid dot
products.
products.
Course Outcome 5 (CO5):
4.2. Distinguish
What are training/validation/test
between Maximum sets? What isEstimation
Likelihood cross-validation?
(MLE) Name one or two A
and Maximum examples
of cross-validation
Posteriori Estimationmethods.
(MAP)?
5.3. What
Explain generalization,
is the link between overfitting, model
structural risk selection, and
minimization kernel trick, Bayesian learning
regularization?
6.4. What
Distinguish between
is a kernel? WhatMaximum Likelihood
is a dot product? GiveEstimation
examples (MLE) andthat
of kernels Maximum
are validAdot
Posteriori
Estimation (MAP)?
products.
7.5. What
Whatisisridge
the link between How
regression? structural risktrain
can one minimization and regularization?
a ridge regression linear model?
8.6. What
What isis Principal
a kernel?Component
What is a Analysis
dot product?
(PCA)?GiveWhich
examples
eigenof value
kernels that arethevalid dot
indicates
products.of largest variance? In what sense is the representation obtained from a
direction
7. projection
What is ridge ontoregression?
the eigenHow can onecorresponding
directions train a ridge regression
the the linear
largestmodel?
eigen values
optimal for data reconstruction?
8. What is Principal Component Analysis (PCA)? Which eigen value indicates the direction of
largest variance?
9. Suppose that you In what
have sense is
a linear the representation
support vector machine obtained
(SVM) from a projection
binary classifier.onto the
eigen directions
Consider a pointcorresponding the the
that is currently largest eigen
classified valuesand
correctly, optimal
is farforaway
data reconstruction?
from the
9. decision
Supposeboundary. If you
that you have remove
a linear the point
support vectorfrom the training
machine (SVM) set, andclassifier.
binary re-train the
Consider a
classifier, will the decision boundary change or stay the same? Explain your answer
point that is currently classified correctly, and is far away from the decision boundary. If you
inremove
one sentence.
the point from the training set, and re-train the classifier, will the decision boundary
change or stay the same? Explain your answer in one sentence.
10. Suppose you have n independent and identically distributed (i.i.d) sample data points
10. xSuppose
1, ... , xnyou havedata
. These n independent
points come andfrom
identically distributed
a distribution (i.i.d)
where thesample of a x1, ... ,
data points
probability
xn. These
given data points
datapoint x is come from a distribution where the probability of a given datapoint x is
"
i. What are the prior and posterior odds for the fair coin?
ii. What are the prior and posterior predictive probabilities of heads on the next
flip? Here prior predictive means prior to considering the data of the first four
flips.
11. Suppose the data set y1,...,yn is a drawn from a random sample consisting of i.i.d. discrete
uniform distributions with range 1 to N. Find the maximum likelihood estimate of N.
12. Ram has two coins: one fair coin and one biased coin which lands heads with probability
3/4. He picks one coin at random (50-50) and flips it repeatedly until he gets a tails. Given
COMPUTER SCIENCE AND ENGINEERING
that he observes 3 heads before the first tails, find the posterior probability that he picked
each coin.
i. What are the prior and posterior odds for the fair coin?
ii. What are the prior Model Question
and posterior paper
predictive probabilities of heads on the next flip?
Here: prior predictive means prior to considering the data of the first four flips.
QP Code Total Pages: 4
Reg No.:_______________ Name:__________________________
Model
APJ ABDUL KALAM Question paperUNIVERSITY
TECHNOLOGICAL
IV SEMESTER B.TECH (HONOURS) DEGREE EXAMINATION, MONTH and YEAR
QP Code : Total Pages: 4
Course Code: CST 294
Reg No.:_______________ Name:__________________________
Course Name: COMPUTATIONAL FUNDAMENTALS FOR MACHINE
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
LEARNING
IV SEMESTER B.TECH (HONOURS) DEGREE EXAMINATION, MONTH and YEAR
Max. Marks: 100 Duration: 3 Hours
Course Code: CST 274
Course Name: COMPUTATIONAL FUNDAMENTALS PART A FOR MACHINE LEARNING
Max. Marks: 100 Answer all questions, each carries 3 marks. Marks
Duration: 3 Hours
PART A
1 Show that withAnswer
the usual operation each
all questions, of scalar multiplication
carries 3 marks. but with Marks
1 Show that on
addition with the given
reals usual by
operation
x # y = of
2(xscalar
+ y) ismultiplication but with addition on
not a vector space.
reals given by x # y = 2(x + y) is not a vector space.
2 Find the eigenvalues of the following matrix in terms of k. Can you find
2 Find the eigenvalues of the following matrix in terms of k. Can you find an
an eigenvector corresponding to each of the eigenvalues?
eigenvector corresponding to each of the eigenvalues?
8
one over the other.
Briey explain the difference between (batch) gradient descent and stochastic
8 Briey explain the difference between (batch) gradient descent and stochastic
9 What isdescent.
gradient the empirical
Give an risk? What of
example is “empirical risk minimization”?
when you might prefer one over the other.
910gradient descent.
What Give
is thethe an
empirical example
risk? of
Whatwhen you might
is “empirical prefer
risk one over the other.
minimization”?
9 What is Explain
the empirical concept
risk? of ais
What Kernel function
“empirical risk in Support
minimization”? Vector Machines.
10 Explain the concept of a Kernel function in Support Vector Machines. Why are
10 Explain Why
the concept of aso
are kernels Kernel
useful?function in Supporta Vector
What properties kernel Machines. Why
should posses to are
be
kernels so useful? What properties a kernel should posses to be used in an SVM?
kernels so
useduseful?
in an What
SVM?properties a kernel should posses to be used in an SVM?
PART B
PART
Answer any one Question from each B module. Each question carries 14 Marks
11 a)Answer i. any one
Find allQuestion
solutionsfrom PART
to theeach B linear
module.
system of Each question carries 14 Marks
equations (6)
11 a) i. Answer
Find all any
solutions to the system of linear equations
one Question from each module. Each question carries 14 Marks (6)
"
OR
12 a) i. Let L be the line through the OR 2
ORorigin in R that is parallel to the vector (6)
12 a) i. Let L be 4]Tline
[3,the . Find the standard
through matrix
the origin in Rof
2 the orthogonal projection onto L. Also
that is parallel to the vector (6)
T
[3, 4] find
. Find
thethe standard
point matrixisofclosest
on L which the orthogonal projection
to the point onto
(7 , 1) and findL.the
Also
point on
find theL point
whichonis Lclosest
whichtoisthe
closest
pointto(-3
the, 5).
point (7 , 1) and find the point on
ii. Find
L which the rank-1
is closest approximation
to the point (-3 , 5).of
ii. Find the rank-1 approximation of
"
OR
OR
14 a) Let g be the function given by (8)
"
i. Calculate the partial derivatives of g at (0 , 0).
i. Calculate the partial derivatives of g at (0 , 0).
ii.
ii. Show
Showthatthatggisisnot
notdifferentiable
differentiableatat(0
(0,,0).
0).
b) Find the second order Taylor series expansion for f(x,y) = e-(x2+y2) cos(xy) about (0 , (6)
b) Find the second order Taylor series expansion for f(x,y) = e-(x2+y2) cos(xy) (6)
0).
aboutare
15 a) There (0 ,two
0). bags. The first bag contains four mangos and two apples; the second (6)
15 a) There
bag are twofour
contains bags. The first
mangos andbag
fourcontains
apples. four mangos
We also haveand two apples;
a biased (6)
coin, which
the second
shows bagwith
“heads” contains four mangos
probability 0.6 andand fourwith
“tails” apples. We also0.4.
probability haveIf athe coin
biased coin, which shows “heads” with probability 0.6 and “tails” with
probability 0.4. If the coin shows “heads”. we pick a fruit at
showsrandom
“heads”.from bag 1;
we pick otherwise
a fruit at we pick a fruit at random from bag 2. Your
random fromflips
friend bag the
1; otherwise
coin (youwe pick see
cannot a fruit
theatresult),
random froma bag
picks fruit2.atYour friend
random
flips the coin (you cannot see the result), picks a fruit at random from the
from the corresponding bag, and presents you a mango.
corresponding bag, and presents you a mango.
What What
is the is the probability
probability that
that the the mango
mango was picked
was picked from2?bag 2?
from bag
b) b)
Suppose that one
Suppose thathasone
written
has awritten
computer program that
a computer sometimes
program compiles and (8)(8)
that sometimes
sometimes notand
compiles (code does not
sometimes change).
not (code doesYou decide toYou
not change). model
decidethe apparent
to model
stochasticity (success vs. no success) x of the compiler using a Bernoulli
the apparent stochasticity (success vs. no success) x of the compiler using
distribution with parameter μ:
a Bernoulli distribution with parameter μ:
"
Choose a conjugate prior for the Bernoulli likelihood and compute the posterior
Choose a conjugate prior for the Bernoulli likelihood and compute the
distribution p( μ | x1 , ... , xN).
posterior distribution p( μ | x1 , ... OR
, xN).
OR
16 a) Consider a mixture of two Gaussian distributions (8)
i. i.Compute
Compute the the marginal
marginal distributions
distributions for for
eacheach dimension.
dimension.
ii. ii.
Compute
Compute the mean, mode and
the mean, modemedian
andformedian
each marginal
for eachdistribution.
marginal
iii. Compute the mean and mode for the two-dimensional distribution.
distribution.
i. Compute
b) Express the marginal distributions for each dimension.
iii.the Binomial
Compute the distribution
mean and mode as an
for exponential family distribution.
the two-dimensional distribution. Also (6)
ii. Compute the mean, mode and median for each marginal distribution.
express the Betathedistribution
iii. Compute mean and mode is an for
exponential family distribution.
the two-dimensional Show that the
distribution.
product of the Beta and the Binomial distribution is also a member of the
b) b)Express
Express the Binomial
the Binomial distribution
distribution as anas exponential
an exponential family
family distribution.
distribution. Also (6)(6)
exponential family.
FindAlso
17 a) express the express
theextrema thef(x,y,z)
of Beta distribution
Beta distribution =isxan is an exponential
- yexponential
+ z subject family =family
x2 + y2 distribution.
distribution.
to g(x,y,z) +Show that the
z2 = 2. (8)
b) product
Let Show of that
the the
Betaproduct
and theof the Beta and
Binomial the Binomial
distribution distribution
is also a member is also
of athe (6)
exponential
memberfamily.
of the exponential family.
17 a) Find the extrema of f(x,y,z) = x - y + z subject to g(x,y,z) = x2 + y2 + z2 = 2. (8)
17b) a)Let Find the extrema of f(x,y,z) = x - y + z subject to g(x,y,z) = x2 + y2 + z2 = (8)(6)
2.
b) Let
Show
" that x* = (1 , 1/2 , -1) is optimal for the optimization problem
Show that x* = (1 , 1/2 , -1) is optimal for the optimization problem
(6)
OR
18 a) Derive the gradient descent trainingOR
rule assuming that the target function (8)
18 a) Derive the gradient descent training rule assuming that the target function is (8)
is represented as od = w0 + w1x1 + ... + wnxn. Define explicitly the cost/
represented as od = w0 + w1x1 + ... + wnxn. Define explicitly the cost/error function
error function E, assuming that a set of training examples D is provided,
E, assuming that a set of training examples D is provided, where each training
where each training example d ∈ D is associated with the target output td.
example d ∈ D is associated with the target output td.
b) Find the maximum value of f(x,y,z) = xyz given that g(x,y,z) = x + y + z = 3 and (6)
x,y,z >= 0.
19 a) Consider the following probability distribution (7)
18 a) Derive the gradient
represented as od = wdescent
0 + w1x1 training
+ ... + wnrule assuming
xn. Define that the
explicitly target function
the cost/error is (8)
function
represented od =a wset
E, assumingasthat 0 +of
w1training
x1 + ... +examples
wnxn. Define
D isexplicitly
provided,thewhere
cost/error
each function
training
E, assuming
example d ∈ that
D is aassociated
set of training examples
with the D is tprovided,
target output d. where each training
COMPUTER
example d ∈ D is associated with the target output td. SCIENCE AND ENGINEERING
b) Find the maximum value of f(x,y,z) = xyz given that g(x,y,z) = x + y + z = 3 and (6)
b) Find the maximum value of f(x,y,z) = xyz given that g(x,y,z) = x + y + z = (6)
b) Find
x,y,z the maximum value of f(x,y,z) = xyz given that g(x,y,z) = x + y + z = 3 and (6)
>= 0.
19 a) x,y,z 3
Considerand x,y,z >= 0.
>= 0.the following probability distribution (7)
19 19
a) a)Consider the following
Consider probability
the following distribution
probability distribution (7) (7)
where θ is a parameter and x is a positive real number. Suppose you get m i.i.d.
where
where
samples is a parameter
θ isxi aθdrawn
parameter and
andthis
from xaispositive
a positive
x isdistribution. real
real number.
number.
Compute Suppose
Suppose
the you
maximumyou getmmi.i.d.
get
likelihood
samples
i.i.d.xfor
estimator drawn
isamples xfrom
θ based thisfrom
drawn
i on these distribution. ComputeCompute
this distribution.
samples. the maximum likelihood
the maximum
estimator for θ based
likelihood on these
estimator for θsamples.
based on these samples.
b) b)Consider the following
Consider Bayesian
the following network
Bayesian with with
network boolean variables.
boolean variables. (7) (7)
b) Consider the following Bayesian network with boolean variables. (7)
OR
Justify your answer.
OR
20 a) Consider the following one dimensional COMPUTER
training data set,SCIENCE
’x’ denotes AND
negative (6)
ENGINEERING
examples and ’o’ positive examples. The exact data points and their labels are
20 a) Consider the following one dimensional training data set, ’x’ denotes (6)
given in the table below. Suppose a SVM is used to classify this data.
negative examples and ’o’ positive examples. The exact data points and
their labels are given in the table below. Suppose a SVM is used to
classify this data.
"
i. Indicate which arewhich
i. Indicate the support vectors
are the andvectors
support mark theand
decision
mark boundary.
the decision
ii. Give the value of the cost function and the model parameter after training.
b) Supposeboundary.
that we are fitting a Gaussian mixture model for data items (8)
consisting of athe
ii. Give single
valuereal value,
of the costx,function
using K and
= 2 components. We haveafter
the model parameter N=
5 training cases, in which the values of x are as 5, 15, 25, 30, 40. Using the
training.
EM algorithm to find the maximum likeihood estimates for the model
parameters, what are the mixing proportions for the two components, π1
and π2, and the means for the two components, μ1 and μ2. The standard
deviations for the two components are fixed at 10.
COMPUTER SCIENCE AND ENGINEERING
b) Suppose that we are fitting a Gaussian mixture model for data (8)
items consisting of a single real value, x, using K = 2 components.
We have N = 5 training cases, in which the values of x are as 5,
15, 25, 30, 40. Using the EM algorithm to find the maximum
likeihood estimates for the model parameters, what are the mixing
proportions for the two components, π1 and π2, and the means for
the two components, µ1 and µ2. The standard deviations for the
two components are fixed at 10.
Suppose that at some point in the EM algorithm, the E step found
Suppose that at some point in the EM algorithm, the E step found that the
that the responsibilities of the two components for the five data
responsibilities of the two components for the five data items were as
items were as follows:
follows:
"
What values for the parameters π1, π2 , µ1, and µ2 will be found in
What
the nextvalues forofthe
M step algorithm?π1, π2 , μ1, and μ2 will be found in the next
theparameters
M step of the algorithm? ****
****
COMPUTER SCIENCE AND ENGINEERING
Teaching Plan
No. of
Lectures
No Topic
(45)
7. Quadratic Programming 1
5 Module-V (CENTRAL MACHINE LEARNING PROBLEMS) 14
14. Kernels 1
*Assignments may include applications of the above theory. With respect to module V,
programming assignments may be given.