CSE 474/574 Introduction to Machine Learning

Fall 2011 Assignment 1


Due date: September 15, 2011
Total points: 70
The assignment is due at the beginning of class on the due date above. Submit a
typed hard copy in class, attaching printouts of code and figures/results for
any coding questions along with your answers to the other questions.
1. Probability Theory
In Blue County, 54% of the adults are males. One adult is randomly selected
for a survey involving credit card usage.
(a) What is the prior probability that the selected person is a female? (3)
(b) It is later learned that the selected survey subject was smoking a cigar.
It is also known that 10% of males smoke cigars, whereas 1% of females
smoke cigars. Now, what is the probability that the selected subject is a
female? (7)
Please define the events, prior probability and conditional probability before
answering the above question.
2. Curve Fitting
(a) Generate 11 points equally spaced in [0, 1]. Let these points be denoted
by X = {x_i | i ∈ [0, 10]}. Generate a set of points Y = {y_i | y_i = cos(2πx_i) +
v_i, i ∈ [0, 10]}, where v_i is sampled from a Gaussian distribution with mean
zero and variance 0.1 (i.e. μ = 0 and σ² = 0.1). Use Matlab to plot the
smooth curve of cos(2πx) together with the generated points in the same
figure. (5)
Hint: use hold on before plotting, and hold off once both are plotted.
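The assignment asks for Matlab; as a minimal equivalent sketch, the data generation can be done in Python with NumPy (the seed is an illustrative choice, not part of the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an illustrative choice
x = np.linspace(0.0, 1.0, 11)   # 11 equally spaced points in [0, 1]
v = rng.normal(loc=0.0, scale=np.sqrt(0.1), size=11)  # variance 0.1 -> std sqrt(0.1)
y = np.cos(2.0 * np.pi * x) + v  # noisy targets

# To reproduce the figure, plot cos(2*pi*x) on a dense grid and overlay (x, y);
# in Matlab, `hold on` / `hold off` bracket the two plot calls.
```

Note that the standard deviation passed to the sampler is sqrt(0.1), since the assignment specifies the variance.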
(b) Now fit the points using a polynomial y with M + 1 degrees of freedom, i.e.
y = w_0 x^0 + w_1 x^1 + w_2 x^2 + . . . + w_M x^M. Let the error function have
the form E(w) = (1/2) Σ_i {ŷ_i − y_i}², where y_i is the target value of
training example x_i and ŷ_i is the polynomial's prediction for it. Derive a
closed-form solution for the parameter vector W, which is defined as
(w_0, w_1, . . . , w_M)^T. (10)
Hint: you may find it useful to arrange all the vectors in matrix form,
Ŷ = XW, where X = (x_1, x_2, . . . , x_N)^T, x_i = (x_i^0, x_i^1, x_i^2, . . . , x_i^M)^T,
and Y = (y_1, y_2, . . . , y_N)^T.

Then the error function will be E(w) = (1/2)(Ŷ − Y)^T (Ŷ − Y), with Ŷ = XW.
Some useful identities for matrix operations, where Tr[·] denotes the trace of
a matrix and ∂/∂X the derivative with respect to the matrix X:

Tr[ABC . . .] = Tr[BC . . . A] = Tr[C . . . AB] = . . .
∂Tr[XA]/∂X = ∂Tr[AX]/∂X = A^T
∂Tr[AX^T]/∂X = ∂Tr[X^T A]/∂X = A
∂Tr[X^T AX]/∂X = (A + A^T)X
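As a sanity check on the solution the hint leads to: setting ∂E/∂W = 0 yields the normal equations X^T X W = X^T Y, i.e. W = (X^T X)^(−1) X^T Y. The assignment asks for Matlab; the sketch below uses Python/NumPy instead, on illustrative noise-free data:

```python
import numpy as np

def fit_poly(x, y, M):
    # design matrix with rows (x_i^0, x_i^1, ..., x_i^M)
    X = np.vander(x, M + 1, increasing=True)
    # solves the least-squares problem XW ~= Y, equivalent to the normal
    # equations X^T X W = X^T Y but numerically safer than forming an inverse
    W, *_ = np.linalg.lstsq(X, y, rcond=None)
    return W

# noise-free check: a degree-2 fit should recover y = 1 + 2x + 3x^2 exactly
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x + 3.0 * x ** 2
W = fit_poly(x, y, 2)
```

Recovering the known coefficients (1, 2, 3) confirms the closed-form solution before applying it to the noisy data.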


(c) Implement the curve fitting in Matlab by using the solution above. Use
the 11 training examples generated in the first question, and vary the
value of M from 1 to 10. Plot the fitted curve corresponding to each M
value using Matlab, in separate figures. Please make sure that the training
points are also included in all the plots so that it is possible to visually
inspect the goodness of fit. (10)
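A Python/NumPy sketch of the sweep over M (the assignment asks for Matlab; the plotting calls are omitted here and the seed is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
x = np.linspace(0.0, 1.0, 11)
y = np.cos(2.0 * np.pi * x) + rng.normal(0.0, np.sqrt(0.1), 11)

fits = {}
for M in range(1, 11):
    X = np.vander(x, M + 1, increasing=True)   # design matrix for degree M
    W, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    fits[M] = W
    # each W would then be plotted against the training points in its own figure
```

With 11 points and M = 10 the fit has as many parameters as data points, so the curve interpolates the training data, which the plots should make visible.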
(d) Modify the objective function to a regularized one, E(w) = (1/2) Σ_i {ŷ_i − y_i}² +
(λ/2) W^T W, and derive the new solution for the parameter vector W. (10)
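For reference, setting the gradient of the regularized objective to zero gives (X^T X + λI) W = X^T Y, so W = (X^T X + λI)^(−1) X^T Y. A minimal Python/NumPy sketch (the data and λ values are illustrative):

```python
import numpy as np

def fit_ridge(x, y, M, lam):
    # regularized normal equations: (X^T X + lam * I) W = X^T Y
    X = np.vander(x, M + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(M + 1), X.T @ y)

x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x                 # illustrative noise-free linear data
W0 = fit_ridge(x, y, 1, 0.0)      # lam = 0 recovers the unregularized fit
Wbig = fit_ridge(x, y, 1, 1e6)    # a large lam shrinks the weights toward zero
```

The two extremes bracket the behavior to comment on in part (e): λ = 0 reproduces the ordinary least-squares solution, while large λ drives the weights toward zero.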
(e) Modify your code from the previous question to include this regularizer,
and re-estimate the parameters using all 11 points and M = 10. Try
different values of λ (you need to try at least five different values,
including λ = 0, to see the results) and plot the corresponding curves as
before. Please also comment on your results. (5)
3. Decision Theory
(a) Consider two nonnegative numbers a and b. Show that if a ≤ b, then
a ≤ (ab)^(1/2). (3)
(b) Use the above result to show that, if the decision regions of a two-class
classification problem are chosen to minimize the probability of
misclassification, this probability will satisfy
p(mistake) ≤ ∫ {p(x, C1) p(x, C2)}^(1/2) dx
(7)
4. Information Theory
The joint distribution of two binary variables x and y is given in the table;
calculate the following quantities.


(a) H[x] (2)

(b) H[y] (2)


(c) H[x, y] (2)
(d) H[y|x] (2)
(e) H[x|y] (2)
Note: results without the steps of calculation will not be accepted.
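The joint table itself did not survive in this copy of the assignment. As a worked illustration only, the five quantities can be computed for a hypothetical joint distribution (the values below are NOT the assignment's table):

```python
import numpy as np

# hypothetical joint distribution p(x, y); rows index x, columns index y.
# This stands in for the assignment's table, which is not reproduced here.
p = np.array([[1/3, 1/3],
              [0.0, 1/3]])

def H(probs):
    probs = probs[probs > 0]                # convention: 0 * log 0 = 0
    return -np.sum(probs * np.log2(probs))  # entropy in bits

px = p.sum(axis=1)        # marginal p(x)
py = p.sum(axis=0)        # marginal p(y)
Hx, Hy, Hxy = H(px), H(py), H(p.ravel())
H_y_given_x = Hxy - Hx    # chain rule: H[x, y] = H[x] + H[y|x]
H_x_given_y = Hxy - Hy
```

The chain-rule identities used for the conditional entropies are the steps the note asks to be shown explicitly in the hand calculation.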
