
Machine learning

4th Report
Ioannis Kouroudis

Problem 1b
By using Bayes' rule and substituting the known quantities, we arrive at
the following set of equations:

p(y = 1|x) = σ(ln(λ1/λ0) + (λ0 − λ1)x)

and equivalently

p(y = 0|x) = 1 − σ(ln(λ1/λ0) + (λ0 − λ1)x)

It is known that the sigmoid σ(k) equals 0.5 when k = 0 and falls below 0.5
for k < 0. Setting the argument to zero and solving, we get:
ln(λ1/λ0) + (λ0 − λ1)x = 0 → xthreshold = ln(λ1/λ0)/(λ1 − λ0) = (ln λ1 − ln λ0)/(λ1 − λ0) = (ln λ0 − ln λ1)/(λ0 − λ1)
Consequently, the threshold is the same for both labels and can be used as a
decision boundary. This is the case because two sigmoids whose arguments differ
only in sign are symmetric about s = 0.5. There are two distinct possibilities.

1) λ0 > λ1

In this case, the probability for y = 1 goes to 1 as x goes to infinity.
Consequently, all x > xthreshold will be classified as y = 1.
2) λ0 < λ1

In this case, the probability for y = 1 goes to 0 as x goes to infinity.
Consequently, all x < xthreshold will be classified as y = 1.
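
A minimal numerical check of this derivation (a sketch, assuming exponential class-conditionals p(x|y) = λy e^(−λy x) with equal priors, which is the model the posterior above corresponds to; the rates below are hypothetical):

import numpy as np

# Assumed setup: exponential class-conditionals p(x|y) = lam_y * exp(-lam_y * x), equal priors.
lam0, lam1 = 2.0, 0.5          # hypothetical rates (lam0 > lam1 here)

def posterior_y1(x):
    # Bayes' rule applied directly to the assumed densities
    p1 = lam1 * np.exp(-lam1 * x)
    p0 = lam0 * np.exp(-lam0 * x)
    return p1 / (p0 + p1)

def sigmoid_form(x):
    # Closed form derived above: sigma(ln(lam1/lam0) + (lam0 - lam1) * x)
    return 1.0 / (1.0 + np.exp(-(np.log(lam1 / lam0) + (lam0 - lam1) * x)))

x = np.linspace(0.0, 5.0, 11)
assert np.allclose(posterior_y1(x), sigmoid_form(x))

x_threshold = (np.log(lam1) - np.log(lam0)) / (lam1 - lam0)
print(x_threshold, posterior_y1(x_threshold))    # the posterior is exactly 0.5 at the threshold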

Problem 1a
As can be seen in the solution of problem 1b, the posterior is a Bernoulli
distribution over y whose parameter is a sigmoid of a linear function of x.

Problem 2
The maximum likelihood solution for the decision boundary w of a logistic
classification model does not have a closed form. This is addressed by replacing
analytical methods with numerical ones (e.g. gradient descent or Newton's method).
Further, supposing a data point falls directly on the separating hyperplane, the
decision is undefined. This can be solved by taking problem-dependent knowledge
into account (e.g. always choosing the most pessimistic outcome). Lastly, the
model is prone to extreme overfitting: if the training data are linearly separable,
the likelihood keeps increasing as the norm of w grows, so the weights diverge.
This can be solved, for example, by adding a regularization term that penalizes
large values of w. It can also be mitigated by splitting the data into training
and validation sets many times, each with a different split, training the model
once per split and averaging the outcomes (with or without weights). Lastly, we
can choose a non-linear separation model in an attempt to provide a more accurate
classification and thus avoid the need to overfit the model to the data.
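
A minimal sketch of the regularized numerical approach described above (assuming standard L2-penalized logistic regression trained by gradient descent; all names, data and values here are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # hypothetical 2-d inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # hypothetical, linearly separable labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
lr, reg = 0.1, 0.1                             # step size and L2 penalty weight
for _ in range(1000):
    p = sigmoid(X @ w)
    # Gradient of the penalized negative log-likelihood
    grad = X.T @ (p - y) / len(y) + reg * w
    w -= lr * grad

print(w)   # the penalty keeps the weights finite even though the data are separable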

Problem 3
Let’s start by defining a classifier for our two classes

y0 = w0 X

y1 = w1 X

We will prove the claim for one class (say, class 0); the same procedure
proves it for the other.

softmax(y = 0) = e^(w0 X) / (e^(w0 X) + e^(w1 X))

Suppose we subtract an arbitrary constant c from the w coefficients:

softmax(y = 0) = e^((w0 − c)X) / (e^((w0 − c)X) + e^((w1 − c)X)) = e^(w0 X) / (e^(w0 X) + e^(w1 X))

Consequently, we can choose to subtract w1 without affecting the equality:

softmax(y = 0) = e^((w0 − w1)X) / (e^((w0 − w1)X) + 1)

If we redefine our classifier as w0,new = w0 − w1, we see that the softmax is
transformed into

softmax(y = 0) = e^(w0,new X) / (e^(w0,new X) + 1) = sigmoid(w0,new X)

Consequently, both functions return the same probability information if the
classifiers are adjusted accordingly (i.e. w0,sigmoid = wsoftmax,0 − wsoftmax,1).
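
A quick numerical check of this equivalence (a sketch with hypothetical weights and data):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))                        # hypothetical feature vectors
w0, w1 = rng.normal(size=3), rng.normal(size=3)    # hypothetical class weights

def softmax_class0(X, w0, w1):
    a0, a1 = X @ w0, X @ w1
    return np.exp(a0) / (np.exp(a0) + np.exp(a1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two-class softmax for class 0 equals a sigmoid of the difference of the weights
assert np.allclose(softmax_class0(X, w0, w1), sigmoid(X @ (w0 - w1)))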

Problem 4
I am almost certain that there is a far simpler solution, but this also works.
It is easy to determine the angle between the vector x and the vector [0, 1]T .
The angle in fact is given by:

θ = arccos( x2 / √(x1² + x2²) )

Noting that the class change occurs every 90° of the circle, the basis function
that transforms the set into a linearly separable one will be

f(x1, x2) = sin(θ)cos(θ) = ½ sin(2θ)

the sign of which changes every 90° and will be positive for the blue cross
class and negative for the red circle one.
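
A minimal sketch of this feature on hypothetical points (assuming the classes alternate by quadrant as in the problem's figure, and taking θ as the full angle from [0, 1]ᵀ via atan2 so that it covers the whole circle):

import numpy as np

# Hypothetical points, one per quadrant (the actual data come from the problem's figure)
points = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])

def feature(x1, x2):
    # Full angle between (x1, x2) and [0, 1]^T, measured over the whole circle
    theta = np.arctan2(x1, x2)
    return np.sin(theta) * np.cos(theta)    # = 0.5 * sin(2 * theta)

for x1, x2 in points:
    print((x1, x2), feature(x1, x2))
# Opposite quadrants share a sign and adjacent quadrants flip it, so a threshold
# at zero separates the two classes linearly in this single feature.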
