
EVALUATION SCHEME

SPRING MID-SEMESTER EXAMINATION-2024


School of Computer Engineering
Kalinga Institute of Industrial Technology, Deemed to be University
Machine Learning
CS 3035

1.
a.

[1 Mark]

b.

[1 Mark]

c. Precision should be used to evaluate the models, since we want to minimize false positives
in order to minimize the cost. [1 Mark]

d. Lasso regression is useful for feature selection because the L1 penalty can set regression
coefficients exactly equal to zero. Features with zero coefficients can simply be dropped (see
the sketch below). [1 Mark]
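
As an illustration, here is a minimal scikit-learn sketch (not part of the original answer key; the synthetic data and the alpha value are arbitrary) showing Lasso zeroing out uninformative coefficients:

# Minimal sketch: Lasso drives some coefficients exactly to zero,
# so the corresponding features can be dropped. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 5 candidate features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 are informative

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)                      # coefficients of uninformative features are ~0
selected = np.flatnonzero(model.coef_ != 0)
print("Selected features:", selected)   # typically [0, 1]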

e. Advantages [0.5 Marks]

Converges very quickly
Takes less memory

Disadvantages [0.5 Marks]

SGD convergence is much noisier than batch or mini-batch gradient descent because each
iteration uses an approximate gradient computed from a single training sample rather than the
exact gradient over all the training samples (see the sketch below).
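
To illustrate the noisy-gradient point, here is a minimal NumPy sketch (not from the original answer key; the linear-regression data and all names are made up) comparing one batch step with one SGD step:

# Sketch: one batch-gradient step uses all samples (exact gradient),
# one SGD step uses a single sample (noisy estimate of the same gradient).
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=100)]   # bias column + one feature
y = 2 + 3 * X[:, 1] + rng.normal(scale=0.5, size=100)
theta, lr = np.zeros(2), 0.1

# Batch gradient descent step: average gradient over all samples
grad_batch = X.T @ (X @ theta - y) / len(y)
theta_batch = theta - lr * grad_batch

# SGD step: gradient from one randomly chosen sample (cheap but noisy)
i = rng.integers(len(y))
grad_sgd = X[i] * (X[i] @ theta - y[i])
theta_sgd = theta - lr * grad_sgd
print(theta_batch, theta_sgd)   # the SGD step points in a similar but noisier direction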

2.
a. [3 Marks (1 + 0.5 + 0.5 + 1)]
The learning rate is a tuning parameter that determines the size of the steps gradient descent
takes in the direction of the minimum. It defines how fast or slowly we move towards the
optimal weights. [1 Mark]

i. A small learning rate may lead to the model taking a long time to converge. [0.5 Marks]
ii. A large learning rate may cause the updates to overshoot the minimum, so the model may
fail to converge (both regimes are illustrated in the sketch below). [0.5 Marks]

[1 Mark]
[Figure: loss curves under different learning rates. Img Source: https://cs231n.github.io/neural-networks-3/]
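
The discussion above refers to the gradient descent update θ ← θ − α·∇J(θ). A tiny illustrative sketch on the one-dimensional function f(w) = w² (all numbers arbitrary) shows both failure modes:

# Gradient descent on f(w) = w^2, whose gradient is 2w and minimum is at w = 0.
# A small learning rate converges slowly; a learning rate > 1 makes |w| grow
# each step, so the iterates overshoot and diverge.
def descend(lr, w=1.0, steps=10):
    for _ in range(steps):
        w = w - lr * 2 * w          # update rule: w <- w - lr * f'(w)
    return w

print(descend(lr=0.01))   # ~0.82: moving toward 0, but slowly
print(descend(lr=0.5))    # 0.0: converges immediately for this quadratic
print(descend(lr=1.1))    # ~6.19: |w| grows every step -- divergence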

b. [2 Marks (1.5 + 0.5)]


1.5 Marks - Calculating the Euclidean distances correctly. Also award full marks if students
have calculated just the squared Euclidean distances.
0.5 Marks - For the correct prediction; zero for guessing.

To predict the species of a new animal, we have to calculate the distance of the features of the
new animal from the features of other animals in the data set using the Euclidean distance
formula.

Here's the formula: d = √((X₂ − X₁)² + (Y₂ − Y₁)²)

Where:
X₂ = new animal's weight (4)
X₁ = existing animal's weight
Y₂ = new animal's height (30)
Y₁ = existing animal's height

Distance #1
For the first row, d1:
d1 = √((4 − 4)² + (30 − 35)²) = √(0 + 25) = √25 = 5

Distance #2
For the second row, d2:
d2 = √((4 − 6)² + (30 − 40)²) = √(4 + 100) = √104 ≈ 10.2

Distance #3
For the third row, d3:
d3 = √((4 − 3)² + (30 − 25)²) = √(1 + 25) = √26 ≈ 5.1

Distance #4
For the fourth row, d4:
d4 = √((4 − 7)² + (30 − 45)²) = √(9 + 225) = √234 ≈ 15.3

Distance #5
For the fifth row, d5:
d5 = √((4 − 5)² + (30 − 30)²) = √(1 + 0) = √1 = 1

Distance #6
For the sixth row, d6:
d6 = √((4 − 8)² + (30 − 50)²) = √(16 + 400) = √416 ≈ 20.4

Distance #7
For the seventh row, d7:
d7 = √((4 − 2)² + (30 − 20)²) = √(4 + 100) = √104 ≈ 10.2

Distance #8
For the eighth row, d8:
d8 = √((4 − 5)² + (30 − 35)²) = √(1 + 25) = √26 ≈ 5.1

Here's what the table looks like after all the distances have been calculated (the species column of the original data table is not reproduced here):

Row   Weight   Height   Distance
1     4        35       5.0
2     6        40       10.2
3     3        25       5.1
4     7        45       15.3
5     5        30       1.0
6     8        50       20.4
7     2        20       10.2
8     5        35       5.1

As we can see, the majority class within the 3 nearest neighbors of the new animal (rows 5, 1,
and 3, at distances 1, 5, and 5.1) is cat. Therefore, we classify the new animal as a cat; the
sketch below reproduces the distance computation.
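
For reference, a short Python sketch of the same computation (the species labels are placeholders, since the original data table is not reproduced here):

# k-NN sketch for the worked example above. The eight (weight, height)
# pairs match the distance calculations; the labels are placeholders
# standing in for the species column of the original table.
import math

data = [(4, 35), (6, 40), (3, 25), (7, 45), (5, 30), (8, 50), (2, 20), (5, 35)]
labels = ["?"] * 8           # species from the original table go here
new = (4, 30)

dists = [math.dist(new, p) for p in data]   # Euclidean distances
print([round(d, 1) for d in dists])         # [5.0, 10.2, 5.1, 15.3, 1.0, 20.4, 10.2, 5.1]

k = 3
nearest = sorted(range(8), key=lambda i: dists[i])[:k]
print(nearest)               # rows 5, 1, 3 (0-indexed: [4, 0, 2])
# prediction = majority label among `nearest` (cat, per the answer above)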

3.
a. [3 Marks (1 + 1 + 1)]
1 Mark - For calculating P(M|X) correctly
1 Mark - For calculating P(H|X) correctly
1 Mark - For predicting the species correctly

P(Species=M) = 4/8 = 0.5
P(Species=H) = 4/8 = 0.5

P(Color=Green | Species=M) = 2/4 = 0.5
P(Color=Green | Species=H) = 1/4 = 0.25

P(Legs=2 | Species=M) = 1/4 = 0.25
P(Legs=2 | Species=H) = 4/4 = 1

P(Height=Tall | Species=M) = 1/4 = 0.25
P(Height=Tall | Species=H) = 2/4 = 0.5

P(Smelly=No | Species=M) = 1/4 = 0.25
P(Smelly=No | Species=H) = 3/4 = 0.75

Then, the probability of X belonging to Species M (up to the common normalizing factor 1/P(X), which is the same for both species) is computed as follows.


P(M|X) = P(Species=M) × P(Color=Green|Species=M) × P(Legs=2|Species=M) × P(Height=Tall|Species=M) × P(Smelly=No|Species=M)
= 0.5 × 0.5 × 0.25 × 0.25 × 0.25
= 0.00390625

Similarly, the probability of X belonging to Species H is calculated as follows.

P(H|X) = P(Species=H) × P(Color=Green|Species=H) × P(Legs=2|Species=H) × P(Height=Tall|Species=H) × P(Smelly=No|Species=H)
= 0.5 × 0.25 × 1 × 0.5 × 0.75
= 0.046875

Since P(H|X) > P(M|X), we assign the entity X with attributes {Color=Green, Legs=2,
Height=Tall, Smelly=No} to species H (checked numerically below).
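
A quick Python check of the two scores, using the probabilities tabulated above:

# Naive Bayes score for each species: prior times the product of the
# per-attribute conditional probabilities computed above.
import math

p_m = math.prod([0.5, 0.5, 0.25, 0.25, 0.25])   # P(M) * P(attributes | M)
p_h = math.prod([0.5, 0.25, 1.0, 0.5, 0.75])    # P(H) * P(attributes | H)
print(p_m, p_h)                                  # 0.00390625 0.046875
print("Predicted species:", "H" if p_h > p_m else "M")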

b. The hypothesis function in logistic regression is given as


$$h(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here g(z) is the sigmoid or logistic function. Its domain is (−∞, ∞) and its output lies in the
range (0, 1).

Suppose we have two classes, 0 and 1. If h(x) ≥ 0.5, we classify the input x into class 1;
otherwise, we classify it into class 0.

$$h(x) \geq 0.5$$

$$\frac{1}{1 + e^{-\theta^T x}} \geq 0.5$$

$$2 \geq 1 + e^{-\theta^T x}$$

$$1 \geq e^{-\theta^T x}$$

Taking log on both sides,


$$0 \geq -\theta^T x$$

$$\theta^T x \geq 0 \quad \ldots (1)$$

Equation (1) implies that all the points assigned to class 1 lie on one side of the hyperplane θᵀx = 0.

Similarly, we can show that all the points assigned to class 0 lie in the region defined by

$$\theta^T x < 0 \quad \ldots (2)$$

Based on equations (1) and (2), the decision boundary of logistic regression is the hyperplane θᵀx = 0, which is linear in x; hence logistic regression is a linear classifier (see the sketch below).
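
A tiny sketch making this concrete: thresholding the sigmoid at 0.5 gives exactly the same labels as checking the sign of θᵀx (the parameter vector and points below are made up for illustration):

# Thresholding h(x) = sigmoid(theta^T x) at 0.5 gives the same labels
# as checking whether theta^T x >= 0, i.e. a linear decision boundary.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 2.0, 0.5])               # illustrative parameters
X = np.array([[1, 0.2, 0.1], [1, 1.5, -0.3], [1, -1.0, 2.0]])  # rows: [1, x1, x2]

z = X @ theta
print(sigmoid(z) >= 0.5)    # [False  True  False]
print(z >= 0)               # identical booleans: the boundary is theta^T x = 0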

4.
a. [3 Marks (1 + 1 + 1)]
1 Mark - Normalization (min-max)
1 Mark - Standardization (Z-score normalization)
1 Mark - Importance w/ example

Normalization
Normalization is a scaling or mapping technique applied as a pre-processing step, in which
values are mapped from an existing range to a new range. Some common normalization techniques
are as follows.

Min-max Normalization
The general formula for min-max normalization is

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where x is the original feature value and x' is the normalized value. After min-max
normalization, the values lie in the range [0, 1]. One problem with min-max normalization is
that it is highly sensitive to outliers. For example, if you have 99 values in the range [0, 1]
and one outlier equal to 100, then after normalization the 99 values will be squished into the
range [0, 0.01].

Standardization (Z-Score Normalization)


The general formula for z-score standardization is

$$x' = \frac{x - \bar{x}}{\sigma}$$

where x is the original feature value, x' is the standardized value, x̄ is the mean, and σ is
the standard deviation. Unlike min-max normalization, the standardized values are not confined
to a fixed range: the mean of the standardized vector x' is 0 and its standard deviation is 1.
One advantage of standardization over min-max normalization is that it is less sensitive to the
presence of outliers.

Importance
Normalization is useful when we are using distance-based algorithms such as k-NN. Suppose you
have the following dataset.

Age    Salary
26     100000
23     200000
25     250000
23     220000
29     300000
As we can see from this dataset, salary is on a much larger scale than age. Because of this,
when you calculate Euclidean distances in k-NN, the predictions will be dominated by the salary
attribute simply due to its larger magnitude. To avoid this, we can use normalization to bring
salary and age onto comparable scales, as in the sketch below.
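
A minimal sketch applying both scalings to the salary column of the table above:

# Min-max and z-score scaling of the salary column from the table above.
import numpy as np

salary = np.array([100000, 200000, 250000, 220000, 300000], dtype=float)

minmax = (salary - salary.min()) / (salary.max() - salary.min())
zscore = (salary - salary.mean()) / salary.std()

print(minmax.round(2))   # [0.   0.5  0.75 0.6  1.  ] -- values in [0, 1]
print(zscore.round(2))   # mean 0, std 1; now comparable in scale to a scaled age column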

b. [2 Marks]
1 Mark - Linear regression scale invariant
1 Mark - Ridge regression not scale invariant
[Mathematical proof is not required, just an intuitive argument is enough]

Linear Regression
The hypothesis function in linear regression is given as,
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

The cost function J(θ) is given as

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

If we divide x₂ by a constant c, the fitted parameter θ₂ adapts to the change: its value
becomes c times larger. Because of this, the value of the cost function remains unchanged and
the hypothesis function produces the same output for all inputs, so linear regression is scale
invariant.

Ridge Regression
The cost function in the case of ridge regression is given as

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j} \theta_j^2$$

If we divide x₂ by a constant c, the parameter θ₂ will not simply be scaled by c, because the
penalty term λΣθⱼ² in the loss function penalizes the size of the parameters and prevents θ₂
from becoming too large. As a result, the hypothesis function gives different values after the
feature is scaled, so ridge regression is not scale invariant (checked empirically below).
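
Both claims can be checked empirically with scikit-learn (a sketch on synthetic data; the constant c = 100 and all names are arbitrary):

# Empirical check: rescaling a feature leaves ordinary least squares
# predictions unchanged, but changes ridge predictions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] + 4 * X[:, 1] + rng.normal(scale=0.1, size=50)

X_scaled = X.copy()
X_scaled[:, 1] /= 100                    # divide x2 by c = 100

ols_a = LinearRegression().fit(X, y).predict(X)
ols_b = LinearRegression().fit(X_scaled, y).predict(X_scaled)
print(np.allclose(ols_a, ols_b))         # True: theta2 absorbs the factor c

ridge_a = Ridge(alpha=1.0).fit(X, y).predict(X)
ridge_b = Ridge(alpha=1.0).fit(X_scaled, y).predict(X_scaled)
print(np.allclose(ridge_a, ridge_b))     # False: the penalty keeps theta2 small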

5.
a. [5 Marks]
4 Marks - 1 Mark for calculating the information gain of each attribute.
1 Mark - For correctly identifying which attribute to split on at the root node.

Entropy(S) = −0.6 log₂(0.6) − 0.4 log₂(0.4) = 0.970950594 (all logarithms below are base 2)


Temperature
Entropy(S_{temp=high}) = −0.8 log₂(0.8) − 0.2 log₂(0.2) = 0.721928094
Entropy(S_{temp=medium}) = −0.4 log₂(0.4) − 0.6 log₂(0.6) = 0.970950594
IG(S, Temperature)
= 0.970950594 − 0.5 × 0.721928094 − 0.5 × 0.970950594
= 0.12451125

Wind Direction
Entropy(S_{wind_direction=west}) = −0.75 log₂(0.75) − 0.25 log₂(0.25) = 0.811278124
Entropy(S_{wind_direction=south}) = −0.5 log₂(0.5) − 0.5 log₂(0.5) = 1
IG(S, Wind Direction)
= 0.970950594 − 0.4 × 0.811278124 − 0.6 × 1
= 0.0464393444

Rainy
Entropy(S_{rainy=Y}) = −(1/3) log₂(1/3) − (2/3) log₂(2/3) = 0.918295834
Entropy(S_{rainy=N}) = −(5/7) log₂(5/7) − (2/7) log₂(2/7) = 0.863120569
IG(S, Rainy)
= 0.970950594 − 0.3 × 0.918295834 − 0.7 × 0.863120569
= 0.0912774455

Humidity
Entropy(S_{humidity=high}) = −(5/8) log₂(5/8) − (3/8) log₂(3/8) = 0.954434004
Entropy(S_{humidity=medium}) = 1
IG(S, Humidity)
= 0.970950594 − 0.8 × 0.954434004 − 0.2 × 1
= 0.0074033908

The information gain of the Temperature attribute is the highest, so we split the root node on
the Temperature attribute (the helper below reproduces these numbers).
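
The same arithmetic can be reproduced with a short Python helper; the class counts below are inferred from the fractions used in the solution:

# Entropy and information-gain arithmetic from the solution above.
from math import log2

def entropy(*counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

e_s = entropy(6, 4)                                  # 0.9710, the 0.6/0.4 split
ig_temp = e_s - 0.5 * entropy(4, 1) - 0.5 * entropy(2, 3)
ig_wind = e_s - 0.4 * entropy(3, 1) - 0.6 * entropy(3, 3)
ig_rain = e_s - 0.3 * entropy(1, 2) - 0.7 * entropy(5, 2)
ig_hum  = e_s - 0.8 * entropy(5, 3) - 0.2 * entropy(1, 1)
print(round(ig_temp, 4), round(ig_wind, 4), round(ig_rain, 4), round(ig_hum, 4))
# 0.1245 0.0464 0.0913 0.0074 -- Temperature has the highest gain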
