Hota ML13
30.04.2024
Hinge loss for a sample = max(0, 1 − y·f(x)), where y is the true class label and f(x) = w·x + b.
• If y·f(x) ≥ 1, the loss is zero. This indicates that the sample lies outside the margin and is correctly classified.
• When y·f(x) < 1, the loss becomes positive and proportional to the distance from the margin.
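A minimal NumPy sketch of this definition (the function name hinge_loss and the toy values of w, b, X and y are illustrative, not from the lecture):

```python
import numpy as np

# Hinge loss over a batch of samples: max(0, 1 - y * f(x)), with f(x) = w.x + b
def hinge_loss(w, b, X, y):
    """w: weight vector, b: bias, X: (n, d) inputs, y: labels in {-1, +1}."""
    margins = y * (X @ w + b)              # y * f(x) for every sample
    return np.maximum(0.0, 1.0 - margins).mean()

# Toy check: a point well outside the margin contributes zero loss
w, b = np.array([1.0, -1.0]), 0.0
X = np.array([[3.0, 0.0],    # y*f(x) = 3.0 -> loss 0
              [0.5, 0.0]])   # y*f(x) = 0.5 -> loss 0.5
y = np.array([1.0, 1.0])
print(hinge_loss(w, b, X, y))  # (0 + 0.5) / 2 = 0.25
```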
Supervised vs. Unsupervised
• Supervised: learning from labelled data.
• Training data: (X, Y), where X is the input and Y is the label; used for Classification/Regression.
• (Sunny, Evening, Moderate_Temp: Play)
Solution: Replace the ‘hard’ clustering of K-means with ‘soft’ probabilistic assignments (Gaussian Mixture Model).
p(x): the bell-shaped Normal/Gaussian density.
Mixing coefficient: the relative importance of each component ‘k’ in the mixture.
Mixture of Gaussians (figure: mixture density over x with K = 4 components).
By increasing the number of components, the curve defined by the mixture model can take basically any shape, so it is much more flexible than just one Gaussian.
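For reference, the figure corresponds to the standard mixture-of-Gaussians density (written out here; the symbols π_k, μ_k, Σ_k follow the usual convention and are not copied verbatim from the slide):

```latex
p(x) \;=\; \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```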
Contour Plots of Mixture Models
(Contours for each component) (Contours of mixture distribution)
E-step (Expectation):
For each datum x_i:
Compute r_ic, the probability that it belongs to cluster ‘c’:
1. Compute its probability under model ‘c’.
2. Normalize to sum to one (over clusters ‘c’).
M-step (Maximization):
For each cluster (Gaussian) ‘c’:
Update its parameters using the (weighted) data points: the mixing coefficient (from the total responsibility allocated to cluster c), the weighted mean, and the weighted covariance.
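The parenthetical notes above refer to the standard EM update equations for a GMM; a reconstruction (not copied verbatim from the slide), with r_ic the responsibility and m_c the total responsibility of cluster c:

```latex
\text{E-step:}\qquad
r_{ic} \;=\; \frac{\pi_c\, \mathcal{N}(x_i \mid \mu_c, \Sigma_c)}
                  {\sum_{c'} \pi_{c'}\, \mathcal{N}(x_i \mid \mu_{c'}, \Sigma_{c'})}

\text{M-step:}\qquad
m_c = \sum_i r_{ic}, \qquad
\pi_c = \frac{m_c}{N}, \qquad
\mu_c = \frac{1}{m_c}\sum_i r_{ic}\, x_i, \qquad
\Sigma_c = \frac{1}{m_c}\sum_i r_{ic}\,(x_i-\mu_c)(x_i-\mu_c)^{\top}
```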
(Scatter plot: Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume.)
• Which feature might contribute less to both principal components and hence be irrelevant?
Why Dimensionality Reduction?
• Curse of Dimensionality: Increasing the number of features can make the data sparse (if the number of data points stays the same), so data points appear dissimilar to one another. This can result in overfitting, as the model learns noise in the data, which reduces generalizability.
• Computational efficiency: With fewer dimensions, algorithms can run faster and require less memory.
• Visualization: It is challenging to visualize data in more than three dimensions. Dimensionality reduction techniques can help project data into lower-dimensional spaces that can be visualized more easily.
(Scatter plot: Data points distributed across the graph. Can you segregate them easily?)
Preserving the Variance: PCA Continued…
Which one is the 1st PC and which one is the 2nd PC? (Projection of the dataset onto these axes)
Image source: Aurelien Geron’s text
Maths behind PCA

Height (H)   Weight (W)
    2            2
    3            4
    6            6
    6            7
    8           11

(Scatter plot titled “Body Mass Index (BMI)”: W on the y-axis against H on the x-axis, with a trend line.)
• The scatter plot shows a trend line indicating that there is a correlation between H and W.
Next step:
Normalize or standardize the data, as H is in meters/ft/inches and W in kgs/lbs. This is also called “centering the data”. Centered data tells us how far any original value is from the mean (here, mean of H = 5 and mean of W = 6).

Centered H       Centered W
2 − 5 = −3       2 − 6 = −4
3 − 5 = −2       4 − 6 = −2
6 − 5 =  1       6 − 6 =  0
6 − 5 =  1       7 − 6 =  1
8 − 5 =  3      11 − 6 =  5

(Scatter plot of the centered data: Centered W against Centered H.)
Continued…
• Next, we compute the covariance matrix from the centered/standardized data. Its diagonal gives the variance of each variable.

          H       W
   H      6       8
   W      8     11.5

Var(H) = ((−3)² + (−2)² + 1² + 1² + 3²) / 4 = 24/4 = 6
Var(W) = ((−4)² + (−2)² + 0² + 1² + 5²) / 4 = 46/4 = 11.5
• Now compute how much the two variables spread with each other, i.e. Cov(H, W). Remember the mean is already 0 for centered data.
Cov(H, W) = ((−3)(−4) + (−2)(−2) + (1)(0) + (1)(1) + (3)(5)) / (5 − 1) = 32/4 = 8
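A quick NumPy check of the centering and covariance numbers above (array names are illustrative; ddof=1 matches the n − 1 denominator used in the slides):

```python
import numpy as np

# Original (H, W) pairs from the worked example
data = np.array([[2, 2], [3, 4], [6, 6], [6, 7], [8, 11]], dtype=float)

# Centering: subtract the column means (mean H = 5, mean W = 6)
centered = data - data.mean(axis=0)

# Covariance matrix with the (n - 1) denominator
cov = np.cov(centered, rowvar=False, ddof=1)
print(cov)
# [[ 6.   8. ]
#  [ 8.  11.5]]
```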
Eigen values
• Next, compute the Eigen values: det(A − λI) = 0

  | 6 − λ       8      |
  |   8      11.5 − λ  |  =  0

(6 − λ)(11.5 − λ) − 8 × 8 = 0
λ² − 17.5λ + 5 = 0
λ1 = 17.21,  λ2 = 0.29
Continued…
• Next, find the Eigen vectors for these two values.
• A·v = λ·v. For λ1 = 17.21:

  | 6     8    |   | x |             | x |
  | 8    11.5  | · | y |  =  17.21 · | y |

6x + 8y = 17.21x   →   8y = 11.21x   →   y = 1.40x
8x + 11.5y = 17.21y   →   8x = 5.71y

So v1 = [1, 1.40] is an Eigen vector of the covariance matrix.
• Now, normalize to unit length:
Length of vector = sqrt(1² + 1.40²) = 1.72
v1 = [1/1.72, 1.40/1.72] = [0.5814, 0.8139]
Similarly, get the Eigen vector of the covariance matrix for Eigen value 2 (λ2 = 0.29): v2 = [0.8139, −0.5811].

Project the centered data D onto the matrix of Eigen vectors V (v1 and v2 as columns):

        D                       V                            D·V
  | −3   −4 |                                       | −4.9998   −0.1173 |
  | −2   −2 |      | 0.5814    0.8139 |             | −2.7906   −0.4656 |
  |  1    0 |  ·   | 0.8139   −0.5811 |      =      |  0.5814    0.8139 |
  |  1    1 |                                       |  1.3953    0.2328 |
  |  3    5 |                                       |  5.8137   −0.4638 |

PC1 (first column of D·V) stores information about all variables; PC2 (second column) does not store much.
• Why does PC2 not store much information?
• How much % of the total variance is contributed by PC1? 17.21 / (17.21 + 0.29) = 98.34%
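Continuing the NumPy sketch from the covariance step (variable names are illustrative), the Eigen decomposition, projection, and variance ratio can be verified as follows:

```python
import numpy as np

data = np.array([[2, 2], [3, 4], [6, 6], [6, 7], [8, 11]], dtype=float)
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False, ddof=1)

# Eigen decomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # returned in ascending order
order = np.argsort(eigvals)[::-1]            # reorder so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)                               # ~[17.21, 0.29]

# Project the centered data onto the principal components (D . V)
projected = centered @ eigvecs
print(projected)                             # matches the D.V table above (up to sign)

# Fraction of total variance explained by PC1
print(eigvals[0] / eigvals.sum())            # ~0.9834
```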
Thank You!