Nov 9 2020 Annotated


Announcements

• Project
• Kaggle.com
• Free math reference for machine learning
Discriminant Functions for the Gaussian Density
• Discriminant functions for minimum-error-rate classification
can be written as (since ln(·) is monotonic)

gᵢ(x) = ln p(x | ωᵢ) + ln P(ωᵢ)

Annotation: monotonic functions, such as 2·f(x), do not change the
ordering of values, so applying one to every gᵢ leaves the decision
rule unchanged.
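A minimal Python sketch (the means, covariance, and priors are assumed example values, not from the slides) showing that a monotonic transform such as doubling every gᵢ leaves the decision unchanged:

import numpy as np
from scipy.stats import multivariate_normal

means = [np.array([3.0, 6.0]), np.array([3.0, -2.0])]
cov = 2.0 * np.eye(2)      # assumed shared covariance
priors = [0.5, 0.5]        # assumed equal priors

def g(x, i):
    # g_i(x) = ln p(x | w_i) + ln P(w_i)
    return multivariate_normal.logpdf(x, mean=means[i], cov=cov) + np.log(priors[i])

x = np.array([2.0, 1.0])
scores = [g(x, i) for i in range(len(means))]
print(np.argmax(scores))                    # decision
print(np.argmax([2 * s for s in scores]))   # same decision after 2 * f(x)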
Numerical example case 1 (same sigma and diagonal)
• Consider the two-categories example with samples

ω₁: (1, 6)ᵀ, (3, 4)ᵀ, (5, 6)ᵀ, (3, 8)ᵀ
ω₂: (1, −2)ᵀ, (3, 0)ᵀ, (5, −2)ᵀ, (3, −4)ᵀ

• Estimate the mean and covariance of each class with

μ = (1/N) ∑ᵢ₌₁ᴺ xᵢ
Σ = (1/N) ∑ᵢ₌₁ᴺ (xᵢ − μ)(xᵢ − μ)ᵀ
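A quick numerical check of these estimates (a Python sketch using only numpy; data copied from the slide):

import numpy as np

W1 = np.array([[1, 6], [3, 4], [5, 6], [3, 8]], dtype=float)
W2 = np.array([[1, -2], [3, 0], [5, -2], [3, -4]], dtype=float)

for name, W in [("omega_1", W1), ("omega_2", W2)]:
    mu = W.mean(axis=0)               # (1/N) sum of the samples
    D = W - mu                        # deviations from the mean
    Sigma = D.T @ D / len(W)          # (1/N) sum of outer products
    print(name, mu, Sigma)
# Prints mu_1 = [3. 6.], mu_2 = [3. -2.], Sigma_1 = Sigma_2 = 2 I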
Numerical example case 1 (same sigma and diagonal)

μ₁ = (3, 6)ᵀ   μ₂ = (3, −2)ᵀ

Σ₁ = Σ₂ = diag(2, 2) = 2I
Case 1: Σᵢ = σ²I
• Decision boundaries are the hyperplanes gᵢ(x) = gⱼ(x), and
can be written as

wᵀ(x − x₀) = 0

where

w = μᵢ − μⱼ
x₀ = ½(μᵢ + μⱼ) − [σ² / ‖μᵢ − μⱼ‖²] ln[P(ωᵢ)/P(ωⱼ)] (μᵢ − μⱼ)

• The hyperplane separating Rᵢ and Rⱼ passes through the point
x₀ and is orthogonal to the vector w.
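A minimal Python sketch of these formulas (case1_boundary is an illustrative name, not from the slides):

import numpy as np

# Case-1 boundary sketch: Sigma_i = sigma^2 * I for both classes.
# Returns (w, x0) so that the boundary is w . (x - x0) = 0.
def case1_boundary(mu_i, mu_j, sigma2, p_i=0.5, p_j=0.5):
    w = mu_i - mu_j                              # normal to the hyperplane
    shift = (sigma2 / np.dot(w, w)) * np.log(p_i / p_j)
    x0 = 0.5 * (mu_i + mu_j) - shift * w         # point on the hyperplane
    return w, x0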
Numerical example case 1
• Let's repeat the same problem, this time using the boundary formula
directly (see the sketch below).
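Reusing case1_boundary from the sketch above with μ₁ = (3, 6)ᵀ, μ₂ = (3, −2)ᵀ, σ² = 2, and equal priors:

mu1, mu2 = np.array([3.0, 6.0]), np.array([3.0, -2.0])
w, x0 = case1_boundary(mu1, mu2, sigma2=2.0)
print(w, x0)   # w = [0. 8.], x0 = [3. 2.]
# Boundary: 8 * (x2 - 2) = 0, i.e. the horizontal line x2 = 2.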
Proof of Case 1
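A sketch of the standard argument: with Σᵢ = σ²I, the discriminants reduce (up to class-independent constants) to gᵢ(x) = −‖x − μᵢ‖² / (2σ²) + ln P(ωᵢ). Setting gᵢ(x) = gⱼ(x) and expanding the squared norms, the quadratic term xᵀx cancels, leaving an equation that is linear in x; collecting terms rearranges it into wᵀ(x − x₀) = 0 with the w and x₀ given above.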
Case 1: Σᵢ = σ²I

Ellipses are curves of constant probability density.

If the covariance matrices of two distributions are equal and
proportional to the identity matrix, then the distributions are
spherical in d dimensions, and the boundary is a generalized
hyperplane of d − 1 dimensions, perpendicular to the line separating
the means. The decision boundary shifts as the priors are changed.
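To make the prior-dependent shift concrete (continuing the sketch above; the 0.9/0.1 priors are an assumed example):

w, x0_equal = case1_boundary(mu1, mu2, sigma2=2.0, p_i=0.5, p_j=0.5)
w, x0_skewed = case1_boundary(mu1, mu2, sigma2=2.0, p_i=0.9, p_j=0.1)
print(x0_equal)    # [3. 2.]
print(x0_skewed)   # approx [3. 1.45]; x0 moves toward the less likely mean mu2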
Case 1: Σᵢ = σ²I

Recall: with equal priors, case 1 reduces to a minimum-distance
classifier that assigns x to the class with the nearest mean in
Euclidean distance.
Case 2: Σᵢ = Σ
• With a covariance matrix shared by all classes, the discriminant
functions are linear:

gᵢ(x) = wᵢᵀx + wᵢ₀,  where wᵢ = Σ⁻¹μᵢ and
wᵢ₀ = −½ μᵢᵀΣ⁻¹μᵢ + ln P(ωᵢ)

• The boundary gᵢ(x) = gⱼ(x) is again a hyperplane wᵀ(x − x₀) = 0,
now with

w = Σ⁻¹(μᵢ − μⱼ)
x₀ = ½(μᵢ + μⱼ) − [ln(P(ωᵢ)/P(ωⱼ)) / (μᵢ − μⱼ)ᵀΣ⁻¹(μᵢ − μⱼ)] (μᵢ − μⱼ)

Derive for a bonus.
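A minimal Python sketch of the case-2 boundary (case2_boundary is an illustrative name, not from the slides):

import numpy as np

# Case-2 boundary sketch: one covariance Sigma shared by all classes.
def case2_boundary(mu_i, mu_j, Sigma, p_i=0.5, p_j=0.5):
    Sigma_inv = np.linalg.inv(Sigma)
    diff = mu_i - mu_j
    w = Sigma_inv @ diff                     # normal; no longer parallel to diff
    shift = np.log(p_i / p_j) / (diff @ Sigma_inv @ diff)
    x0 = 0.5 * (mu_i + mu_j) - shift * diff  # point on the hyperplane
    return w, x0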

Probability densities with equal but asymmetric Gaussian
distributions. The decision hyperplanes are not necessarily
perpendicular to the line connecting the means.
Multivariate Gaussian

Samples drawn from a two-dimensional Gaussian lie in a cloud centered
on the mean μ.

The loci of points of constant density are the ellipses for which
(x − μ)ᵀΣ⁻¹(x − μ) is constant; the eigenvectors of Σ determine the
directions of the principal axes, and the corresponding eigenvalues
determine their lengths.

The quantity r² = (x − μ)ᵀΣ⁻¹(x − μ) is called the squared
Mahalanobis distance from x to μ.
https://en.wikipedia.org/wiki/Multivariate_normal_distribution
Recall the idea of the minimum-distance classifier; now think in
terms of the Mahalanobis distance instead of the Euclidean distance
(see the sketch below).
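A sketch of such a classifier (illustrative names, not from the slides); with equal priors and a shared Σ it agrees with the case-2 Bayes rule:

import numpy as np

def mahalanobis_sq(x, mu, Sigma_inv):
    # r^2 = (x - mu)^T Sigma^{-1} (x - mu)
    d = x - mu
    return d @ Sigma_inv @ d

def classify(x, means, Sigma):
    Sigma_inv = np.linalg.inv(Sigma)
    d2 = [mahalanobis_sq(x, mu, Sigma_inv) for mu in means]
    return int(np.argmin(d2))   # nearest mean in the Mahalanobis metric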

Case 2: Σᵢ = Σ
Case 3: Σᵢ = arbitrary
• With a different covariance matrix per class, the discriminant
functions are quadratic:

gᵢ(x) = xᵀWᵢx + wᵢᵀx + wᵢ₀,  where
Wᵢ = −½ Σᵢ⁻¹
wᵢ = Σᵢ⁻¹μᵢ
wᵢ₀ = −½ μᵢᵀΣᵢ⁻¹μᵢ − ½ ln|Σᵢ| + ln P(ωᵢ)

Arbitrary Gaussian distributions lead to Bayes decision boundaries
that are general hyperquadrics.
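A sketch of the quadratic discriminant in Python (quadratic_discriminant is an illustrative name):

import numpy as np

# Case-3 sketch: per-class covariance gives a quadratic discriminant
# g_i(x) = x^T W_i x + w_i^T x + w_i0, hence hyperquadric boundaries.
def quadratic_discriminant(x, mu, Sigma, prior):
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv
    w = Sigma_inv @ mu
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return x @ W @ x + w @ x + w0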
