Feature Extraction
K. Ramachandra Murthy
Email: k.ramachandra@isical.ac.in
Mathematical Tools (Recall)
Linear Independence
• Vectors pointing in different directions are linearly independent.
• Parallel vectors are linearly dependent (the +ve and −ve directions of a vector are parallel); see the numeric check below.
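Linear independence can be checked numerically by stacking the vectors as columns and computing the matrix rank; a minimal numpy sketch (the example vectors are illustrative):

import numpy as np

# Columns of each matrix are the vectors under test.
a = np.array([[2.0, -1.0],
              [1.0,  3.0]])   # different directions
b = np.array([[2.0, -4.0],
              [1.0, -2.0]])   # parallel (opposite) directions

# Vectors are independent iff the rank equals the number of vectors.
print(np.linalg.matrix_rank(a))  # 2 -> independent
print(np.linalg.matrix_rank(b))  # 1 -> dependent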
Basis & Orthonormal Bases
• Basis (or axes): a frame of reference for representing vectors.
Example: the same vector has different coordinates in different bases. The standard basis $\{(1,0),(0,1)\}$ is one set of representatives; $\{(2,1),(-1,3)\}$ is another.

In the standard basis:

$\begin{bmatrix}-9\\6\end{bmatrix} = -9\begin{bmatrix}1\\0\end{bmatrix} + 6\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}\begin{bmatrix}-9\\6\end{bmatrix}$

In the other basis (a linear transformation relates the two representations):

$\begin{bmatrix}-9\\6\end{bmatrix} = -3\begin{bmatrix}2\\1\end{bmatrix} + 3\begin{bmatrix}-1\\3\end{bmatrix} = \begin{bmatrix}2&-1\\1&3\end{bmatrix}\begin{bmatrix}-3\\3\end{bmatrix}$

The columns of the basis matrix are the basis vectors, and the coordinate vector records the weights of the linear combination.
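The new coordinates can be recovered numerically by solving the basis matrix against the vector; a small numpy sketch of the example above:

import numpy as np

x = np.array([-9.0, 6.0])

# Basis vectors as the columns of B.
B = np.array([[2.0, -1.0],
              [1.0,  3.0]])

# Coordinates c of x in basis B satisfy B @ c = x.
c = np.linalg.solve(B, x)
print(c)         # [-3.  3.]
print(B @ c)     # [-9.  6.]  (reconstructs x)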
Basis & Orthonormal Bases
• Basis: a space is completely defined by a set of basis vectors; any vector in the space is a linear combination of the basis.
• Orthonormal basis: the basis vectors are mutually orthogonal and of unit length.
Projection: Using Inner Products
Projection of $\vec{x}$ along the direction of $\vec{a}$.

[Figure: a light source casts the shadow $\vec{p}$ of $\vec{x}$ onto the line through $\vec{a}$.]

The projection is $\vec{p} = t\,\vec{a}$, and the error vector $\vec{e} = \vec{x} - \vec{p}$ is perpendicular to $\vec{a}$, since $\frac{\vec{a}\cdot\vec{e}}{\|\vec{a}\|\|\vec{e}\|} = \cos 90^{\circ} = 0$. Substituting $\vec{e} = \vec{x} - t\,\vec{a}$:

$\vec{a}\cdot(\vec{x} - t\,\vec{a}) = 0 \;\Rightarrow\; t = \frac{\vec{a}^{T}\vec{x}}{\vec{a}^{T}\vec{a}}$

$\vec{p} = t\,\vec{a} = \frac{\vec{a}\,\vec{a}^{T}}{\vec{a}^{T}\vec{a}}\,\vec{x}$

The matrix $\dfrac{\vec{a}\,\vec{a}^{T}}{\vec{a}^{T}\vec{a}}$ is the Projection Matrix.
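A minimal numpy sketch of this construction (the two vectors are illustrative):

import numpy as np

a = np.array([2.0, 1.0])     # direction to project onto
x = np.array([1.0, 3.0])     # vector being projected

# Projection matrix P = (a a^T) / (a^T a)
P = np.outer(a, a) / (a @ a)

p = P @ x                    # projection of x along a
e = x - p                    # error vector
print(p)                     # [2. 1.]
print(a @ e)                 # ~0: error is perpendicular to a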
Eigenvalues & Eigenvectors
• Eigenvectors (for a square $m \times m$ matrix $S$):

$S\vec{v} = \lambda\vec{v}$, where $\vec{v} \in \mathbb{R}^{m}$, $\vec{v} \neq \vec{0}$ is a (right) eigenvector and $\lambda \in \mathbb{R}$ is the corresponding eigenvalue.

Example: applying $S$ to an eigenvector $\vec{v}$ only scales it by $\lambda$; the direction of $\vec{v}$ is unchanged.
Eigenvalues & Eigenvectors
(Properties)
• The eigenvalues of real symmetric matrices are real.

Proof sketch: suppose $A\vec{v} = \lambda\vec{v}$. Then we can conjugate to get $\bar{A}\bar{\vec{v}} = \bar{\lambda}\bar{\vec{v}}$. If the entries of $A$ are real, this becomes $A\bar{\vec{v}} = \bar{\lambda}\bar{\vec{v}}$, which proves that complex eigenvalues of real-valued matrices come in conjugate pairs. Now transpose to get $\bar{\vec{v}}^{T}A^{T} = \bar{\lambda}\bar{\vec{v}}^{T}$. Because $A$ is a symmetric matrix, we now have $\bar{\vec{v}}^{T}A = \bar{\lambda}\bar{\vec{v}}^{T}$. Multiply both sides of this equation on the right by $\vec{v}$, i.e. $\bar{\vec{v}}^{T}A\vec{v} = \bar{\lambda}\,\bar{\vec{v}}^{T}\vec{v}$. On the other hand, multiply $A\vec{v} = \lambda\vec{v}$ on the left by $\bar{\vec{v}}^{T}$ to get $\bar{\vec{v}}^{T}A\vec{v} = \lambda\,\bar{\vec{v}}^{T}\vec{v}$. Since $\bar{\vec{v}}^{T}\vec{v} = \|\vec{v}\|^{2} > 0$, it follows that $\bar{\lambda} = \lambda$, i.e. $\lambda$ is real.
Eigenvalues & Eigenvectors
(Properties)
• If A is an n x n symmetric matrix, then any two eigenvectors that come from distinct eigenvalues are orthogonal.
• Stacking the eigenvectors $\vec{v}_1, \ldots, \vec{v}_n$ as the columns of $V$, all the eigenvalue equations can be written together as

$AV = V\Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

• If all eigenvectors are linearly independent, $V$ is invertible. Therefore

$A = V\Lambda V^{-1}$.
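These properties can be verified numerically; a short numpy sketch on a small symmetric matrix (eigh is numpy's routine for symmetric matrices):

import numpy as np

S = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # real symmetric matrix

lam, V = np.linalg.eigh(S)          # real eigenvalues, eigenvectors as columns

print(np.allclose(S @ V, V @ np.diag(lam)))    # AV = V Lambda
print(np.allclose(V.T @ V, np.eye(2)))         # eigenvectors orthonormal
print(np.allclose(V @ np.diag(lam) @ V.T, S))  # A = V Lambda V^T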
Variance

How spread out is the data around its mean? Square each deviation from the mean and average:

$\mathrm{Variance} = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{\text{No. of Data Points}}$
Variance Formula
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard Deviation
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

[standard deviation = square root of the variance]
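Both formulas translate directly into numpy; a minimal sketch with made-up data:

import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])

n = len(x)
x_bar = x.mean()

variance = np.sum((x - x_bar) ** 2) / n   # sigma^2
std_dev = np.sqrt(variance)               # sigma

print(variance)    # same as np.var(x)
print(std_dev)     # same as np.std(x)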
Variance (2D)

For two-dimensional data, variance is computed independently for each coordinate: Variance(x) measures the spread along the x-axis and Variance(y) the spread along the y-axis. Neither captures how x and y vary together.
Covariance

Covariance measures how two variables vary together:

$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

[Figure: sign of $(x_1-\bar{x})(y_1-\bar{y})$ for points in each quadrant around the mean $(\bar{x}, \bar{y})$.]

• Positive Relation: points lie mostly in the quadrants where $(x_i-\bar{x})$ and $(y_i-\bar{y})$ have the same sign, so the products are mostly $>0$ and the covariance is positive.
• Negative Relation: points lie mostly in the quadrants where the deviations have opposite signs, so the products are mostly $<0$ and the covariance is negative.
• No Relation: positive and negative products balance out and the covariance is near $0$.
Covariance Matrix
The covariance matrix $\Sigma$ collects the covariance between every pair of features:

$\Sigma = \begin{bmatrix} \mathrm{cov}(x_1,x_1) & \mathrm{cov}(x_1,x_2) & \cdots & \mathrm{cov}(x_1,x_m) \\ \mathrm{cov}(x_2,x_1) & \mathrm{cov}(x_2,x_2) & \cdots & \mathrm{cov}(x_2,x_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_m,x_1) & \mathrm{cov}(x_m,x_2) & \cdots & \mathrm{cov}(x_m,x_m) \end{bmatrix}$

$\Sigma = \frac{1}{n}(X - \bar{X})(X - \bar{X})^{T}, \quad \text{where } X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$
Correlation is covariance normalized by the two standard deviations, so it is dimensionless and bounded:

$-1 \leq \mathrm{Correlation}(x, y) \leq +1$
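A numpy sketch of the covariance matrix formula (rows of X are features, columns are observations; the data is random and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))        # m = 3 features, n = 100 observations

n = X.shape[1]
X_bar = X.mean(axis=1, keepdims=True)

Sigma = (X - X_bar) @ (X - X_bar).T / n    # (1/n)(X - X̄)(X - X̄)^T
print(Sigma)

# np.cov uses the same row-wise layout; bias=True selects the 1/n convention.
print(np.allclose(Sigma, np.cov(X, bias=True)))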
Feature Extraction
What is feature extraction?
• Feature extraction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
• The criterion for feature extraction can differ depending on the problem setting.
• Unsupervised setting: minimize the information loss.

$P \in \mathbb{R}^{p \times m}: \; x \in \mathbb{R}^{m} \mapsto y = Px \in \mathbb{R}^{p} \quad (p < m)$
What is feature extraction?
Original data is mapped to reduced data by a linear transformation:

$P \in \mathbb{R}^{p \times m}: \; X \in \mathbb{R}^{m \times n} \mapsto Y = PX \in \mathbb{R}^{p \times n}$

[Figure: the $p \times n$ matrix $Y$ as the product of the $p \times m$ matrix $P$ and the $m \times n$ matrix $X$.]
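A minimal sketch of such a linear reduction in numpy (P here is an arbitrary illustrative map, not yet the PCA choice):

import numpy as np

m, n, p = 5, 20, 2              # original dim, samples, reduced dim

rng = np.random.default_rng(1)
X = rng.normal(size=(m, n))     # columns are data points

P = rng.normal(size=(p, m))     # some p x m linear map
Y = P @ X                       # reduced representation

print(X.shape, "->", Y.shape)   # (5, 20) -> (2, 20)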
High-dimensional data in computer vision

Each data point lives in an m-dimensional vector space spanned by an orthonormal basis. All data/measurement vectors in this space are linear combinations of the set of standard unit-length basis vectors.

Example: a dataset describing n students, with one m-dimensional feature vector per student.
What features can we ignore?

• Constant features (e.g., number of heads): the value is 1, 1, …, 1 for every student, so the variance is zero and the feature carries no information.
• Features with high variance are the ones that discriminate between data points; these should be kept.
Questions
– Change of basis
Change of Basis
• Let X and Y be m×n matrices related by a linear transformation P.
• X is the original recorded data set and Y is a re-representation of that data set:

$PX = Y$

• Let's define:
  ‒ $p_i$ are the rows of $P$.
  ‒ $x_i$ are the columns of $X$.
  ‒ $y_i$ are the columns of $Y$.
• Let's write out the explicit dot products of $PX$:

$PX = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$

• We can note the form of each column of $Y$:

$y_i = \begin{bmatrix} p_1 \cdot x_i \\ p_2 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{bmatrix}$

• Each coefficient of $y_i$ is a dot product of $x_i$ with the corresponding row of $P$; in other words, the rows of $P$ are a new set of basis vectors for expressing the columns of $X$.
Noise
• Noise in any data must be low or – no matter the analysis technique – no information about the system can be extracted.
• There is no absolute scale for noise; it is measured relative to the measurement of interest, e.g. the recorded ball positions.
• A common measure is the signal-to-noise ratio (SNR), a ratio of variances.
SNR
$\mathrm{SNR} = \frac{\sigma^{2}_{\text{signal}}}{\sigma^{2}_{\text{noise}}}$

• A high SNR ($\gg 1$) indicates high-precision data, while a low SNR indicates noise-contaminated data.
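A small numpy sketch of SNR as a ratio of variances (the synthetic signal and noise are purely illustrative):

import numpy as np

rng = np.random.default_rng(2)

signal = np.sin(np.linspace(0, 4 * np.pi, 500))   # clean signal
noise = 0.1 * rng.normal(size=signal.shape)       # measurement noise

snr = signal.var() / noise.var()
print(snr)    # >> 1 here, i.e. high-precision data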
Why Variance?

• PCA first finds the direction in the m-dimensional space along which the variance in X is maximized; it saves this as p1. It then repeats the search among the directions orthogonal to those already chosen.
• The variances associated with each direction pi quantify how principal each direction is. We can thus rank-order each basis vector pi according to the corresponding variance.
Solving PCA: Eigenvectors of the Covariance Matrix

• We will derive an algebraic solution to PCA using linear algebra. This solution is based on an important property of eigenvector decomposition.
• Once again, the data set is X, an m×n matrix, where n is the number of data patterns and m is the number of feature types.
• The goal is summarized as follows:

Find some orthonormal matrix $P$ where $Y = PX$ such that $S_Y = \frac{1}{n}YY^{T}$ is diagonalized. (Assuming zero-mean data, i.e. subtract the mean first.) The rows of $P$ are then the principal components of $X$.
Solving PCA: Eigenvectors of the Covariance Matrix

We begin by rewriting $S_Y$ in terms of our variable of choice, $P$:

$S_Y = \frac{1}{n}YY^{T} = \frac{1}{n}(PX)(PX)^{T} = P\left(\frac{1}{n}XX^{T}\right)P^{T} = P S_X P^{T}$

Since $S_X$ is symmetric, it can be diagonalized as $S_X = V\Lambda V^{T}$ with orthonormal eigenvectors in $V$. Choosing $P = V^{T}$ (each row $p_i$ an eigenvector of $S_X$) gives $S_Y = P S_X P^{T} = \Lambda$, which is diagonal. The principal directions $p_1, p_2, \ldots$ are therefore the eigenvectors of the covariance matrix $S_X$.
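A quick numpy check of this diagonalization argument (random illustrative data):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 200))
X = X - X.mean(axis=1, keepdims=True)   # zero-mean data

n = X.shape[1]
S_X = X @ X.T / n                       # covariance of X

lam, V = np.linalg.eigh(S_X)            # S_X = V Lambda V^T
P = V.T                                 # rows of P = eigenvectors

Y = P @ X
S_Y = Y @ Y.T / n
print(np.round(S_Y, 6))                 # diagonal: off-diagonal terms ~ 0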
What We Have Done So Far….

[Figure: a 2D data cloud with its principal directions p1 and p2; the variance of the data along p1 is λ1 and along p2 is λ2.]
PCA Process – STEP 1
Get the data and subtract the mean of each feature from all of its values, producing a zero-mean data set.

PCA Process – STEP 2
Calculate the covariance matrix of the mean-adjusted data.

PCA Process – STEP 3
Calculate the eigenvectors and eigenvalues of the covariance matrix.

PCA Process – STEP 4
Choose components and form a feature vector: order the eigenvectors by eigenvalue, highest to lowest, and keep the top p eigenvectors as the rows of the transformation matrix P.

PCA Process – STEP 5
Derive the new data set by projecting the mean-adjusted data onto the chosen eigenvectors: Y = PX.
Reconstruction of Original Data

Because the rows of $P$ are orthonormal, the transformation can be inverted: the mean-adjusted data is recovered as $\hat{X} = P^{T}Y$, and the original data is recovered by adding the mean back. If only the top p eigenvectors were kept, $\hat{X}$ is an approximation of $X$, with the loss confined to the low-variance directions that were discarded.
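Putting the five steps and the reconstruction together, a compact numpy sketch (the data is random and illustrative; variable names are mine):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 100))            # m = 5 features, n = 100 samples

# STEP 1: subtract the mean of each feature.
mean = X.mean(axis=1, keepdims=True)
Xc = X - mean

# STEP 2: covariance matrix of the mean-adjusted data.
S = Xc @ Xc.T / Xc.shape[1]

# STEP 3: eigenvectors and eigenvalues of the covariance matrix.
lam, V = np.linalg.eigh(S)

# STEP 4: order by eigenvalue (largest first) and keep the top p.
order = np.argsort(lam)[::-1]
p = 2
P = V[:, order[:p]].T                    # rows = chosen eigenvectors

# STEP 5: derive the new (reduced) data set.
Y = P @ Xc

# Reconstruction: back-project and add the mean.
X_hat = P.T @ Y + mean
print(Y.shape)                           # (2, 100)
print(np.linalg.norm(X - X_hat))         # reconstruction error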