
An Introduction to Independent Component Analysis (ICA)

Yu-Te Wu
Institute of Radiological Sciences, National Yang-Ming University
Integrated Brain Function Laboratory, Taipei Veterans General Hospital
The Principle of ICA: a cocktail-party problem

$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)$

$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)$

Independent Component Analysis
Reference: A. Hyvärinen, J. Karhunen, E. Oja (2001). Independent Component Analysis. John Wiley & Sons.
Central limit theorem
• The distribution of a sum of independent random variables tends toward a Gaussian distribution.

Observed signal = $m_1 \mathrm{IC}_1 + m_2 \mathrm{IC}_2 + \dots + m_n \mathrm{IC}_n$
(the sum tends toward Gaussian; each IC is non-Gaussian)


Central Limit Theorem

Consider the partial sums of a sequence $\{z_i\}$ of independent and identically distributed random variables:

$x_k = \sum_{i=1}^{k} z_i$

Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variable

$y_k = \dfrac{x_k - m_{x_k}}{\sigma_{x_k}}$

The distribution of $y_k$ converges to a Gaussian distribution with zero mean and unit variance as $k \to \infty$.
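A quick numerical check of this statement (a sketch, not part of the slides): standardized sums of k i.i.d. uniform variables have excess kurtosis approaching the Gaussian value 0 as k grows.

import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(y):
    # Sample estimate of E{(y - mu)^4} / (E{(y - mu)^2})^2 - 3.
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3

for k in (1, 2, 10, 100):
    z = rng.uniform(-1, 1, size=(k, 100_000))  # k i.i.d. uniform sequences z_i
    y = z.sum(axis=0)                          # partial sums x_k
    y = (y - y.mean()) / y.std()               # standardized variables y_k
    print(k, round(excess_kurtosis(y), 3))     # approaches 0 (Gaussian) as k grows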
How to estimate the ICA model
• Principle for estimating the ICA model: maximization of non-Gaussianity

Measures of non-Gaussianity
• Kurtosis

$\mathrm{kurt}(x) = E\{(x-\mu)^4\} - 3\,[E\{(x-\mu)^2\}]^2$

Super-Gaussian: kurtosis > 0
Gaussian: kurtosis = 0
Sub-Gaussian: kurtosis < 0

For independent $x_1, x_2$ and a scalar $\alpha$:
$\mathrm{kurt}(x_1 + x_2) = \mathrm{kurt}(x_1) + \mathrm{kurt}(x_2)$
$\mathrm{kurt}(\alpha x_1) = \alpha^4\, \mathrm{kurt}(x_1)$
Whitening process

• Assume the measurement $x = As$ is zero mean and $E\{ss^T\} = I$.

• Let $D$ and $E$ be the eigenvalue and eigenvector matrices of the covariance matrix of $x$, i.e. $E\{xx^T\} = EDE^T$.

• Then $V = D^{-1/2}E^T$ is a whitening matrix:

$z = Vx = D^{-1/2}E^T x$

$E\{zz^T\} = V E\{xx^T\} V^T = D^{-1/2}E^T (EDE^T) E D^{-1/2} = I$
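A direct NumPy transcription of this step (a sketch; the function name and the use of the sample covariance are my own choices):

import numpy as np

def whiten(x):
    # x: zero-mean data of shape (n, T). Returns z = V x with E{z z^T} ~ I.
    cov = np.cov(x)                # sample covariance of x
    d, E = np.linalg.eigh(cov)     # eigenvalues (vector d) and eigenvectors E
    V = np.diag(d ** -0.5) @ E.T   # whitening matrix V = D^(-1/2) E^T
    return V @ x, V

After z, V = whiten(x), the sample covariance np.cov(z) is the identity up to sampling error.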
Importance of whitening
For the whitened data $z$, find a vector $w$ such that the linear combination $y = w^T z$ has maximum non-Gaussianity under the constraint $E\{y^2\} = 1$.

Then
$1 = E\{y^2\} = E\{w^T z\, z^T w\} = w^T E\{z z^T\} w = w^T w$

so we can maximize $|\mathrm{kurt}(w^T z)|$ under the simpler constraint $\|w\| = 1$.


Constrained Optimization
max $F(w)$ subject to $\|w\|^2 = 1$

$L(w, \lambda) = F(w) - \lambda(\|w\|^2 - 1), \quad \|w\|^2 = w^T w = 1$

$\dfrac{\partial L(w, \lambda)}{\partial w} = 0$

$\Rightarrow \dfrac{\partial F(w)}{\partial w} - \lambda\, 2w = 0$

$\Rightarrow \dfrac{\partial F(w)}{\partial w} = 2\lambda w$

At the stationary point, the gradient of $F(w)$ must point in the direction of $w$, i.e. be equal to $w$ multiplied by a scalar.
Gradient of kurtosis

F (w )  kurt (w z )  E{(w z ) }  3[ E{(w z ) ] }


T T 4 T 2 2

F (w )  E{(w z ) }  3E{(w z ) }}
T 4 T 2


w w

 1 t 1 (w T z (t )) 4  3(w T w ) 2
T
1 T

T E{y}   y(t )
T t 1
w

4 T

T
t 1
z (t )[ w T
z (t )]3
 3 * 2( w T
w)(w  w )

2
 4sign(kurt(w z))[ E{z[w z] }  3w w ]
T T 3
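A sample-based version of this gradient (a sketch; it assumes whitened data z of shape (n, T) and a weight vector w of length n):

import numpy as np

def kurtosis_gradient(w, z):
    # 4 * sign(kurt(y)) * ( E{z y^3} - 3 ||w||^2 w ),  with y = w^T z.
    y = w @ z
    kurt_y = np.mean(y**4) - 3 * np.mean(y**2) ** 2
    return 4 * np.sign(kurt_y) * ((z * y**3).mean(axis=1) - 3 * (w @ w) * w)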
Fixed-point algorithm using kurtosis
Gradient ascent on $F$ would use
$w_{k+1} = w_k + \mu_k \left.\dfrac{\partial F(w)}{\partial w}\right|_{w_k}$

At a stationary point the gradient is proportional to $w$ itself ($\partial F/\partial w = 2\lambda w$), so instead of taking small gradient steps we can set $w$ directly to (a multiple of) the gradient:

$w \leftarrow E\{z (w^T z)^3\} - 3\|w\|^2 w$  (equal to $E\{z (w^T z)^3\} - 3w$, since $\|w\| = 1$)
$w \leftarrow w / \|w\|$

Note that adding a gradient proportional to $w_k$ would not change its direction, since

$w_{k+1} = w_k - \mu(2\lambda w_k) = (1 - 2\mu\lambda)\, w_k$

Convergence criterion: $|\langle w_{k+1}, w_k \rangle| = 1$, since $w_k$ and $w_{k+1}$ are unit vectors.


Fixed-point algorithm using kurtosis

1. Centering x~
x  m ~x

2. Whitening z  Vx, E{zz T }  I


3. Choose m, No. of ICs to estimate. Set counter p  1
One-by-one Estimation

4. Choose an initial guess of unit norm for wp, eg. randomly.


Fixed-point iteration

T 2
5. Let w p  E{z[ w p z ] }  3w p w p
3

6. Do deflation decorrelation p 1
w p  w p   (w Tp w j )w j
j 1

7. Let wp  wp/||wp||

8. If wp has not converged (|<wpk+1 , wpk>| 1), go to step 5.


9. Set p  p+1. If p  m, go back to step 4.
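Steps 1-9 translate into a short NumPy routine (a sketch under these slides' assumptions; the function and variable names are my own):

import numpy as np

def fastica_kurtosis(x, m, max_iter=200, tol=1e-8, seed=0):
    # x: observed data of shape (n, T); m: number of ICs to estimate.
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)            # 1. centering
    d, E = np.linalg.eigh(np.cov(x))
    z = np.diag(d ** -0.5) @ E.T @ x                 # 2. whitening
    W = np.zeros((m, x.shape[0]))
    for p in range(m):                               # 3./9. one-by-one estimation
        w = rng.standard_normal(x.shape[0])          # 4. random unit-norm start
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ z
            w_new = (z * y**3).mean(axis=1) - 3 * w  # 5. kurtosis fixed-point step
            w_new -= W[:p].T @ (W[:p] @ w_new)       # 6. deflation decorrelation
            w_new /= np.linalg.norm(w_new)           # 7. normalization
            converged = abs(w_new @ w) > 1 - tol     # 8. |<w_new, w_old>| ~ 1
            w = w_new
            if converged:
                break
        W[p] = w
    return W, W @ z                                  # unmixing vectors and estimated ICs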
Fixed-point algorithm using negentropy
The kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations.

Example: a random variable with sample size 1000, mean 0 and variance 1 that contains one value equal to 10 has kurtosis at least $10^4/1000 - 3 = 7$.

kurtosis: $E\{x^4\} - 3$  $\Rightarrow$ we need a more robust measure of non-Gaussianity

$\Rightarrow$ approximation of negentropy
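The slide's example is easy to reproduce (a sketch): a single value of 10 in a standardized sample of 1000 dominates the fourth moment.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
x[0] = 10.0                   # one erroneous / irrelevant observation
x = (x - x.mean()) / x.std()  # mean 0, variance 1
print(np.mean(x**4) - 3)      # around 7-8 here (at least 7 per the bound above), versus ~0 without the outlier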
Fixed-point algorithm using negentropy
Entropy
$H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta$

Negentropy
$J(y) = H(y_{\mathrm{gauss}}) - H(y) \geq 0$

Approximation of negentropy
$J(y) \propto [E\{G(y)\} - E\{G(\nu)\}]^2$

$G_1(y) = \frac{1}{a_1}\log\cosh(a_1 y),\ 1 \leq a_1 \leq 2$;  $g_1(y) = \tanh(a_1 y)$
$G_2(y) = -\exp(-y^2/2)$;  $g_2(y) = y\exp(-y^2/2)$
$G_3(y) = y^4/4$;  $g_3(y) = y^3$
Fixed-point algorithm using negentropy
Maximizing $J(y)$ leads to the update
$w \leftarrow E\{z\, g(w^T z)\} - E\{g'(w^T z)\}\, w$
$w \leftarrow w / \|w\|$

Convergence: $|\langle w_{k+1}, w_k \rangle| = 1$

$G_1(y) = \frac{1}{a_1}\log\cosh(a_1 y),\ 1 \leq a_1 \leq 2$;  $g_1(y) = \tanh(a_1 y)$;  $g_1'(y) = a_1[1 - \tanh^2(a_1 y)]$
$G_2(y) = -\exp(-y^2/2)$;  $g_2(y) = y\exp(-y^2/2)$;  $g_2'(y) = (1 - y^2)\exp(-y^2/2)$
$G_3(y) = y^4/4$;  $g_3(y) = y^3$;  $g_3'(y) = 3y^2$
Fixed-point algorithm using negentropy

1. Centering x~
x  m ~x

2. Whitening z  Vx, E{zz T }  I


3. Choose m, No. of ICs to estimate. Set counter p  1
One-by-one Estimation

4. Choose an initial guess of unit norm for wp, eg. randomly.


Fixed-point iteration

5. Let w p  E{z g ( w T
p z )}  E{ g ( w T
p z )}w p

6. Do deflation decorrelation p 1
w p  w p   (w Tp w j )w j
j 1

7. Let wp  wp/||wp||
8. If wp has not converged, go back to step 5.
9. Set p  p+1. If p  m, go back to step 4.
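The same loop with the negentropy update in step 5, using g = tanh (G1 with a1 = 1); a sketch assuming already-whitened data z:

import numpy as np

def fastica_negentropy(z, m, max_iter=200, tol=1e-8, seed=0):
    # z: whitened data of shape (n, T); m: number of ICs to estimate.
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.standard_normal(n)                          # 4. random unit-norm start
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = np.tanh(w @ z)                              # g(w^T z)
            g_prime = 1 - y**2                              # g'(w^T z) = 1 - tanh^2
            w_new = (z * y).mean(axis=1) - g_prime.mean() * w   # 5. negentropy step
            w_new -= W[:p].T @ (W[:p] @ w_new)              # 6. deflation decorrelation
            w_new /= np.linalg.norm(w_new)                  # 7. normalization
            converged = abs(w_new @ w) > 1 - tol            # 8. convergence check
            w = w_new
            if converged:
                break
        W[p] = w                                            # 9. next component
    return W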
Implementations
• Create two uniform sources
• Two mixed observed signals
• Centering
• Whitening
• Fixed-point iteration using kurtosis
• Fixed-point iteration using negentropy
(these slides show the corresponding plots; see the sketch below)
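An end-to-end run matching these slides (a sketch; it reuses the fastica_negentropy sketch above, and the 2 x 2 mixing matrix is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(4)

# Create two uniform (sub-Gaussian) sources with zero mean and unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

# Two mixed observed signals.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# Centering and whitening.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x

# Fixed-point iteration (negentropy version; the kurtosis version is analogous).
W = fastica_negentropy(z, m=2)
s_hat = W @ z   # estimated sources, recovered up to permutation, sign and scale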
Fixed-point algorithm using negentropy
Entropy
$H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta$

A Gaussian variable has the largest entropy among all random variables of equal variance.

Negentropy
$J(y) = H(y_{\mathrm{gauss}}) - H(y)$
Gaussian $\Rightarrow J = 0$
non-Gaussian $\Rightarrow J > 0$

Negentropy is the optimal measure of non-Gaussianity, but it is computationally difficult since it requires an estimate of the pdf.

$\Rightarrow$ Approximation of negentropy
Fixed-point algorithm using negentropy
High-order cumulant approximation
$J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48}\, \mathrm{kurt}(y)^2$

Replace the polynomial functions by non-polynomial functions $G_i$, e.g. $G_1$ odd and $G_2$ even:
$J(y) \approx k_1 (E\{G_1(y)\})^2 + k_2 (E\{G_2(y)\} - E\{G_2(\nu)\})^2$

Most random variables have approximately symmetric distributions, so it is common that $E\{G_1(y)\} \approx 0$, giving

$J(y) \propto [E\{G(y)\} - E\{G(\nu)\}]^2$

$G_1(y) = \frac{1}{a_1}\log\cosh(a_1 y),\ 1 \leq a_1 \leq 2$;  $g_1(y) = \tanh(a_1 y)$
$G_2(y) = -\exp(-y^2/2)$;  $g_2(y) = y\exp(-y^2/2)$
$G_3(y) = y^4/4$;  $g_3(y) = y^3$
Fixed-point algorithm using negentropy
According to the Lagrange multiplier condition, the gradient must point in the direction of $w$:

$\dfrac{\partial J(w)}{\partial w} = \dfrac{\partial}{\partial w}\,[E\{G(w^T z)\} - E\{G(\nu)\}]^2 = 2\,[E\{G(w^T z)\} - E\{G(\nu)\}]\, E\{z\, g(w^T z)\}$

where $\nu$ is a standardized Gaussian r.v. and $\gamma = 2\,[E\{G(w^T z)\} - E\{G(\nu)\}]$ is a constant, so

$w \propto \gamma\, E\{z\, g(w^T z)\}$
$w \leftarrow E\{z\, g(w^T z)\}$
$w \leftarrow w / \|w\|$
Fixed-point algorithm using negentropy
w  [ E{zg (w T z )}]
w w/ w

The iteration doesn't have the good convergence properties because the
nonpolynomial moments don't have the same nice algebraic properties.

(1   )w  [ E{zg (w T z )}]  w
w w/ w
Finding  by approximative Newton method

Real Newton method is fast.  small steps for convergence


It requires a matrix inversion at every step. large computational load

Special properties of the ICA problem  approximative Newton method


No need a matrix inversion but converge roughly with the same steps as real Newton method
Fixed-point algorithm using negentropy
According to the Lagrange multiplier condition, the gradient must point in the direction of $w$:
$\dfrac{\partial J(w)}{\partial w} - \beta w = 0$

Define $F(w) = E\{z\, g(w^T z)\} - \beta w = 0$ and solve this equation by Newton's method:

$JF(w)\,\Delta w = -F(w) \;\Rightarrow\; \Delta w = [JF(w)]^{-1}[-F(w)]$

$JF(w) = \dfrac{\partial F(w)}{\partial w} = E\{zz^T g'(w^T z)\} - \beta I$

Since the data is whitened, approximate
$E\{zz^T g'(w^T z)\} \approx E\{zz^T\}\, E\{g'(w^T z)\} = E\{g'(w^T z)\}\, I$

which diagonalizes the Jacobian:
$JF(w) \approx [E\{g'(w^T z)\} - \beta]\, I, \qquad [JF(w)]^{-1} \approx \dfrac{1}{E\{g'(w^T z)\} - \beta}\, I$

The Newton step is then
$w \leftarrow w - \dfrac{E\{z\, g(w^T z)\} - \beta w}{E\{g'(w^T z)\} - \beta}$

Multiplying both sides by $-[E\{g'(w^T z)\} - \beta]$ (which only rescales $w$ and is removed by the subsequent normalization) gives

$w \leftarrow E\{z\, g(w^T z)\} - E\{g'(w^T z)\}\, w$
$w \leftarrow w / \|w\|$
Fixed-point algorithm using negentropy
w  E{zg(wTz)}  E{g '(wTz)} w
w  w/||w||

Convergence : |<wk+1 , wk>|=1


Because ICs can be defined only up to a multiplication sign

1
G1 ( y )  log cosh a1 y, 1  a1  2 g1 ( y )  tanh a1 y g1 ( y )  a1[1  tanh 2 (a1 y )]
a1
G2 ( y )   exp( y 2 / 2) g 2 ( y )  y exp( y 2 / 2) g 2 ( y )  (1  y 2 ) exp( y 2 / 2)
y4
G3 ( y )  g3 ( y)  y 3 g 3 ( y )  3 y 2
4
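For reference, the three contrast functions and their derivatives in code (a sketch; a1 is fixed to 1 here, while the slides allow any value in [1, 2]):

import numpy as np

a1 = 1.0  # slides allow 1 <= a1 <= 2

def G1(y): return np.log(np.cosh(a1 * y)) / a1       # G1(y) = (1/a1) log cosh(a1 y)
def g1(y): return np.tanh(a1 * y)                    # g1 = G1'
def dg1(y): return a1 * (1 - np.tanh(a1 * y) ** 2)   # g1'

def G2(y): return -np.exp(-y**2 / 2)                 # G2(y) = -exp(-y^2/2)
def g2(y): return y * np.exp(-y**2 / 2)              # g2 = G2'
def dg2(y): return (1 - y**2) * np.exp(-y**2 / 2)    # g2'

def G3(y): return y**4 / 4                           # G3(y) = y^4 / 4
def g3(y): return y**3                               # g3 = G3'
def dg3(y): return 3 * y**2                          # g3'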
