
Independent Component Analysis
For Time Series Separation

Ahtasham Ashraf
ICA
Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the
identification and separation of mixtures of sources given little prior
information.
• Applications include:
– Audio processing
– Medical data
– Finance
– Array processing (beamforming)
– Coding
• … and most applications where Factor Analysis and PCA are currently used.
• While PCA seeks directions that represent the data best in a Σ|x0 − x|^2
sense, ICA seeks directions that are maximally independent from each other.
We will concentrate on time series separation of multiple targets.
The simple “Cocktail Party” Problem

[Diagram: sources s1, s2 pass through the mixing matrix A to give the
observations x1, x2]

x = As

n sources, m = n observations
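To make the model concrete, here is a minimal sketch in Python/NumPy; the two sources, the time grid, and the 2×2 matrix A are made-up demo values, not data from these slides:

```python
import numpy as np

# Toy instance of x = As: two synthetic sources mixed into two observations.
t = np.linspace(0, 8, 1000)
s1 = np.sin(2 * t)                       # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))              # source 2: square wave
S = np.vstack([s1, s2])                  # source matrix, shape (n, T)

A = np.array([[1.0, 0.5],                # mixing coefficients a_ij
              [0.5, 1.0]])               # (e.g. mic-to-speaker distances)
X = A @ S                                # observations x1(t), x2(t) at the mics
```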


Motivation

Two independent sources, mixed at two microphones:

x1(t) = a11 s1(t) + a12 s2(t)
x2(t) = a21 s1(t) + a22 s2(t)

The coefficients aij depend on the distances of the microphones from the
speakers.
Motivation

Goal: recover the independent source signals from the mixture.


ICA Model (Noise-Free)
• Use a statistical “latent variables” model
• Random variables sk instead of time signals
• xj = aj1s1 + aj2s2 + … + ajnsn, for all j
x = As
• The ICs s are latent variables and are unknown, and the mixing matrix A is
also unknown
• Task: estimate A and s using only the observable random vector x
• Assume the number of ICs equals the number of observed mixtures, so that A
is square and invertible
• After estimating A, we can compute W = A^-1 and hence
s = Wx = A^-1 x
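In practice W is estimated numerically; here is a short sketch using scikit-learn's FastICA on the toy mixtures from the earlier snippet (all data values are made up for the demo):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Rebuild the toy mixtures from the mixing sketch.
t = np.linspace(0, 8, 1000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
X = np.array([[1.0, 0.5], [0.5, 1.0]]) @ S

# FastICA expects samples as rows, hence the transpose.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X.T).T         # estimated ICs (up to order, sign, scale)
W_est = np.linalg.pinv(ica.mixing_)      # unmixing matrix W ~ A^-1
```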
Illustration
Two ICs with the uniform distribution:

p(si) = 1/(2√3) if |si| ≤ √3, and 0 otherwise

This density has zero mean and variance equal to 1.

The mixing matrix A is

A = [ 2  3 ]
    [ 2  1 ]

The edges of the parallelogram of the joint density of (x1, x2) lie in the
directions of the columns of A. So if we can estimate the joint pdf of x1 and
x2 and then locate the edges, we can estimate A.
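The parallelogram is easy to visualize by sampling the uniform ICs and mixing them; a quick sketch (assumes matplotlib is installed):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample the two uniform ICs on [-sqrt(3), sqrt(3)]: zero mean, unit variance.
rng = np.random.default_rng(0)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
X = A @ S  # the scatter of (x1, x2) fills a parallelogram with edges along A's columns

plt.scatter(X[0], X[1], s=2)
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```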
Restrictions
• The si are statistically independent
– p(s1,s2) = p(s1)p(s2)
• Nongaussian distributions
– For gaussian ICs, the joint density of the unit-variance s1 and s2 is
rotationally symmetric:

p(x1, x2) = 1/(2π) exp(−(x1^2 + x2^2)/2)

So it contains no information about the directions of the columns of the
mixing matrix A, and A cannot be estimated.
– If only one IC is gaussian, the estimation is still possible.
Ambiguities
• Can't determine the variances (energies) of the ICs
– Since both s and A are unknown, any scalar multiple of one of the sources
can always be cancelled by dividing the corresponding column of A by it.
– Fix the magnitudes of the ICs by assuming unit variance: E{si^2} = 1
– Only the ambiguity of sign remains
• Can't determine the order of the ICs
– The terms can be freely reordered, because both s and A are unknown. So we
can call any IC the first one.
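Both ambiguities are easy to observe numerically. A sketch that correlates estimated ICs with the true sources (toy data; the mixing matrix is made up):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, (2, 5000))             # true ICs (toy data)
X = np.array([[1.0, 2.0], [0.5, 1.5]]) @ S    # observed mixtures

S_est = FastICA(n_components=2, random_state=0).fit_transform(X.T).T

# The cross-correlation matrix shows one strong entry per row/column,
# but with arbitrary permutation and sign:
print(np.round(np.corrcoef(np.vstack([S, S_est]))[:2, 2:], 2))
```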
ICA Principle (Non-Gaussian is Independent)
• The key to estimating A is non-gaussianity.
• By the Central Limit Theorem, the distribution of a sum of independent random
variables tends toward a gaussian distribution.

[Figure: densities f(s1) and f(s2), and the more gaussian-looking density of
the sum, f(x1) = f(s1 + s2)]

• Consider y = w^T x = w^T As = z^T s, where w is one of the rows of the matrix
W and z = A^T w.
• So y is a linear combination of the si, with weights given by the zi.
• Since a sum of two independent random variables is more gaussian than the
individual variables, z^T s is more gaussian than any single si, and it
becomes least gaussian when it equals one of the si.
• So we take w to be a vector that maximizes the non-gaussianity of w^T x.
• Such a w corresponds to a z with only one nonzero component, so we recover
one of the si.
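A quick numerical check of this principle, using excess kurtosis (defined on the next slide) as the gaussianity yardstick; the uniform sources are made-up demo data:

```python
import numpy as np
from scipy.stats import kurtosis

# Two independent uniform sources: strongly sub-gaussian (negative kurtosis).
rng = np.random.default_rng(0)
s1 = rng.uniform(-1, 1, 100_000)
s2 = rng.uniform(-1, 1, 100_000)

print(kurtosis(s1))        # ~ -1.2 for a uniform variable
print(kurtosis(s1 + s2))   # ~ -0.6: the sum is already "more gaussian"
```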
Measures of Non-Gaussianity
• We need a quantitative measure of non-gaussianity for ICA estimation.
• Kurtosis: zero for a gaussian, but sensitive to outliers

kurt(y) = E{y^4} − 3(E{y^2})^2

• Entropy: largest for a gaussian among variables of equal variance

H(y) = −∫ f(y) log f(y) dy

• Negentropy: zero for a gaussian, but difficult to estimate

J(y) = H(y_gauss) − H(y)

• Approximations:

J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2

J(y) ∝ [E{G(y)} − E{G(v)}]^2

where v is a standard gaussian random variable and, for example:

G(y) = (1/a) log cosh(a·y)
G(y) = −exp(−y^2/2)
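A sketch of two of these measures in NumPy; the Monte-Carlo estimate of E{G(v)} and the choice a = 1 are assumptions of this demo:

```python
import numpy as np

def kurt(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2; zero for a gaussian.
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

def negentropy_logcosh(y, n_mc=1_000_000, seed=0):
    # J(y) ~ [E{G(y)} - E{G(v)}]^2 with G(u) = log cosh(u), i.e. a = 1;
    # E{G(v)} for a standard gaussian v is estimated by Monte Carlo.
    v = np.random.default_rng(seed).standard_normal(n_mc)
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

y = np.random.default_rng(1).uniform(-np.sqrt(3), np.sqrt(3), 100_000)
print(kurt(y), negentropy_logcosh(y))   # both clearly nonzero for uniform y
```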
Data Centering & Whitening
• Centering
x = x' − E{x'}
– This doesn't mean that ICA cannot estimate the mean; it just simplifies the
algorithm.
– The ICs are then also zero mean, because E{s} = W E{x}
– After ICA, add W E{x'} back to the zero-mean ICs
• Whitening
– We transform the x linearly so that the x~ are white (uncorrelated with unit
variance). This is done by an eigenvalue decomposition (EVD):

x~ = (E D^-1/2 E^T) x = E D^-1/2 E^T A s = A~ s

where E{xx^T} = E D E^T

So we now only have to estimate the orthonormal matrix A~.
– An orthonormal matrix has n(n−1)/2 degrees of freedom, so for large
dimensions we have to estimate only about half as many parameters. This
greatly simplifies ICA.
• Reducing the dimension of the data (keeping only the dominant eigenvalues)
while whitening also helps.
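A minimal sketch of centering plus EVD whitening (X holds the observations with one signal per row; the variable names are mine):

```python
import numpy as np

def whiten(X):
    # Center: subtract the mean of each observed signal.
    Xc = X - X.mean(axis=1, keepdims=True)
    # EVD of the covariance: E{x x^T} = E D E^T.
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    # Whitening transform: x~ = E D^-1/2 E^T x.
    return (E @ np.diag(d ** -0.5) @ E.T) @ Xc

X = np.array([[1.0, 0.5], [0.5, 1.0]]) @ \
    np.random.default_rng(0).uniform(-1, 1, (2, 10_000))
Xw = whiten(X)
print(np.round(Xw @ Xw.T / Xw.shape[1], 3))   # ~ identity covariance
```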
Noisy ICA Model
x = As + n
• A … m×n mixing matrix
• s … n-dimensional vector of ICs
• n … m-dimensional random noise vector
• The same assumptions hold as for the noise-free model, provided we use
measures of non-gaussianity that are immune to gaussian noise.
• So gaussian moments are used as contrast functions, i.e.

J(y) ∝ [E{G(y)} − E{G(v)}]^2
G(y) = (1/2c) exp(−y^2/(2c^2))

• However, in pre-whitening the effect of the noise (with covariance Σ) must
be taken into account:

x~ = (E{xx^T} − Σ)^-1/2 x
x~ = Bs + n~
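A sketch of the noise-adjusted (quasi-)whitening step, assuming the noise covariance Σ is known; here Σ = σ²I is a made-up isotropic choice for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1
S = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 10_000))        # toy ICs
X = np.array([[1.0, 0.5], [0.5, 1.0]]) @ S \
    + sigma * rng.standard_normal((2, 10_000))               # noisy mixtures

# Quasi-whitening: x~ = (E{xx^T} - Sigma)^(-1/2) x, with Sigma assumed known.
cov = X @ X.T / X.shape[1]
Sigma = sigma**2 * np.eye(2)
d, E = np.linalg.eigh(cov - Sigma)
Xq = (E @ np.diag(d ** -0.5) @ E.T) @ X                      # quasi-whitened data
```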
Simulation Results
I used synthetic data, with and without noise, to separate the time series of
a DW and an AAV that are moving fairly close to each other.
Simulation Results
References
• Feature extraction (Images, Video)
– http://hlab.phys.rug.nl/demos/ica/
• Aapo Hyvarinen: ICA (1999)
– http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html
• ICA demo step-by-step
– http://www.cis.hut.fi/projects/ica/icademo/
• Lots of links
– http://sound.media.mit.edu/~paris/ica.html
• object-based audio capture demos
– http://www.media.mit.edu/~westner/sepdemo.html
• Demo for BSS with “CoBliSS” (wav files)
– http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html
• Tomas Zeman's page on BSS research
– http://ica.fun-thom.misto.cz/page3.html
• Virtual Laboratories in Probability and Statistics
– http://www.math.uah.edu/stat/index.html
