
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/2553737

Properties Of The Empirical Characteristic Function And Its Application To Testing For Independence

Article · January 2002
Source: CiteSeer

1 author: Noboru Murata, Waseda University

All content following this page was uploaded by Noboru Murata on 24 November 2012.


PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION
AND ITS APPLICATION TO TESTING FOR INDEPENDENCE

Noboru Murata

Waseda University
Department of Electrical, Electronics, and Computer Engineering
3-4-1 Ohkubo, Shinjuku, Tokyo 169-8555, JAPAN

ABSTRACT

In this article, the asymptotic properties of the empirical characteristic function are discussed. The residual of the joint and marginal empirical characteristic functions is studied, and the uniform convergence of the residual in the wider sense and the weak convergence of the scaled residual to a Gaussian process are investigated. Taking the results into account, a statistical test for independence against alternatives is considered.

1. INTRODUCTION

Independent component analysis (ICA, [1]) deals with the problem of decomposing signals which are generated from an unknown linear mixture of mutually independent sources, and is applied to many problems in various engineering fields, such as source separation, dimension reduction, and data mining. ICA is often compared with principal component analysis (PCA), which is a method of multivariate data analysis based on the covariance structure of signals with the Euclidean metric. In the ICA framework, however, the notion of independence, which is not affected by the metric of the signals, plays a central role. Supported by the recent improvement of computer systems, which allows us to handle massive data and to estimate computationally tedious higher-order information within a reasonable time, the concept of ICA gives a new viewpoint for multivariate data analysis: using the mutual information instead of the covariance. It has also been pointed out that, in the context of learning, self-organization of information transformations based on infomax criteria in neural networks is closely related to ICA ([2], and so on).

A simple setting of ICA is as follows. Let s(t) = (s_1(t), ..., s_m(t)) be a source signal, which consists of zero-mean independent random variables. We assume that each component is an i.i.d. sequence or a strongly stationary ergodic sequence, and that at most one component is subject to a Gaussian distribution while the others are non-Gaussian. Let x(t) = (x_1(t), ..., x_m(t)) be an observation, and we assume that it is produced as a linear transformation of the source signal by an unknown matrix A, that is,

    x(t) = A s(t).                                                    (1)

Here the problem is to recover the source signal s(t) and to estimate the matrix A from the observation without any knowledge about them, that is, to find a matrix W such that the elements of a recovered signal y(t) = (y_1(t), ..., y_m(t)), which is determined by a certain matrix W as

    y(t) = W x(t),                                                    (2)

are mutually independent. In other words, find a decomposition of the observation x(t),

    x(t) = W^{-1} ( y_1(t), ..., y_m(t) ),                            (3)

where the elements of y(t) are mutually independent. Ideally the matrix W is an inverse (or generalized inverse) of the matrix A. As is well known, however, the order and the amplitude of the elements of the recovered signal are ambiguous.

In general, ICA algorithms are implemented as minimization or maximization of a certain contrast function which measures the independence among the elements of the recovered signal y(t). Motivated mainly by the stability of algorithms and the speed of convergence, many contrast functions have been proposed; typical ones are as follows. Let p_{y_1...y_m}(.) and p_{y_i}(.) be the joint probability density and the marginal probability densities of the recovered signal, respectively. The definition of independence is stated as

    p_{y_1...y_m}(y_1, ..., y_m) = p_{y_1}(y_1) ... p_{y_m}(y_m).     (4)

Therefore, a certain statistical distance between the joint and the marginal distributions can serve as a contrast function, such as the Kullback-Leibler divergence or the Hellinger distance. Since these distances are defined with probability distributions, we have to approximate the probability distributions by kernel methods or by polynomial expansions based on moments or cumulants, in order to calculate them by using only observations, i.e. by using empirical distributions. In practice, the Kullback-Leibler divergence is often used; in this case the entropy, or differentials of the entropy, of the recovered signals have to be estimated, utilizing for example the Gram-Charlier expansion or non-linear function approximations ([3] and so on).
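The linear model of equations (1) and (2) can be sketched numerically. In this minimal illustration the mixing matrix A is an arbitrary invertible matrix of our choosing, and the demixing matrix W is simply taken as its exact inverse; an actual ICA algorithm would have to estimate W from the observations alone, and could do so only up to the permutation and scaling ambiguity mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two zero-mean independent sources: uniform and Laplacian (both non-Gaussian).
n = 10000
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0.0, 1.0, n)])

# Unknown mixing matrix A (eq. 1): x(t) = A s(t).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s

# A demixing matrix W (eq. 2): y(t) = W x(t).  Here we cheat and take
# W = A^{-1}; an ICA algorithm must estimate W from x alone.
W = np.linalg.inv(A)
y = W @ x

print(np.allclose(y, s))  # True: the sources are recovered exactly
```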
There are also some methods using higher-order statistics in which the shape of the probability distributions is not exactly assumed. The method of blind separation by Jutten and Herault [4] is of this type. In the simplest case, they assume only the symmetry of the distributions and use the condition that

    E[ f(y_i(t)) g(y_j(t)) ] = 0,   i != j,                           (5)

for suitable odd functions f and g. Also cumulants, in particular the kurtosis, are often used in contrast functions ([5] and so on).

In another approach, correlation functions are used in contrast functions to utilize the time structure of the signal ([6] and so on). In this case, the assumption on the source signals may be relaxed to weak stationarity. From the assumption of independence, the correlation function matrix of the source signal is a diagonal matrix,

    R(τ) = diag( r_1(τ), ..., r_m(τ) ),                               (6)

at any time delay τ, where r_i(τ) denotes the auto-correlation function of the signal s_i(t). Therefore, a desired matrix W is a matrix which simultaneously diagonalizes the correlation function matrices of the recovered signal at any time delay. In practical calculation, this strategy reduces to an approximate simultaneous diagonalization of several delayed correlation matrices under the least-mean-square criterion.

So far, many algorithms based on various contrast functions have been proposed. Since each method has its own advantages and disadvantages, in practice it is difficult to know in advance which algorithm should be used. We sometimes need to evaluate the validity of the result of an algorithm, or to compare the results of different algorithms, from the viewpoint of the independence of the reconstructed signals. Therefore, it is important to consider methods of testing for independence. In this paper, we propose to use the empirical characteristic function for this purpose, and investigate its basic properties with numerical experiments.

2. PROPERTIES OF EMPIRICAL CHARACTERISTIC FUNCTION

The characteristic function of a probability distribution F is defined as the Fourier-Stieltjes transform of F,

    c(t) = E[ e^{itX} ],

where E denotes expectation with respect to the distribution F. From the well-known properties of the Fourier-Stieltjes transform, the characteristic function and the distribution correspond one to one. Therefore, closeness of two distributions can be restated in terms of closeness of their characteristic functions. Moreover, the characteristic function is bounded everywhere, that is,

    | c(t) | = | E[ e^{itX} ] | <= 1.

This uniqueness and boundedness are specific properties of the characteristic function. The moment generating function, which is the Laplace transform of the probability distribution, is similar to the characteristic function, but it is not bounded, and there are some pathological examples of distributions which, for instance, have moments of all degrees but do not have a moment generating function. The boundedness in particular is an advantage for the stability of numerical calculations.

The empirical characteristic function is defined as

    c_n(t) = (1/n) Σ_{j=1}^{n} e^{itX_j},                             (7)

where X_1, ..., X_n is an i.i.d. sequence from the distribution F. Obviously, the empirical characteristic function is directly calculated from the empirical distribution. We should, however, note that all the calculation is done in the complex domain.

The following two results by Feuerverger and Mureika [7] are quite important for understanding the convergence of the empirical characteristic function to the true characteristic function.

Theorem 1 (Feuerverger and Mureika [7]) For any fixed T < ∞,

    Prob( lim_{n→∞} sup_{|t| <= T} | c_n(t) − c(t) | = 0 ) = 1        (8)

holds.

Theorem 2 (Feuerverger and Mureika [7]) Let Z_n(t) be the stochastic process given by the residual of the empirical characteristic function and the true characteristic function,

    Z_n(t) = sqrt(n) ( c_n(t) − c(t) ),   −T <= t <= T.               (9)

As n → ∞, Z_n(t) converges weakly to a zero-mean complex Gaussian process Z(t) satisfying Z(t) = conj( Z(−t) ) and

    E[ Z(t) conj(Z(s)) ] = c(t − s) − c(t) c(−s),                     (10)

where conj(.) denotes the complex conjugate.

As n → ∞, by the strong law of large numbers, the empirical characteristic function converges almost surely to the true characteristic function, pointwise with respect to t.
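The empirical characteristic function of equation (7), together with its boundedness and almost-sure convergence, is easy to check numerically. The sketch below (function and variable names are ours) uses the standard normal distribution, whose characteristic function is exp(−t²/2):

```python
import numpy as np

def ecf(x, t):
    """Empirical characteristic function c_n(t) = (1/n) sum_j exp(i t X_j)."""
    t = np.atleast_1d(t)
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

rng = np.random.default_rng(1)
x = rng.standard_normal(20000)

t = np.linspace(-3, 3, 61)
c_n = ecf(x, t)

# |c_n(t)| <= 1 everywhere (boundedness of the characteristic function).
assert np.all(np.abs(c_n) <= 1.0)

# Close to the true N(0,1) characteristic function exp(-t^2/2).
print(np.max(np.abs(c_n - np.exp(-t**2 / 2))))  # small, on the order of 1/sqrt(n)
```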
Furthermore, the above theorems respectively claim that the convergence with respect to t is uniform in the wider sense, and that the residual, as a stochastic process, converges weakly to a complex Gaussian process. Feuerverger and Mureika have mainly discussed a test for symmetry of the distribution, but also mentioned a test for goodness of fit, a test for independence, applications to estimating parameters, and so on. In practice, a studentized version is often used, and the properties of the studentized characteristic function,

    ĉ_n(t) = (1/n) Σ_{j=1}^{n} exp( i t (X_j − X̄) / s_n ),            (11)

where

    X̄ = (1/n) Σ_{j=1}^{n} X_j,   s_n² = (1/n) Σ_{j=1}^{n} (X_j − X̄)²,   (12)

are investigated by Murota and Takeuchi [8]. Similarly, the residual converges to a complex Gaussian process.

We extend the above results to the empirical characteristic function of two independent random variables. Let X_1, ..., X_n and Y_1, ..., Y_n be mutually independent i.i.d. sequences, and let c_X(t), c_Y(s), and c_{XY}(t,s) be the characteristic functions of X, of Y, and of their joint distribution, respectively. Note that in the two-dimensional case, the characteristic function is defined as

    c_{XY}(t,s) = E[ e^{i(tX + sY)} ],                                (13)

and from the assumption of independence between X and Y, it is rewritten as

    c_{XY}(t,s) = c_X(t) c_Y(s).                                      (14)

The empirical characteristic functions are defined as

    c_{X,n}(t) = (1/n) Σ_{j=1}^{n} e^{itX_j},                         (15)
    c_{Y,n}(s) = (1/n) Σ_{j=1}^{n} e^{isY_j},                         (16)
    c_{XY,n}(t,s) = (1/n) Σ_{j=1}^{n} e^{i(tX_j + sY_j)}.             (17)

By the strong law of large numbers, the above empirical characteristic functions converge almost surely to the true characteristic functions c_{X}(t), c_{Y}(s), and c_{XY}(t,s), pointwise with respect to t and s. We can show the following two theorems.

Theorem 3 For any fixed T < ∞,

    Prob( lim_{n→∞} sup_{|t|,|s| <= T} | c_{XY,n}(t,s) − c_{X,n}(t) c_{Y,n}(s) | = 0 ) = 1   (18)

holds.

Theorem 4 Let Z_n(t,s) be the two-dimensional complex stochastic process

    Z_n(t,s) = sqrt(n) ( c_{XY,n}(t,s) − c_{X,n}(t) c_{Y,n}(s) ),   −T <= t, s <= T.   (19)

As n → ∞, Z_n(t,s) converges weakly to a zero-mean complex Gaussian process Z(t,s) satisfying Z(t,s) = conj( Z(−t,−s) ) and

    E[ Z(t,s) conj(Z(t',s')) ]
        = ( c_X(t−t') − c_X(t) c_X(−t') ) ( c_Y(s−s') − c_Y(s) c_Y(−s') ).   (20)

We have to be careful about the fact that c_{XY,n}(t,s) and c_{X,n}(t) c_{Y,n}(s) are not independent, which is the essential difference from Feuerverger and Mureika's theorems. For the proof of Theorem 3, the equality

    c_{XY,n}(t,s) − c_{X,n}(t) c_{Y,n}(s)
        = ( c_{XY,n}(t,s) − c_{XY}(t,s) )
        − ( c_{X,n}(t) − c_X(t) ) c_Y(s)
        − c_X(t) ( c_{Y,n}(s) − c_Y(s) )
        − ( c_{X,n}(t) − c_X(t) ) ( c_{Y,n}(s) − c_Y(s) )

and the continuity theorem of the characteristic function are applied. For the proof of Theorem 4, knowing that

    U_n(t,s) = sqrt(n) ( c_{XY,n}(t,s) − c_{XY}(t,s) ),               (21)
    V_n(t)   = sqrt(n) ( c_{X,n}(t) − c_X(t) ),                       (22)
    W_n(s)   = sqrt(n) ( c_{Y,n}(s) − c_Y(s) )                        (23)

converge pointwise to Gaussian distributions satisfying

    E[ U_n(t,s) conj(U_n(t',s')) ] = c_{XY}(t−t', s−s') − c_{XY}(t,s) c_{XY}(−t',−s'),   (24)
    E[ V_n(t) conj(V_n(t')) ] = c_X(t−t') − c_X(t) c_X(−t'),          (25)
    E[ W_n(s) conj(W_n(s')) ] = c_Y(s−s') − c_Y(s) c_Y(−s'),          (26)
    E[ U_n(t,s) conj(V_n(t')) ] = ( c_X(t−t') − c_X(t) c_X(−t') ) c_Y(s),   (27)
    E[ U_n(t,s) conj(W_n(s')) ] = c_X(t) ( c_Y(s−s') − c_Y(s) c_Y(−s') ),   (28)
    E[ V_n(t) conj(W_n(s')) ] = 0,                                    (29)

the proof of Theorem 4 basically follows Feuerverger and Mureika's.

Note that Z_n(t,s) consists only of empirical characteristic functions, which can be calculated from the observations. That means we can calculate Z_n(t,s) without any knowledge about the underlying distributions of X and Y. The multidimensional versions for three or more random variables can also be discussed in a straightforward way.
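Since Z_n(t,s) of equation (19) consists only of empirical characteristic functions, it can be computed directly from paired observations. A minimal sketch (function and variable names are ours) contrasts an independent pair, for which the scaled residual stays O(1), with a fully dependent pair, for which it grows like sqrt(n):

```python
import numpy as np

def ecf(v, u):
    """Empirical characteristic function of a sample v at frequency u."""
    return np.exp(1j * u * v).mean()

def residual(x, y, t, s):
    """Z_n(t,s) = sqrt(n) * (c_{XY,n}(t,s) - c_{X,n}(t) c_{Y,n}(s)), eq. (19)."""
    n = len(x)
    c_joint = np.exp(1j * (t * x + s * y)).mean()
    return np.sqrt(n) * (c_joint - ecf(x, t) * ecf(y, s))

rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)

# Independent pair: the scaled residual is a draw from a Gaussian process, O(1).
z_indep = residual(x, rng.standard_normal(n), 1.0, 1.0)

# Fully dependent pair (Y = X): the residual grows like sqrt(n).
z_dep = residual(x, x, 1.0, 1.0)

print(abs(z_indep), abs(z_dep))  # the dependent residual is far larger
```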
3. APPLICATION TO TESTING FOR INDEPENDENCE

According to the results in the previous section, we construct a test statistic with

    Z_n(t,s) = sqrt(n) ( c_{XY,n}(t,s) − c_{X,n}(t) c_{Y,n}(s) ).

Roughly speaking, the procedure for testing consists of evaluating how close Z_n(t,s) is to 0 with a certain distance. The L1-norm, the L2-norm, and Sobolev norms may be candidates for the distance function; however, evaluation on the whole (t,s) plane is not necessarily plausible, because Theorems 1 and 3 claim that the residual converges uniformly only on a bounded domain. Actually, the residual approaches its extreme values arbitrarily often as |t|, |s| → ∞, and does not converge uniformly on ℝ or ℝ². Therefore, at least from the practical point of view, the distance should be evaluated as

    ∫∫ | Z_n(t,s) |² w(t,s) dt ds

with a weight function w(t,s) whose support is bounded.

As a special case, take w(t,s) = |φ(t)|² |φ(s)|², where φ is a function whose inverse Fourier transform is ψ. By Parseval's relation, the norm is then rewritten as

    ∫∫ | Z_n(t,s) |² |φ(t)|² |φ(s)|² dt ds
        ∝ n ∫∫ | p̂_{XY}(x,y) − p̂_X(x) p̂_Y(y) |² dx dy,

where p̂_X(x) = (1/n) Σ_j ψ(x − X_j), and p̂_Y(y) and p̂_{XY}(x,y) are the corresponding marginal and joint estimators with kernel ψ. In addition, if the function ψ is a probability density function, the above distance is interpreted as the L2-norm of the difference between the joint and marginal probability density functions estimated by the kernel method. As another special case, a Sobolev norm weighted around the origin is almost equivalent to the method of using higher-order statistics (moments and cumulants), because the differential coefficients of the empirical characteristic function at the origin give the information of the cumulants.

In any case, these norms are not plausible in practice, because of the computational cost of the multiple integrals with respect to t and s, and because the weight functions should be carefully chosen with a quantitative evaluation of the convergence, using, for instance, cross-validation.

Here we try to reduce the computational cost by taking account of the properties of the statistic Z_n(t,s). We should note that the variance of Z_n(t,s) vanishes on the coordinate axes of the (t,s) plane, that is, at t = 0 or s = 0, and is bounded at the other points. Also, from the covariance structure of the neighborhood, we know that Z_n(t,s), as a stochastic process, changes continuously and smoothly (see Fig. 5, for example). As discussed in Murota and Takeuchi [8], we therefore evaluate Z_n(t,s) at one point, or at several points in a certain region of the (t,s) plane, taking account of the continuity. Since Z_n(t,s) almost obeys a Gaussian distribution if n is sufficiently large, we can adopt the 2-norm of Z_n(t,s) normalized with the variance,

    T_n(t,s) = ( Re Z_n(t,s), Im Z_n(t,s) ) Σ(t,s)^{-1} ( Re Z_n(t,s), Im Z_n(t,s) )ᵀ,   (30)

where the elements of the matrix Σ(t,s) are given by

    Σ_11 = E[ (Re Z_n(t,s))² ] = (1/2) ( E[ Z_n conj(Z_n) ] + Re E[ Z_n² ] ),
    Σ_22 = E[ (Im Z_n(t,s))² ] = (1/2) ( E[ Z_n conj(Z_n) ] − Re E[ Z_n² ] ),
    Σ_12 = Σ_21 = E[ (Re Z_n(t,s)) (Im Z_n(t,s)) ] = (1/2) Im E[ Z_n² ].

Note that conj(Z_n)(t,s) = Z_n(−t,−s) holds, and then the required moments can be asymptotically calculated as

    E[ Z_n(t,s)² ] ≈ ( c_X(2t) − c_X(t)² ) ( c_Y(2s) − c_Y(s)² ),
    E[ Z_n(t,s) conj(Z_n(t,s)) ] ≈ ( 1 − |c_X(t)|² ) ( 1 − |c_Y(s)|² ),

with the characteristic functions replaced by their empirical versions. This test statistic obeys the χ² distribution with 2 degrees of freedom, and hence p-values can be evaluated as, for example,

    Prob( T_n(t,s) >= 5.99 )  ≈ 0.05,
    Prob( T_n(t,s) >= 9.21 )  ≈ 0.01,
    Prob( T_n(t,s) >= 13.82 ) ≈ 0.001,

under the null hypothesis that the two random variables are mutually independent.

4. NUMERICAL EXPERIMENTS

4.1. Properties of test statistics

First, we check the properties of the test statistics and their p-values discussed in the previous section. In each run, we generate independent random variables X and Y subject to a beta distribution, and calculate the test statistic T_n(t,s). Figures 1 and 2 show the histograms of the test statistics and the p-values made from repeated runs. Theoretically, the test statistic obeys the χ² distribution with 2 degrees of freedom, and the p-value obeys the uniform distribution on [0, 1].
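The single-point statistic and its p-value can be sketched directly from equations (19) and (30). The sketch below (all function and variable names are ours) plugs the empirical characteristic functions into the asymptotic moment formulas E[Z²] ≈ (c_X(2t) − c_X(t)²)(c_Y(2s) − c_Y(s)²) and E[Z conj(Z)] ≈ (1 − |c_X(t)|²)(1 − |c_Y(s)|²), and uses the fact that the survival function of the χ² distribution with 2 degrees of freedom is exp(−T/2); the plug-in normalization is our reading of the construction in Section 3:

```python
import numpy as np

def ecf(v, u):
    return np.exp(1j * u * v).mean()

def independence_test(x, y, t, s):
    """Chi-square test of independence at a single evaluation point (t, s).

    Returns (T, p): the statistic of eq. (30) and its chi^2(2) p-value.
    """
    n = len(x)
    cx, cy = ecf(x, t), ecf(y, s)
    c_joint = np.exp(1j * (t * x + s * y)).mean()
    z = np.sqrt(n) * (c_joint - cx * cy)                    # residual Z_n(t,s)

    # Asymptotic second moments of Z under the null, estimated from data.
    a = (ecf(x, 2 * t) - cx**2) * (ecf(y, 2 * s) - cy**2)   # E[Z^2]
    b = (1 - abs(cx)**2) * (1 - abs(cy)**2)                 # E[Z conj(Z)]

    # Covariance of (Re Z, Im Z): for zero-mean Z = U + iV,
    # E[U^2] = (b + Re a)/2, E[V^2] = (b - Re a)/2, E[UV] = (Im a)/2.
    cov = 0.5 * np.array([[b + a.real, a.imag],
                          [a.imag, b - a.real]])
    v = np.array([z.real, z.imag])
    T = float(v @ np.linalg.solve(cov, v))
    return T, float(np.exp(-T / 2))          # chi^2(2) survival function

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)

T0, p0 = independence_test(x, rng.standard_normal(n), 1.0, 1.0)  # independent
T1, p1 = independence_test(x, x, 1.0, 1.0)                       # Y = X

print(p0, p1)  # p1 is essentially zero; p0 is roughly a Uniform(0,1) draw
```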

The comparison of the experimental and theoretical cumulative distributions of the test statistics in Figure 3 supports our analysis.

Fig. 1. Histogram of the test statistics at (t,s) = (1,1).

Fig. 2. Histogram of the p-values at (t,s) = (1,1).

Fig. 3. Experimental and theoretical cumulative distributions of the test statistics.

4.2. Application to detecting crosstalk

In the following example, we use two music instrument sounds, S1 and S2, which are synthesized on the computer. Figure 4 shows their waveforms (8 kHz sampling rate).

Fig. 4. Source signals S1 and S2 (8 kHz sampling rate waveforms).

Figure 5 shows the real part and the imaginary part of the scaled residual of the joint and marginal characteristic functions,

    Z_n(t,s) = sqrt(n) ( c_{XY,n}(t,s) − c_{X,n}(t) c_{Y,n}(s) ),

where the empirical characteristic functions are calculated from randomly chosen sample points of S1 and S2. We can see that both of the values are almost 0 around the origin, but continuously fluctuate toward the marginal region.

Fig. 5. The real part and the imaginary part of the residual Z_n(t,s) of the joint and marginal characteristic functions.

Using these signals, we generate artificially mixed signals x1(t) and x2(t), in which x2 includes S1 with a small amplitude, so that it simulates crosstalk in x2. To detect the crosstalk, we generate mixed signals from the samples by mixing x1 into x2 with intensity α, and calculate the statistic T_n(t,s).
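The crosstalk-detection experiment can be imitated in a few lines. Below we use synthetic Laplacian sources instead of the music signals, a crosstalk ratio of 0.1, and an evaluation point (t,s) = (0.5, 0.5); these are all choices of ours, since the paper's exact settings are not legible in this copy. The mixture x2 + α x1 is tested for independence against x1 over a grid of α using the single-point statistic of Section 3, and the p-value should peak near α = −0.1, the value that cancels the crosstalk:

```python
import numpy as np

def ecf(v, u):
    return np.exp(1j * u * v).mean()

def statistic(x, y, t, s):
    """Single-point test statistic of eq. (30); see Section 3."""
    n = len(x)
    cx, cy = ecf(x, t), ecf(y, s)
    z = np.sqrt(n) * (np.exp(1j * (t * x + s * y)).mean() - cx * cy)
    a = (ecf(x, 2 * t) - cx**2) * (ecf(y, 2 * s) - cy**2)   # E[Z^2]
    b = (1 - abs(cx)**2) * (1 - abs(cy)**2)                 # E[Z conj(Z)]
    cov = 0.5 * np.array([[b + a.real, a.imag], [a.imag, b - a.real]])
    v = np.array([z.real, z.imag])
    return float(v @ np.linalg.solve(cov, v))

rng = np.random.default_rng(5)
n = 20000
s1, s2 = rng.laplace(size=(2, n))   # two independent non-Gaussian sources
x1 = s1
x2 = s2 + 0.1 * s1                  # s1 leaks into x2: simulated crosstalk

# Scan the mixing intensity alpha; the chi^2(2) p-value exp(-T/2) should be
# largest where x2 + alpha * x1 is independent of x1, i.e. near alpha = -0.1.
alphas = np.arange(-0.2, 0.201, 0.01)
pvals = [np.exp(-0.5 * statistic(x1, x2 + a * x1, 0.5, 0.5)) for a in alphas]
alpha_hat = alphas[int(np.argmax(pvals))]

print(alpha_hat)  # near -0.1, the value cancelling the crosstalk
```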
In Figures 6 and 7, the test statistics and the p-values are shown, which are calculated from randomly chosen sample points. We plot the intensity α on the horizontal axis and the test statistics and the p-values on the vertical axis, respectively, where α varies over [−0.1, 0.1] and the statistics are evaluated at (t,s) = (0.1, 0.1), (0.1, 0.2), and (0.2, 0.1). The p-values are maximized at almost the same value of α at the applied points, respectively; hence the test statistics detect the true crosstalk ratio with small errors.

Fig. 6. Test statistics of the artificially mixed signals, evaluated at (t,s) = (0.1,0.1), (0.1,0.2), and (0.2,0.1).

Fig. 7. p-values of the artificially mixed signals, evaluated at (t,s) = (0.1,0.1), (0.1,0.2), and (0.2,0.1).

5. CONCLUDING REMARKS

In this paper, we investigated the properties of the joint and marginal empirical characteristic functions, and discussed a test for independence using the characteristic functions of the joint and marginal empirical distributions.

With a more detailed evaluation of the convergence, a quantitative consideration of the relationship between the sample size n and the evaluation point (t,s) is needed as future work. Also, especially for practical use, it is important to consider the problem in which the observations are smeared by non-negligible noise. In the Gaussian-noise case, a straightforward extension can be made by using the information of the noise covariance estimated by conventional factor analysis, as discussed in Kawanabe [9].

6. REFERENCES

[1] Pierre Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287-314, April 1994.

[2] Anthony J. Bell and Terrence J. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129-1159, 1995.

[3] Shun-ichi Amari, Andrzej Cichocki, and Howard Hua Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems 8, David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, Eds., pp. 757-763. MIT Press, Cambridge MA, 1996.

[4] Christian Jutten and Jeanny Herault, "Separation of sources, Part I," Signal Processing, vol. 24, no. 1, pp. 1-10, July 1991.

[5] Jean-François Cardoso and Antoine Souloumiac, "Blind beamforming for non Gaussian signals," IEE Proceedings F, vol. 140, no. 6, pp. 362-370, December 1993.

[6] L. Féty and J. P. Van Uffelen, "New methods for signal separation," in Proceedings of 4th International Conference on HF radio systems and techniques, London, April 1988, IEE, pp. 226-230.

[7] Andrey Feuerverger and Roman A. Mureika, "The empirical characteristic function and its applications," The Annals of Statistics, vol. 5, no. 1, pp. 88-97, 1977.

[8] Kazuo Murota and Kei Takeuchi, "The studentized empirical characteristic function and its application to test for the shape of distribution," Biometrika, vol. 68, no. 1, pp. 55-65, 1981.

[9] Motoaki Kawanabe and Noboru Murata, "Independent component analysis in the presence of gaussian noise based on estimating functions," in Proceedings of the Second International Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), Helsinki, June 2000.