Professional Documents
Culture Documents
Properties of The Empirical Characteristic Function and Its Application To Testing For Independence
Properties of The Empirical Characteristic Function and Its Application To Testing For Independence
net/publication/2553737
CITATIONS READS
23 186
1 author:
Noboru Murata
Waseda University
138 PUBLICATIONS 2,857 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Noboru Murata on 24 November 2012.
Noboru Murata
Waseda University
Department of Electrical, Electronics, and Computer Engineering
3-4-1 Ohkubo, Shinjuku, Tokyo 169-8555, JAPAN
19
only observations, i.e. by using empirical distributions. In where denotes expectation with respect to the distribution
practice, the Kullback-Leibler divergence is often used, and . From the well-known property of the Fourier-Stieltjes
in this case entropy or differentials of entropy of the recov- transform, the characteristic function and the distribution
ered signals have to be estimated utilizing such as Gram- correspond one to one. Therefore, closeness of two distri-
Charlier expansion or non-linear function approximations butions can be restated in terms of closeness of their char-
([3] and so on). acteristic functions. Moreover the characteristic function is
There are also some methods using higher order statis- overall bounded, that is
tics in which shape of probability distributions are not ex- % !#"$)! '% & % #"$! %
actly assumed. A method of blind separation by Jutten and
Herault [4] is one of this type. In the simplest case, they
These uniqueness and boundedness are specific properties
assume only the symmetry of the distributions and use the
of the characteristic function. The moment generating func-
condition that
tion which is the Laplace transform of the probability dis-
&"1 1)+
(5) tribution is similar to the characteristic function, but it is
not bounded, and there are some pathological examples of
Also cumulants, in particular kurtosis, are often used in con- distributions which, for instance, have all degrees of mo-
trast functions ([5] and so on). ments, but do not have the moment generating functions.
In another approach, the correlation functions are used Particularly the boundedness is an advantage for stability of
in contrast functions to utilize the time structure of the sig- numerical calculations.
The empirical characteristic function is defined as
nal ([6] and so on). In this case, the assumption on the
*)
the assumption of independence, the correlation function (7)
matrix of the source signal is a diagonal matrix
9 ! , + is an i.i.d. sequence from the dis-
),
.. .. ..
(6)
where
tribution . Obviously, the empirical characteristic func-
.
.
=
.
tion is directly calculated from the empirical distribution,
? We should, however, note that all the calculation is done in
. Therefore,
at any time delay , where
function of the signal
denotes the auto-correlation
a desired matrix . is
the complex domain.
The following two results by Feuerverger and Mureika[7]
are quite important to know the properties of convergence of
a matrix which simultaneously diagonalizes the correlation
function matrices of the recovered signal at any time delays. the empirical characteristic function to the true characteris-
In practical calculation, this strategy is reduced as an ap- tic function.
20
the true characteristic function pointwisely with respect to .
Theorem 4 Let
I be a two-dimensional complex stochas-
Furthermore, the above theorems respectively claim that the tic process:
convergence with respect to is uniform in the wider sense,
IA@ 8 I
= 8 IBDC
and the residual, as a stochastic process, converges weakly
to a complex Gaussian process. Feuerverger and Mureika = , & I & , (19)
have mainly discussed a test for symmetry of the distribu-
As EF. ,
I converges weakly to a zero mean com-
tion, but also mentioned about test for fitness, test for in-
plex Gaussian process
I satisfying I G =2#= I
dependence, applications to estimating parameters, and so
on. In practice, a studentized version is often used and the and
properties of the studentized characteristic function H@ I I
B
3 ) (11) @ , ?= B'@ 8 I, I ?= 8 I 8 I B:
where (20)
( " ! =! ! ( ! We8 have to be careful about the fact that
8 I and
= *)
*) (12) I are not independent, which gives the essential
difference from Feuerverger and Mureika’s theorems. For
are also investigated by Murota and Takeuchi[8]. Similarly, the proof of Theorem 3, the equality
8 I= 8 IA@ 8 I
= 8 IB
the residual converges to a complex Gaussian process.
= @ 8 I= 8 IB =@
= B 8 I
We extend the above result to the case of the empiri-
"!1
, + 8 be mutually8 indepen-
cal characteristic function of two independent random vari-
, I , and I be =
@ = B'@ 8 I= 8 IB:
dent i.i.d. sequences, and
ables. Let
8 I)A@ 8 I= 8
applied. Knowing that
@ 8 I2,
I
= 8 I 8 I B: (28)
H@ 8 IB+
By the strong law of large numbers, the above empirical
characteristic functions converge8 almost surely
characteristic functions
, I and 8 toIthe
point-
true
(29)
I
wisely with respect to and , respectively. We can show the proof of Theorem 4 basically follow Feuerverger and
I consists of only the empirical char-
following two theorems. Mureika’s.
Theorem 3 For , - . , Note that
21
3. APPLICATION TO TESTING FOR the neighborhood, we know that
I
, as a stochastic
INDEPENDENCE process, changes continuously and smoothly (Fig. 5, see
I at one point or several points on a cer-
for example). As discussed in Murota and Takeuchi[8], we
According to the results in the previous section, we con- evaluate
I taking account of the continuity. Since
struct a test statistics with
I almost obeys
IA@ 8 I
= 8 IBDC
tain region of
a Gaussian distribution if is suffi-
ciently large, we can adopt the -norm of
I normal-
ized with the variance
5 I is close to with a certain distance.
Roughly speaking, the procedure for testing consists of eval-
uating how
, I
*I
I 3 I 3
-norm, -norm and Sobolev-norm may be candi-
dates as a distance function, however, evaluation on the whole
is not necessarily plausible, because Theorems 1 and
I (30)
3 claim that the residual converges uniformly only on a
I approaches where the elements of the matrix
are given by
to arbitrarily often as
bounded domain. Actually, the function
I E . and does not converge 2"!$#&% I
&@ '!$#&% I*,)($*&+* I G IB2
uniformly on or . Therefore, at least from the practical
point of view, the distance should be evaluated as
I
I
I , , ($*&+* I 3 I
-
3 !$#&% I
I whose support
is bounded.
!$#&%. 3 I
the -norm is used and I can
with a weight function
"
I
2
% I % be
I @'= '!$#&% I/
As a special case,
decomposed as
>
using a function , ($*&+* I G IB2
whose inverse Fourier transform is . The norm is
then rewritten as
Note that
G I2 =2#= I
holds, and then can be
% *I I %
I asymptotically calculated as
!$#&% I10A@
= B'@ 8
'I
= 8
I B
( H= ! >=1 ($*&*
+
I G I10A@: = % % DB'@: =% 8 I % DB:
= ( H= ! >=1
:> This test statistics obeys the 2
distribution with de-
grees of freedom, hence p-values can be evaluated as, for
example,
In addition, if the function is a probability density func-
743&5&6*@ , I ) & 7 8:9B 5
;
tion, the above distance is interpreted as the
the difference between joint and marginal probability den-
-norm of
743&5&6*@ , I & 9 ;:;9B 5 ;:9
sity functions estimated by the kernel method. As another 743&5&6*@ , I & ; :<B 5 ;:;
special case, Sobolev-norm weighted around the origin is
almost equivalent to the method of using higher order statis- under the null hypothesis that two random variables are mu-
tics (moments and cumulants), because the differential coef- tually independent.
ficients at the origin of the empirical characteristic function
give the information of cumulants.
4. NUMERICAL EXPERIMENTS
In any case, these norms are not plausible in practice,
I
because of the computational cost of the multiple integrals
with respect to and , and weight functions should be care-
4.1. Properties of test statistics
fully chosen with quantitative evaluation of the convergence First, we check the properties of the test statistics and their
= -values discussed in the previous section. In each run, we
using, for instance, cross-validation.
@! > 69B
I . We should
Here we try to reduce the computational cost taking ac-
@1 > 49 B subject to the beta distribution ? ,
generate independent random variables
count of the properties of the statistics
I
and calculate the test statistics ,
I , that is, 4 orvanishes
note that the variance of
I&4 , and on the coordi- . Figures 1 and 2
shows the histograms of the test statistics and = -values made
nate axes of
at the other points. Also from the covariance structure of
is bounded
from
runs. Theoretically the test statistics obeys 2
22
distribution with degrees of freedom and the = -value obeys
. Comparison of experimental functions
uniform distribution on
IA@ 8 I
= 8 IB C
and theoretical cumulative distributions of the test statistics
in Figure 3 supports our analysis. I = 7 7 = 7 7
! + and 16+ and the empirical characteristic
9
test statistics at (1,1)
700
where
600
! 1
functions are calculated from randomly chosen points
500
400
around the origin, but continuously fluctuate toward the
of and . We can see that both of the values are almost
300
marginal region.
200
source signal S
1
100
1
0
0 2 4 6 8 10 12 14 16
0.5
amplitude
0
Fig. 1. Histogram of test statistics.
−0.5
−1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
tap 4
x 10
source signal S2
p−values at (1,1)
140
1
120
0.5
amplitude
100
0
80
−0.5
60
−1
40 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
tap 4
x 10
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Fig. 4. Source signal. (
Fig. 2. Histogram of = -values.
forms)
0.8 0
0.7
−0.1
4
0.6 2 4
0 2
−2 0
−2
0.5 −4 −4
s t
0.4
0.3
0.5
0.2
imaginary part
0.1 0
0
0 2 4 6 8 10 12 14 16 18 20
−0.5
4
2 4
0 2
−2 0
−2
−4 −4
Fig. 3. Experimental and theoretical cumulative distribu- s t
nals
sounds
8 < 7 '
Figure7 4 shows their waveforms ( KHz sampling rate,
secs, points).
Figure 5 shows the real part and the imaginary part of
the scaled residual of the joint and marginal characteristic
Since includes with small amplitude, it simulates
crosstalk in . To detect the crosstalk, we generate mixed
23
signals of sample and the evaluation point
I
are needed as
!
a future work. Also, especially for practical use, it is im-
1 ,
portant to consider the problem in which the observations
are smeared by not negligible noises. In the Gaussian noise
0.8
1993.
0.7
0.6
[6] L. Féty and J. P. Van Uffelen, “New methods for signal
separation,” in Proceedings of 4th International Con-
p−values
0.5
0.4
ference on HF radio systems and techniques, London,
0.3 April 1988, IEE, pp. 226–230.
0.2
24
View publication stats