Professional Documents
Culture Documents
Statistical Tests For Detecting Granger Causality
Statistical Tests For Detecting Granger Causality
Statistical Tests For Detecting Granger Causality
Abstract—Detection of a causal relationship between two or Gaussian distributed time series, the directed mutual informa-
more sets of data is an important problem across various scien- tion reduces to the logarithm of ratio of the prediction error
tific disciplines. The Granger causality index and its derivatives variances of the time series of interest with and without in-
are important metrics developed and used for this purpose. How-
ever, the test statistics based on these metrics ignore the effect of cluding the purported causing time series into the prediction
practical measurement impairments such as subsampling, addi- model [2]. This quantity, known as the Granger causality in-
tive noise, and finite sample effects. In this paper, we model the dex (GCI) [2], [3], has been applied extensively for the detection
problem of detecting a causal relationship between two time series of causal relationships across numerous research areas such as
as a binary hypothesis test with the null and alternate hypotheses neuroscience [4]–[11], physics [12], [13], climate science [14],
corresponding to the absence and presence of a causal relationship,
respectively. We derive the distribution of the test statistic under the econometrics [15], etc. Most of the present studies on Granger
two hypotheses and show that measurement impairments can lead causality assume that the sampling process is noiseless and the
to suppression of a causal relationship between the signals, as well sampling frequency is above the Nyquist rate. As a consequence,
as false detection of a causal relationship, where there is none. We the tests are not designed to achieve a given false alarm/detection
also use the derived results to propose two alternative test statistics rate. It is also typically assumed that the second-order statis-
for causality detection. These detectors are analytically tractable,
which allows us to design the detection threshold and determine tics of the signals of interest are perfectly known. However,
the number of samples required to achieve a given missed detection in practice, the causality detector has to estimate the second-
and false alarm rate. Finally, we validate the derived results using order statistics from a finite number of possibly undersampled
extensive Monte Carlo simulations as well as experiments based and noisy measurements of the signals of interest. In this pa-
on real-world data, and illustrate the dependence of detection per- per, our main focus is on the quantification of the effects of
formance of the conventional and proposed causality detectors on
parameters such as the additive noise variance and the strength of these measurement inaccuracies on the performance of Granger
the causal relationship. causality detectors. This, in turn, allows us to design the detector
to achieve a desired false alarm/detection probability.
Index Terms—Causality detection, granger causality, perfor-
mance analysis, hypothesis testing.
The phenomenon of undersampling arises mainly due to the
inability of the signal acquisition unit to sample the signal of
interest above the Nyquist rate. As shown later in the paper,
I. INTRODUCTION under-sampling can lead to a weakening or even a complete
A. Motivation suppression of the causal relationship between two signals. In
addition to this, additive noise can lead to suppression of an
ONFIRMING the presence or absence of a causative re-
C lationship between different observed phenomena is an
important problem across various disciplines of physical, bio-
existing causal relationship, as well as may cause a false de-
tection [16], [17]. The effect of these inaccuracies is further
compounded by the fact that the second-order statistics of the
logical, and social sciences. It has been argued that if the phe- signal of interest, required to compute the GCI, are not known,
nomena of interest can be represented as a time series, then and have to be estimated using the acquired samples. Therefore,
the corresponding relationship between them can be quantified finite sample effects, in addition to undersampling and addi-
by directed mutual information [1]. It is also known that for tive noise, also contribute to errors in the detection of a causal
relationship between the signals of interest. In this paper, we
Manuscript received December 7, 2017; revised May 21, 2018, July 4, 2018, model the problem of detection of Granger causality as a binary
and August 12, 2018; accepted September 14, 2018. Date of publication Septem-
ber 26, 2018; date of current version October 8, 2018. The associate editor
hypothesis test, and evaluate the effects of the measurement im-
coordinating the review of this manuscript and approving it for publication was pairments listed above on the probabilities of detection and false
Dr. Paolo Braca. The work of C. R. Murthy was supported in part by a re- alarm.
search grant from the Aerospace Network Research Consortium and in part
by the Young Faculty Research Fellowship from the Ministry of Electronics
and Information Technology, Government of India. The work of G. Rangarajan B. Related Work
was supported in part by research grants from J. C. Bose National Fellowship,
DST Centre for Mathematical Biology Phase II, and in part by UGC Centre for In the original formulation of Granger causality in [2], a
Advanced Study. (Corresponding author: Chandra Ramabhadra Murthy). time series y[n] is said to Granger cause another time series
R. Chopra is with the Indian Institute of Technology Guwahati, Assam
781039, India (e-mail:,ribhu@outlook.com). x[n], when the past samples of the y[n] can be used to improve
C. R. Murthy and G. Rangarajan are with the Indian Institute of Science, the estimate of the x[n] over ‘what can be predicted using all
Bangalore 560012, India (e-mail:,cmurthy@iisc.ac.in; rangaraj@iisc.ac.in). other information in the universe’, conventionally considered
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. to be the past samples of x[n]. Therefore, a causal relation-
Digital Object Identifier 10.1109/TSP.2018.2872004 ship between x[n] and y[n] can be inferred by comparing the
1053-587X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
5804 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 22, NOVEMBER 15, 2018
mean squared prediction error for x[n] with and without in- binary hypothesis testing framework for causality detection to
cluding y[n] in the prediction model [18]. The prediction model GCI.
considering only the past samples of x[n] is known as the re-
strictive model (R-model), whereas the one including the ef-
C. Contributions
fects of y[n] is known as the unrestricted model (U-model). In
addition to the GCI, several metrics for quantifying Granger In the present work, we consider the problem of detecting
causality have been studied [19], [20]. The system model with a causal relationship between two possibly sub-sampled time
two time series [2] has been also generalized to study causal series, in the presence of additive white Gaussian noise. We
relationships between more than two time series [21]. However, model the time series as two independent pth order autoregres-
all these studies assume the samples are noiseless, and ignore sive (AR(p)) processes in the absence of a causal relation, and as
the finite sample effects on the second-order statistics being a bivariate vector autoregressive (VAR(p)) process in the pres-
estimated. ence of a causal relationship. We model the test for causality
The effect of noise on Granger causality was first discussed as a binary hypothesis testing problem with the null hypothe-
in [22], and was mathematically analyzed in [16] for a bivariate sis corresponding to the absence of a causal relationship, and
VAR-1 process. These results were extended to a generalized the alternate hypothesis to its presence. The system model is
VAR(p) (pth order autoregressive process) in [17], where a described in detail in Section II. Following this, the main ob-
Kalman filtering based technique to alleviate the detrimental jective of this paper is to evaluate the detection performance of
effect of noise was also developed. An alternative test statistic GCI for finite samples of the two time series of interest. Our
for causality detection, known as the phase slope index, was contributions in this direction are as follows:
introduced in [23] and was shown to be more resilient to noise, 1) We derive the effects of down-sampling on the causal
as compared to the GCI. Another metric for causality, known as relationship between two signals. (See Section III.)
the time reversed Granger causality, has been discussed in [18], 2) We derive the GCI for the down-sampled time series in
[24], [25]. The authors in [26] used simulation techniques to the presence of additive noise assuming knowledge of
compare the robustness of GCI, time reversed Granger causality, the second-order statistics of the signals of interest. (See
and phase slope index to correlated noise, and found that time Section IV.)
reversed Granger causality is the most noise resilient. 3) We derive the statistics of the GCI under the two hypothe-
The problem of sub-sampling in Granger causality detection ses for finite number of samples, and use these to derive
has been studied in the literature [27]–[34]. In [27], [28], the au- the probabilities of detection and false alarm. Our analysis
thors study the effects of the choice of the sampling frequency accounts for the estimation of the second-order statistics
on the detectability of a causal relationship between neurolog- of the two signals using a finite number of samples when
ical signals. In [28], these derived results are used to identify the generative model is unknown. (See Section V.)
favorable sampling rates for detecting Granger causality. In [29], 4) Based on the statistics of GCI, we propose two alter-
two methods, based on the expectation maximization (EM) al- native test statistics to detect Granger causality and de-
gorithm and the variational inference framework, are proposed rive the corresponding probabilities of detection and false
to detect causal relationships from low resolution data, and their alarm. (See Section VI.)
applicability to simulated and practical data is tested. The prob- 5) Using detailed simulations and numerical experiments
lem of causal inference from an under-sampled time series by based on real-world data, we validate the derived the-
using a general purpose Boolean constraint solver has been dis- ory and compare the performance of the two proposed
cussed in [32]. It is shown that the method considered in [32] test statistics against the GCI. (See Section VII.)
is independent of the modeling parameters, and can be conve- The results derived in this paper can be used to characterize
niently scaled to any number of variables. In [33], the authors the behavior of GCI and related detectors under different phys-
propose three sampling rate agnostic algorithms for the recov- ical conditions. In addition to this, the two alternative causality
ery of a causal relationship between observed time series. The detectors discussed in this work are easy-to-use replacements
authors in [34] incorporate the effect of both down-sampling for the GCI. We next describe the system model considered in
and additive noise in their data model. However, none of these this work.
studies consider the effect of finite samples on the detection Notation: Boldface lowercase and uppercase letters repre-
performance, or model the problem of detection of Granger sent vectors and matrices, respectively. The kth column of A
causality as a binary hypothesis testing problem. They also do is denoted by ak . (.)H represents the Hermitian operation on a
not analyze the effects of these physical impairments on the vector or a matrix, and (.)† represents the Moore-Penrose pseu-
probabilities of detection and false alarm. doinverse of a matrix. IK , 0K and OK represent the identity
The authors in [1] consider the related problem of estimation matrix, the all-zero vector and the all-zero matrix of dimension
of directed mutual information between two time series, using a K, respectively. The 2 norm of a vector and the Frobenius
finite number of samples. Here, the test for a causal relationship norm of a matrix are denoted by · 2 and · F , respectively.
between two time series is modeled as a binary hypothesis test, CN (μ, σ 2 ) represents a circularly symmetric complex Gaussian
and it is shown that the probabilities of missed detection and random variable with mean μ and variance σ 2 . E[·] and var(·)
false alarm asymptotically converge to zero as the number of represent the mean and variance of a random variable. In gen-
samples is increased. In this paper, we propose to extend the eral, x̂ and x̃ denote the MMSE estimate and the corresponding
CHOPRA et al.: STATISTICAL TESTS FOR DETECTING GRANGER CAUSALITY 5805
q
q
rcc [τ ] = a∗1,k rcc [τ − k] + a∗2,k rdc [τ − k] + ση2c δ[τ ].
k =1 k =1
(3)
Letting rcd,q [τ ] = [rcd [τ ], . . . , rcd [τ − q + 1]]T , and simi-
larly rcc,q [τ ], etc., the above becomes
2
rcc [τ ] = aH
1 rcc,q [τ − 1] + a2 rdc,q [τ − 1] + ση c δ[τ ].
H
(4)
rcd [τ ] = aH
1 rcd,q [τ − 1] + a2 rdd,q [τ − 1],
H
(5)
and
2
rdd [τ ] = aH
4 rdd,q [τ − 1] + ση d δ[τ ]. (6)
The above equations, along with the information that rcc [0] =
σc2 , and rdd [0] = σd2 can be used to compute rcc [τ ], rcd [τ ], and
rdd [τ ] for different values of τ .
Defining
Rcc,q [τ ] E cq [n]cH
q [n − τ ] , (7)
estimation error of a random variable x. Table I introduces the
different symbols used in this paper. Rcd,q [τ ] E cq [n]dH
q [n − τ ] , (8)
II. SYSTEM MODEL and similarly defining Rdc,p [τ ] and Rdd,q [τ ], it can be shown
using the Weiner-Hopf equations [35] that
We consider two complex valued discrete time zero mean
Gaussian random processes c[n] and d[n] having variances σc2
−1
and σd2 , respectively, and expressed in the form of a bivariate a1 Rcc,q [0] Rcd,q [0] rcc,q [1]
= . (9)
VAR(q) model as a2 Rdc,q [0] Rdd,q [0] rcd,q [1]
q
q
We can therefore calculate the regression coefficients from the
c[n] = a∗1,k c[n − k] + a∗2,k d[n − k] + ηc [n],
correlation coefficients when the latter are not known. The inter-
k =1 k =1
ested reader is referred to [35, Chapter 2] for a detailed derivation
q
q
of the Wiener-Hopf equations. We can then use these to compute
d[n] = a∗3,k c[n − k] + a∗4,k d[n − k] + ηd [n], (1)
the estimation error variance as
k =1 k =1
H
−1
with ai,k , i ∈ {1, 2, 3, 4} being the regression coefficients, and r cc,q [1] R cc,q [0] Rcd,q [0] rcc,q [1]
ηc [n] and ηd [n] the innovation components in c[n] and d[n], ση2c = σc2 − .
rcd,q [1] Rdc,q [0] Rdd,q [0] rcd,q [1]
respectively. We assume that c[n] and d[n] are sampled at the (10)
Nyquist rate, and the temporally white innovation process vector Also, it can be observed from (2) that the causal dependence of
η[n] = [ηc [n] , ηd [n]]T is distributed as c[n] on d[n] is determined by the coefficient vector a2 , that is,
2 there exists a causal relationship only if a2 = 0.
ση c 0
η[n] ∼ CN 0, . Now, in the absence of a causal relationship, c[n] can alter-
0 ση2d
natively be expressed using a univariate AR(q) model,
For simplicity, we assume that only a unidirectional coupling
from d[n] to c[n] can exist, i.e., a3,k = 0 , for 1 ≤ k ≤ q. c[n] = gH cq [n − 1] + ζ[n], (11)
5806 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 22, NOVEMBER 15, 2018
with ζ[n] being the zero mean temporally white complex Gaus- Let ru v [τ ] = E[u[n]v ∗ [n − τ ]], ru v ,p [τ ] = E[up [n]v ∗ [n −
sian innovation component having a variance σζ2 . Consequently, τ ]], and Ru v ,p [τ ] = E[up [n]vpH [n − τ ]], with the absence
of the index [τ ] indicating τ = 0. We can write, ru u [τ ] =
rcc [τ ] = gH rcc,q [τ − 1] + σζ2 δ[τ ], (12) E[c[nM ]c∗ [nM − τ M ]] = rcc [τ M ]. Similarly, ru u ,p [τ ] =
g = R−1 [rcc [τ M ], rcc [(τ + 1)M ], . . . , rcc [(τ + p)M ]]T , and
cc,q [0]rcc,q [1], (13)
2
and ru u [τ ] = bH
1 ru u ,p [τ − 1] + b2 rv u ,p [τ − 1] + ση c δ[τ ], (18)
H
σζ2 = σc2 − rH −1 ru v [τ ] = bH
1 ru v ,p [τ − 1] + b2 rv v ,p [τ − 1],
H
cc,q [1]Rcc,q [0]rcc,q [1]. (14) (19)
2
It is to be noted that in the presence of a causal relationship rv v [τ ] = bH
4 rv v ,p [τ − 1] + ση d δ[τ ], (20)
between c[n] and d[n] the mean squared prediction error for c[n]
using the bivariate model, ση2c , will be smaller than the mean and
squared prediction error using the univariate signal model, σζ2 , Ru u ,p
whereas, in the absence of such a causal relationship between ⎡ ⎤
them, σζ2 = ση2c . rcc [0] rcc [M ]
. . . rcc [(p − 1)M ]
⎢ rcc [M ] . . . rcc [(p − 2)M ] ⎥
rcc [0]
The GCI uses this property to quantify the causal relationship, ⎢ ⎥
=⎢ .. .. .. .. ⎥.
and is defined as [2] ⎣ . . . . ⎦
σζ2 rcc [(p − 1)M ] rcc [(p − 2)M ] . . . rcc [0]
TG log . (15) (21)
ση2c
The weight vectors b1 , b2 , and b4 can then be determined via
In Granger causality, we say that a causal relationship exists
the Wiener-Hopf equations [35] as
between c[n] and d[n] if TG > 0, and no causal relationship
−1
exists if TG = 0. b1 Ru u ,p Ru v ,p ru u ,p [1]
However, both c[n] and d[n] are assumed to be noiseless and = , (22)
b2 Rv u ,p Rv v ,p ru v ,p [1]
sampled at the Nyquist rate. Moreover, it also assumes that the
exact second-order statistics for both c[n] and d[n] are known. and b4 = R−1
v v ,p rv v [1]. (23)
However, these assumptions do not hold in a practical data ac-
quisition system. Therefore, in the subsequent sections, we in- We can then write,
troduce these imperfections, viz. down-sampling, measurement u [n] = u[n] − bH
1 up [n − 1] − b2 vp [n − 1],
H
(24)
noise, and finite sample effects, into a GCI based detector, and
analyze their impact on the detection performance. In the next v [n] = y[n] − bH
4 vp [n − 1]. (25)
section, we discuss the effect of down-sampling on the GCI.
Consequently,
H
−1
H Ru u ,p Ru v ,p ru v ,p [1]
express u[n] and v[n] in the form of a bivariate linear regression − ru u ,p [1] ru v ,p [1]
H
Ru u ,p Ru v ,p Ru v ,p
ficients, and E [u[n − k]∗u [n]] = 0, E [v[n − k]∗v [n]] = 0, × R−1 v v ,p rv v ,p [1].
E [u[n − k]∗v [n]] = 0, and E [v[n − k]∗u [n]] = 0 for k > 0. It Rv u ,p Rv v ,p Rv v ,p
is important to note that the model order p in this case may (28)
be different from the model order q considered in the previous
Hence, the covariance matrix of the innovation process need not
section due to down-sampling.
be diagonal after down-sampling.
Since the GCI is a function of the second-order statistics of
Similarly, the single variable linear regression model for the
c[n] and d[n], we need to derive the second-order statistics of
down-sampled signal u[n] can be written as
u[n] and v[n] in terms of c[n] and d[n] in order to quantify the
effect of down-sampling on the GCI. u[n] = f H up [n − 1] + ξ[n], (29)
CHOPRA et al.: STATISTICAL TESTS FOR DETECTING GRANGER CAUSALITY 5807
u[n] = c[2n] = aηd [2n − 1] + ηc [2n], (32) Substituting equations (18)–(20) into (41)–(43) we obtain
2 2
v[n] = d[2n] = ηd [2n]. (33) rxx [τ ] = bH
1 ru u ,p [τ −1] + b2 rv u ,p [τ −1] + (σ u +σν x )δ[τ ],
H
(44)
Consequently, ru u [1] = rcc [2] = 0, ru v [1] = rcd [2] = 0, ru v [0]
= rcd [0] = 0, and rξ ξ [0] = r x x [0] = σc2 . We see that there ex- rxy [τ ] = bH
1 ru v ,p [τ − 1] + b2 rv v ,p [τ − 1],
H
(45)
ists a causal relationship between c[n] and d[n] for any a = 0, 2 2
ry y [τ ] = bH
4 rv v ,p [τ − 1] + (σ v + σν y )δ[τ ], (46)
but no causal relationship exists between u[n] and v[n]. Thus,
down-sampling can suppress an existing causal relationship be- and similarly,
tween two signals. However, the degree of suppression of the
Rxx,p Rxy ,p
causal relationship depends on the structure of the VAR model Rz z ,p =
followed by the signals. We next analyze the effect of additive Ry x,p Ry y ,p
noise on the performance of the GCI. Ru u ,p Ru v ,p σν2x Ip 0p
= + . (47)
Rv u ,p Rv v ,p 0p σν2y Ip
IV. GCI WITH DOWNSAMPLING AND ADDITIVE NOISE
The optimal weight vector minimizing σϕ2 2 can be obtained
Let us define x[n] and y[n] as the down-sampled signals via the Wiener-Hopf equations as
u[n] and v[n] corrupted by AWGN, such that,
w = R−1
z z ,p rz x,p [1]. (48)
x[n] = u[n] + νx [n],
The minimized value of σϕ2 2 is
y[n] = v[n] + νy [n], (34)
σϕ2 2 = σx2 − rH −1
z x,p [1]Rz z ,p rz x,p [1]. (49)
where νx [n] ∼ CN (0, σν2x ) and νy [n] ∼ CN (0, σν2y ).
Using (17), we can write, However, in case no causal relationship exists between c[n]
and d[n], and consequently between x[n] and y[n], x[n] can be
x[n] bH H
1 b2 up [n − 1] u [n] + νx [n] expressed using the single variable AR model as
= H H + . (35)
y[n] 0p b4 vp [n − 1] v [n] + νy [n]
x[n] = hH xp [n − 1] + ϕ1 [n], (50)
If a causal relationship exists between c[n] and d[n], and is pre-
with ϕ1 [n] being the single variable prediction error, and h being
served in u[n] and v[n] after down-sampling, it should also exist
the optimal weight vector, computed as
between x[n] and y[n], and therefore, the past samples of x[n]
and y[n] can be used to predict x[n] better than its prediction by h = R−1
xx,p rxx,p [1]. (51)
using the past samples of x[n] alone. Letting z[n] = [x[n] y[n]]T
Similar to the two variable case, the minimized value of the
and zp [n] = [xTp [n] ypT [n]]T , we can express x[n] as
prediction error takes the form
x[n] = wH zp [n − 1] + ϕ2 [n], (36) σϕ2 1 = σx2 − rH −1
xx,p [1]Rxx,p rxx,p [1]. (52)
with ϕ2 [n] being the prediction error for the bivariate VAR The GCI therefore becomes
model, and w being the weight vector minimizing the mean
squared value of ϕ2 [n]. σϕ2 1 σx2 − rH −1
xx,p [1]Rxx,p rxx,p [1]
TG = log2 = log2 .
Defining σϕ2 2 E[|ϕ2 [n]|2 ] = E[|x[n] − wH zp [n − 1]|2 ], it σϕ2 2 σx2 − rH −1
z x,p [1]Rz z ,p rz x,p [1]
can be shown that [35], (53)
To demonstrate the effects of noise on the GCI, we again
σϕ2 2 = σx2 − wH rz x,p [1] − rH
z x,p [1]w + w Rz z ,p w,
H
(37)
consider our running example, d[n] = v [n], c[n] = u[n] =
where ad[n − 1] + bd[n − 2] + c [n]. Downsampling these by a fac-
tor M = 2, we obtain, v[n] = v [n], u[n] = bv[n − 1] + u [n],
σx2 = E[x[n]x∗ [n]] = σc2 + σν2x , (38) with, σv2 = σ2v , and σu2 = |b|2 σ2v + σ2u , and consequently,
rz x,p [τ ] = E[zp [n]x∗ [n − τ ]] = [rH
xx [τ ] ry x [τ ]] ,
H H
(39) σx2 = |b|2 σ2v + σ2u + σν2x , σy2 = σ2v + σν2y , (54)
5808 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 22, NOVEMBER 15, 2018
corresponding prediction error variances, and the GCI or an − 2{E[w̃H zp [n − 1]x∗ [n]]}
equivalent test statistic using these samples. In this section, we
first discuss the properties of the optimal predictor weights for − 2{E[w̃H zp [n − 1]zH
p [n − 1]w]}. (64)
the one variable and two variable prediction models. Then, we
It is common in the adaptive filtering literature to assume the
use these weights to calculate the statistics of the finite sample
weight error vector to be independent of the regression vectors
estimates of the corresponding mean square prediction errors,
as well as the desired output [35]. Hence,
and finally derive expressions for the performance of GCI with
finite number of observations. E[w̃H zp [n − 1]x∗ [n]] = E[w̃H zp [n − 1]zH
p [n − 1]w] = 0,
(65)
A. Properties of Optimal Predictor Weights and
It is known that given a finite number of observations, the E[w̃H zp [n − 1]zH
p [n − 1]w̃]
least squares estimate is the best linear unbiased estimator for
x[n] [35]. Letting x[1], . . . , x[N ] be the N available samples = Tr(E[zp [n − 1]zH p [n − 1]E[w̃w̃ ])
H
We can similarly argue, for the one variable model, that ẋ[n] Therefore, ẋ[n] = x̃[n] + x́[n] − h̃H xp [n − 1] and
is also zero mean i.i.d. Gaussian distributed, with
p
p E[t1 ] = σϕ2 2 + σx́2 1 +
E[|ẋ[n]|2 ] = σϕ2 1 1 + . (68) N −p
N −p
p σϕ2 2
Thus, the estimates of the prediction error variances from the = E[t2 ] + σx́2 2 1 + − . (78)
one variable and two variable regression models of the system N −p N −p
can be expressed as Similarly,
2
1 N
1 p
t1 = |ẋ[n]|2 , (69) E[t21 ] 2
= E [t1 ] + (σ 2 + σx́2 )2 1+ , (79)
N − p n =p+1 N − p ϕ2 N −p
and,
1 N
t2 = |x̃[n]|2 . (70) 1 4
N − p n =p+1 var(t1 ) = σ + σx́4 + 2σϕ2 2 σx́2
N − p ϕ2
These can now be used to calculate the GCI as p2 2p
× 1+ +
t1 (N − p)2 N −p
TG = log2 , (71)
t2
1
However, since TG is a logarithm of ratios of random variables, = var(t2 ) + σ 4 + 2σϕ2 2 σx́2
N − p x́
its statistics become hard to determine. Instead, we define TE
2−T G = tt 21 as a test statistic, for simplicity of analysis. 3p2 2p
4
Since x̃[n] and ẋ[n] are the measurement errors for the same − σϕ 2 + . (80)
(N − p)2 N −p
process, t1 and t2 cannot be considered independent. However,
since both t1 and t2 are sums of a large number of i.i.d. random Consequently, it can be shown that
variables, we can use the central limit theorem to approximate
σϕ4 2
these as Gaussian r.v.s [36]. Note that this approximation is valid cov(t1 , t2 ) = . (81)
when the number of samples N is much larger than the model N −p
order p. Now The GCI can therefore be approximated as the ratio of two
correlated Gaussian r.v.s t1 and t2 whose statistics are derived
2p
E[t2 ] = σϕ2 2 1 + , (72) above. In the next subsection, we use the above statistics to
N −p
calculate the probabilities of detection and false alarm.
and
2
1 2p C. Performance of GCI
var(t2 ) = σ4 1+ . (73)
N − p ϕ2 N −p Now, we declare a causal relationship to exist between the
Similarly, two signals if the ratio TE = tt 21 is below a threshold λ. Since
t1 is the sum of non-negative terms, it is almost surely positive,
p
E[t1 ] = σϕ2 1 1+ . (74) and therefore
N −p
Pr{TE < λ} = Pr{λt1 − t2 > 0}. (82)
However, in the presence of a causal relationship between
y[n] and x[n], wH zp [n − 1] will be a better estimate of x[n] Since t1 and t2 are approximated as correlated Gaussian r.v.s,
than hH xp [n − 1], therefore, wH zp [n − 1] can be written as therefore, from (78), TD λt1 − t2 can also be approximated
wH zp [n − 1] = hH xp [n − 1] + x́[n], such that, x́[n] is orthog- as a Gaussian r.v. with mean and variance
onal to hH xp [n − 1]. Consequently, 2 p p
E[TD ] = (λ − 1)E[t2 ] + λσx́ 1 + − λσϕ2 2 ,
ϕ1 [n] = ϕ2 [n] + x́[n], (75) N −p N −p
(83)
with E[x́[n]ϕ∗2 [n]] = 0, and
and
σϕ2 1 = σϕ2 2 + σx́2 , (76)
var(TD ) = λ2 var(t1 ) + var(t2 ) − 2λ cov(t1 , t2 ), (84)
where σx́2 2
var(x́ ).
Now, since x́[n] = wH zp [n] − hH xp [n], it can be shown that respectively.
E[x́[n]] = 0, and, Now, under the null hypothesis, H0 , σϕ2 1 = σϕ2 2 and σx́|H
2
0
=
0. Hence, under H0 , the above simplify to
E[|x́[n]|2 ] = wH Rz z ,p w + hH Rxx,p h − 2{wH Rz x h}
μT D ,0 E[TD |H0 ]
−1 −1
z z ,p Rz z ,p rz z ,p + rxx,p Rxx,p rxx,p
= rH H
p p
−1 −1 = (λ − 1)σϕ2 2 1+ − σϕ2 2 , (85)
− 2{rH
z z ,p Rz z ,p Rz x,p Rxx,p rxx,p }. (77) N −p N −p
5810 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 22, NOVEMBER 15, 2018
σT2 D ,0 var(TD |H0 ) As an example, consider N samples of signals u[n] and v[n]
2 corrupted by AWGN and observed as x[n] and y[n], respectively,
σϕ4 2 2p such that v[n] = v [n] and
= (λ2 + 1) 1+
N −p N −p
u ,0 [n] H0
4 u[n] = , (92)
σϕ 2 3p2 2p σϕ4 2 av [n] + u ,1 [n] H1
− λ2 + − 2λ .
N −p (N − p)2 N −p N −p
(86) with var(v ) = σ2v , var(u ,1 ) = σ2u , and var(u ,0 ) = |a|2 σ2v +
σ2u . Then, K = 1, σϕ2 1 is given by (58), and
Similarly, under the alternate hypothesis, H1 , σϕ2 1 > σϕ2 2 , and ⎧ 2
⎨ σϕ 1 H0
therefore,
σϕ2 2 = 2 2 2 , (93)
⎩ σ2 + σν2 + |a|2 σ v σ2ν y = σϕ2 − Δ H1
p σ +σ 1
μT D ,1 E[TD |H1 ] = λσx́2 1 +
u x u νy
N −p σ ν2 y
with Δ = |a|2 σ2v 1 − σ 2 v +σ ν2 y .
2 p p
+ (λ − 1)σϕ 2 1 + − σϕ2 2 , (87) Therefore,
N −p N −p
2 λ−2
σϕ4 2 2p μT D ,0 = σϕ2 1 λ−1+ , (94)
2 2
σT D ,1 var(TD |H1 ) = (λ + 1) 1+ N −1
N −p N −p
and
2 2 2 2 2
σx́4 p σϕ σ p 1 2
+ λ2 1+ + 2λ2 2 x́ 2 1 + σT2 D ,0 = (λ2 + 1)σϕ4 1 1 +
N −p N −p N −p N −p N −1 N −1
4
σϕ 2 3p2 2p σϕ4 2
− λ2 + − 2λ . 3 2 σϕ2 1
N − p (N − p)2 N −p N −p −λ 2
σϕ4 1 + − 2λ .
(88) (N − 1)2 N −1 N −1
(95)
Based on these, the probability of false alarm can be calcu-
lated as Consequently, the probability of false alarm is given as (96) at
the bottom of this page.
μT D ,0 Under the alternate hypothesis,
PFA = Pr{TD > 0|H0 } = Q . (89)
σT D ,0
2 1
E[TD |H1 ] = (λ − 1)σϕ 1 1 +
Similarly, the probability of correct detection of a causal rela- N −1
tionship, PD , can then be calculated as
1 1
−Δ 1+ − (σϕ2 1 − Δ) ,
μT D ,1 N −1 N −1
PD = Pr{TD > 0|H1 } = Q . (90) (97)
σT D ,1
∗ and
Using (89), a detector providing a given false alarm rate of PFA 2
can be designed by selecting a threshold λ that satisfies 1 2 4 1
var(TD |H1 ) = (λ + 1)σϕ 1 1 +
2 N −1 N −1
p p
σϕ2 2 (λ − 1) 1 + − λσϕ2 2 2 2
N −p N −p 2 2 2 2
+ Δ 1+ − 2σϕ 2 Δ 1 +
2 N −1 N −1
σϕ4 2 p
−1 ∗
= Q (PF A ) (λ + 1)2
1+
N −p N −p 3 2
+ σϕ4 1 + . (98)
(N − 1)2 N −1
4
2 σϕ 2 3p2 2p σϕ4 2
−λ + − 2λ . (91) The probability of detection can thus be expressed as (99) at the
N − p (N − p)2 N −p N −p
bottom of the next page.
The above reduces to a quadratic equation in λ that can be solved We observe from (99) that the probability of detection de-
to obtain the detection threshold for the detector. pends on the ratio σΔ2 , which can in turn be viewed as the
ϕ1
⎛ ⎞
!
⎜ (N − 1) λ − 1 + Nλ−2 −1 ⎟
PFA = Q ⎜
⎝"
⎟.
⎠ (96)
2 σ4
(λ2 + 1)σϕ4 2 1 + N 2−1 − λ2 σϕ4 2 (N −1)
3
2 +
2
N −1 − 2λ N ϕ−1
1
CHOPRA et al.: STATISTICAL TESTS FOR DETECTING GRANGER CAUSALITY 5811
⎛ ⎞
√
⎜ N − 1 (λ − 1)σϕ2 1 1 + N 1−1 − Δ 1 + N 1−1 − (σϕ2 1 − Δ) N 1−1 ⎟
⎜
PD = Q ⎝ " ⎟
2 2 2 ⎠ . (99)
(λ2 + 1)σϕ4 1 1 + N 1−1 + Δ2 1 + N 2−1 − 2σϕ2 2 Δ 1 + N 2−1 + σϕ4 1 (N −1) 3
2 +
2
N −1
5812 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 22, NOVEMBER 15, 2018
and
1 N
TA |x̌[n]|2 , (110)
N − p n =p+1
2σϕ2 2 p σϕ2 1 p Fig. 2. Probabilities of (a) detection and (b) false alarm as a function of the
− 2{rH R−1 −1
z z ,p Rz x Rxx,p rxx,p } + + . (112) detection threshold λ for the test statistic T E .
N −p N −p
However, the higher order statistics and the exact distribution σ u2
of TA cannot be found in closed form. Therefore, we evaluate inspect the effects of noise on the GCI. We define γ1 σ ν2 1 ,
its performance via simulations. To the best of our knowledge, σ u2
and γ2 as the observation SNRs for x[n] and y[n]. We
σ ν2 2
the test statistics TC and TA have not been used in the literature
see that the mean value of TG converges to a value close to zero
to detect Granger causality.
as the noise variance increases. The reduction in the SNR will
make it harder to detect the presence of a causal relationship
VII. SIMULATION RESULTS between the signals of interest.
In this section, we use Monte Carlo simulations to cor-
roborate our analytical expressions and illustrate the relative B. Finite Sample Effects
performance of different test statistics for detecting Granger
We now consider the effects of using a finite number of sam-
causality. We also illustrate the causality detection performance
ples on the GCI, as discussed in Section V. In Figs. 2(a) and 2(b),
of TC using real-world data with known ground truth, available
we compare the simulated detection and false alarm probabil-
in [38]
ity obtained by using the test statistic TE as a function of the
For the simulation-based study, we consider a system model
detection threshold λ and for different number of samples, with
similar to the example discussed in the previous sections, for
a = 0.25 and γ1 = γ2 = 0 dB. The simulated performance of
different values of the variance of the additive noise and vary-
TE is found to be in close agreement with the values computed
ing number of samples. The probabilities of detection and false
using our analytical expressions in (96) and (99). It is also ob-
alarm are obtained by averaging over 10,000 independent real-
served that the behavior of both the probability of detection and
izations of the signals of interest.
false alarm as a function of the detection threshold becomes
sharper with an increase in the number of samples, and asymp-
A. Effect of Additive Noise
totically approximates a step function. The transition point of the
In Fig. 1, we plot the mean value of the test statistic TG for step function occurs at the mean of the test statistic under the re-
the example considered in Section IV, with b = 0.5. Here we spective hypotheses, which corresponds to the 50% probability
CHOPRA et al.: STATISTICAL TESTS FOR DETECTING GRANGER CAUSALITY 5813
Fig. 5. ROC for detection based on T C , with N = 200 samples, for various Fig. 8. ROC of test statistics T E , T C and T A for N = 100 samples at an
values of the SNR. SNR of 0 dB.
Fig. 6. ROC for detection based on T A , with N = 100 samples, for various Fig. 9. Performance of the test statistic T C on the real-world datasets 67–69
values of the SNR. in [38] for different numbers of samples.
VIII. CONCLUSIONS
Fig. 10. Performance of the test statistic T C on the real-world dataset 69 In this work, we derived the effects of different data acquisi-
in [38] for different numbers of samples. tion impairments on the detection of a causal relation between
two signals. We showed that these effects, viz. down-sampling,
additive noise, and finite sample effects, could lead to a signifi-
cant deterioration in the performance of the GCI, both individ-
ually as well as cumulatively. We derived the probabilities of
detection and false alarm for the GCI under the impairments.
Following this, we proposed two alternative test statistics, based
on the difference of mean squared error terms, and the mean
squared value of the difference of error terms. Via extensive
simulations, we showed that the derived results corroborate the
simulated as well as real-world data, with the test statistic based
on the mean squared value of the difference of error terms out-
performing the other two statistics. A more precise analysis of
this test statistic is an interesting direction for future research.
ACKNOWLEDGMENT
Fig. 11. Performance of T C for the dataset considered in Fig. 10 for N = 200 The authors are grateful to the anonymous reviewers for their
with different amounts of added noise. feedback, which significantly improved the overall presentation,
and in particular the simulation results in the paper.
[9] D. Sampath et al., “A study on fear memory retrieval and REM sleep [36] R. Tandra and A. Sahai, “SNR walls for signal detection,” IEEE J. Sel.
in maternal separation and isolation stressed rats,” Behav. Brain Res., Topics Signal Process., vol. 2, no. 1, pp. 4–17, Feb. 2008.
vol. 273, pp. 144–154, 2014. [37] M. K. Simon, Probability Distributions Involving Gaussian Random Vari-
[10] M. Hu and H. Liang, “A copula approach to assessing Granger causality,” ables. New York, NY, USA: Springer, 2002.
NeuroImage, vol. 100, pp. 125–134, 2014. [38] J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf,
[11] A. K. Seth, A. B. Barrett, and L. Barnett, “Granger causality analysis in “Distinguishing cause from effect using observational data: Methods and
neuroscience and neuroimaging,” J. Neurosci., vol. 35, no. 8, pp. 3293– benchmarks,” J. Mach. Learn. Res., vol. 17, no. 32, pp. 1–102, 2016.
3297, 2015.
[12] R. Ganapathy, G. Rangarajan, and A. K. Sood, “Granger causality and Ribhu Chopra (S’11–M’17) received the B.E. de-
cross recurrence plots in rheochaos,” Phys. Rev. E, vol. 75, Jan. 2007, Art. gree in electronics and communication engineering
no. 016211. from Panjab University, Chandigarh, India, in 2009,
[13] S. Stramaglia, J. M. Cortes, and D. Marinazzo, “Synergy and redundancy and the M.Tech. and Ph.D. degrees in electronics
in the Granger causal analysis of dynamical networks,” New J. Phys., and communication engineering from the Indian In-
vol. 16, no. 10, 2014, Art. no. 105003. stitute of Technology Roorkee, Roorkee, India, in
[14] E. Kodra, S. Chatterjee, and A. R. Ganguly, “Exploring Granger causal- 2011 and 2016, respectively. He worked as a Project
ity between global average observed time series of carbon dioxide and Associate with the Department of Electrical Com-
temperature,” Theor. Appl. Climatol., vol. 104, no. 3, pp. 325–335, 2011. munication Engineering, Indian Institute of Science,
[15] H. White and D. Pettenuzzo, “Granger causality, exogeneity, cointegration, Bangalore, from August 2015 to May 2016. From
and economic policy analysis,” J. Econometrics, vol. 178, Part 2, pp. 316– May 2016 to March 2017, he worked as an Institute
330, 2014. Research Associate with the Department of Electrical Communication Engi-
[16] H. Nalatore, M. Ding, and G. Rangarajan, “Mitigating the effects of mea- neering, Indian Institute of Science, Bangalore, India. In April 2017, he joined
surement noise on Granger causality,” Phys. Rev. E, vol. 75, Mar. 2007, the Department of Electronics and Electrical Engineering, Indian Institute of
Art. no. 031123. Technology Guwahati, Assam, India. His research interests include statistical
[17] H. Nalatore, S. N, and G. Rangarajan, “Effect of measurement noise on and adaptive signal processing, massive MIMO communications, and cognitive
Granger causality,” Phys. Rev. E, vol. 90, Dec. 2014, Art. no. 062127. communications.
[18] I. Winkler, D. Panknin, D. Bartz, K. R. Mller, and S. Haufe, “Validity of
time reversal for testing Granger causality,” IEEE Trans. Signal Process.,
vol. 64, no. 11, pp. 2746–2760, Jun. 2016. Chandra Ramabhadra Murthy (S’03–M’06–
[19] T. Schreiber, “Measuring information transfer,” Phys. Rev. Lett., vol. 85, SM’11) received the B.Tech. degree in electrical
pp. 461–464, Jul. 2000. engineering from the Indian Institute of Technol-
[20] L. A. Baccalá and K. Sameshima, “Partial directed coherence: A new ogy Madras, Chennai, India, in 1998, the M.S. and
concept in neural structure determination,” Biol. Cybern., vol. 84, no. 6, Ph.D. degrees in electrical and computer engineer-
pp. 463–474, 2001. ing from Purdue University and the University of
[21] E. Siggiridou and D. Kugiumtzis, “Granger causality in multivariate time California, San Diego, USA, in 2000 and 2006, re-
series using a time-ordered restricted vector autoregressive model,” IEEE spectively. From 2000 to 2002, he worked as an
Trans. Signal Process., vol. 64, no. 7, pp. 1759–1773, Apr. 2016. Engineer for Qualcomm, Inc., where he worked on
[22] P. Newbold and N. Davies, “Error mis-specification and spurious regres- WCDMA baseband transceiver design and 802.11b
sions,” Int. Econ. Rev., vol. 19, no. 2, pp. 513–519, 1978. baseband receivers. From August 2006 to August
[23] G. Nolte et al., “Robustly estimating the flow direction of information in 2007, he worked as a Staff Engineer at Beceem Communications, Inc., on
complex physical systems,” Phys. Rev. Lett., vol. 100, Jun. 2008, Art no. advanced receiver architectures for the 802.16e Mobile WiMAX standard. In
234101. September 2007, he joined the Department of Electrical Communication En-
[24] S. Haufe, V. V. Nikulin, and G. Nolte, Alleviating the Influence gineering, Indian Institute of Science, Bangalore, India, where he is currently
of Weak Data Asymmetries on Granger-Causal Analyses. Berlin, working as a Professor.
Heidelberg: Springer, 2012, pp. 25–33. His research interests include the areas of energy harvesting communica-
[25] S. Haufe, V. V. Nikulin, K.-R. Mller, and G. Nolte, “A critical assessment tions, multiuser MIMO systems, and sparse signal recovery techniques applied
of connectivity measures for EEG data: A simulation study,” NeuroImage, to wireless communications. His paper won the best paper award in the Com-
vol. 64, pp. 120–133, 2013. munications Track at NCC 2014 and a paper co-authored with his student won
[26] M. Vinck et al., “How to detect the Granger-causal flow direction in the the student best paper award at the IEEE ICASSP 2018. He has 50+ journal
presence of additive noise?,” NeuroImage, vol. 108, pp. 301–318, 2015. papers and 80+ conference papers to his credit. He was an Associate Edi-
[27] A. K. Seth, P. Chorley, and L. C. Barnett, “Granger causality analysis tor for the IEEE SIGNAL PROCESSING LETTERS during 2012–2016. He is an
of fMRI BOLD signals is invariant to hemodynamic convolution but not elected member of the IEEE SPCOM Technical Committee for the years 2014–
downsampling,” NeuroImage, vol. 65, no. Supplement C, pp. 540–555, 2016, and has been re-elected for the 2017–2019 term. He is the Past Chair
2013. of the IEEE Signal Processing Society, Bangalore Chapter. He is currently
[28] L. Barnett and A. K. Seth, “Detectability of Granger causality for subsam- serving as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PRO-
CESSING, the Sadhana Journal, and an Editor for the IEEE TRANSACTIONS ON
pled continuous-time neurophysiological processes,” J. Neurosci. Meth-
ods, vol. 275, no. Supplement C, pp. 93–121, 2017. COMMUNICATIONS.
[29] M. Gong, K. Zhang, B. Schoelkopf, D. Tao, and P. Geiger, “Discovering
temporal causal relations from subsampled data,” in Proc. 32nd Int. Conf.
Mach. Learn., vol. 37, Jul. 2015, 1898–1906.
[30] M. Gong, K. Zhang, B. Schölkopf, C. Glymour, and D. Tao, “Causal Govindan Ranagarajan received the Integrated
discovery from temporally aggregated time series,” in Proc. Conf. Uncer- M.Sc. degree from Birla Institute of Technology and
tainty Artif. Intell., Aug. 2017, pp. 1–10. Science, Pilani, India, and the Ph.D. degree from the
[31] A. Tank, E. B. Fox, and A. Shojaie, “Identifiability of non-Gaussian struc- University of Maryland, College Park, MD, USA. He
tural VAR models for subsampled and mixed frequency time series,” in then was with the Lawrence Berkeley Lab, Univer-
Proc. SIGKDD Workshop Causal Discovery, 2016, pp. 1–7. sity of California, Berkeley, before returning to India
[32] A. Hyttinen, S. Plis, M. Jrvisalo, F. Eberhardt, and D. Danks, “A constraint in 1992. He is currently a Professor of mathematics
optimization approach to causal discovery from subsampled time series with the Indian Institute of Science, Bangalore. He is
data,” Int. J. Approx. Reasoning, vol. 90, pp. 208–225, 2017. also the Chairman of the Division of Interdisciplinary
[33] S. Plis, D. Danks, C. Freeman, and V. Calhoun, “Rate-agnostic (causal) Research.
structure learning,” Adv. Neural Inf. Process. Syst., vol. 28, pp. 3303–3311, His research interests include nonlinear dynamics
2015. and chaos, time series analysis, and brain–machine interface. He is a J. C. Bose
[34] X. Wen, G. Rangarajan, and M. Ding, “Is Granger causality a viable National Fellow. He is also a Fellow of the Indian Academy of Sciences and the
technique for analyzing fMRI data?,” PLOS ONE, vol. 8, pp. 1–11, 2013. National Academy of Sciences, India. He was knighted by the Government of
[35] S. Haykin, Adaptive Filter Theory. London, U.K.: Pearson Education, France: Knight (Chevalier) of the Order of Palms. He was also a Homi Bhabha
2002. Fellow.