
Interval Estimation for the Correlation Coefficient

Presented by: Daniel Fernando Díaz Pita

Text available at the Overleaf link: https://www.overleaf.com/8474397986gcdtsytgmzmf

Introduction

The correlation coefficient (CC) is a standard measure of a possible linear association between two continuous random variables. For a bivariate normal distribution, there are many types of confidence intervals for the CC, such as Z-transformation-based and maximum likelihood-based intervals.

Existing Confidence Intervals

Fisher (1915) and Hotelling (1953) derived exact forms of the density function of the sample correlation coefficient. Harley (1954) derived an asymptotic formula for this density. Winterbottom (1979) derived a normalizing transformation of the bivariate-normal correlation coefficient that incorporates variance stabilization.

Nie et al. (2011) showed that the maximum likelihood estimator (MLE) $r$ of the correlation coefficient $\rho$ is asymptotically normal with asymptotic variance $(1-\rho^2)^2$, that is, $\sqrt{n}\,(r-\rho)/(1-\rho^2)$ is asymptotically $N(0,1)$ as $n \to \infty$.

To reduce the skewness of the distribution of $r$, Fisher's (1921) Z-transformation, defined by

$$Z(r) = \tanh^{-1}(r) = \frac{1}{2}\log\frac{1+r}{1-r},$$

can be used to construct a confidence interval for $\rho$. Since $Z(r)$ is asymptotically normal with mean $\zeta(\rho) = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$ and variance $\frac{1}{n-3}$, we have

$$Z = \frac{\sqrt{n-3}}{2}\log\frac{(1+r)(1-\rho)}{(1-r)(1+\rho)} \xrightarrow{d} N(0,1).$$

Therefore, a $(1-\alpha)$ level Z-transformation-based confidence interval (called the NAI interval) for the CC can be constructed as

$$I = \left\{\rho : \frac{B_1 - 1}{B_1 + 1} \le \rho \le \frac{B_2 - 1}{B_2 + 1}\right\},$$

where

$$B_1 = \frac{1+r}{1-r}\, e^{-\frac{2}{\sqrt{n-3}} Z_{\alpha/2}}, \qquad B_2 = \frac{1+r}{1-r}\, e^{\frac{2}{\sqrt{n-3}} Z_{\alpha/2}},$$

and $Z_{\alpha/2}$ is the $(1-\alpha/2)$-th quantile of the standard normal distribution. This confidence interval works reasonably well when $n \ge 30$.

New Confidence Intervals

Generalized Confidence Interval

We propose a generalized confidence interval for the CC. Assume that $X$ is an observable random variable and $\nu = (\theta, \delta)$ is a vector of unknown parameters, where $\theta$ is the parameter of interest and $\delta$ is a vector of nuisance parameters. Let $\chi$ be the sample space of possible values of $X$ and $x$ an observed value of $X$. Let $R = R(X; x, \nu)$ be a function of $(X; x, \nu)$. $R$ is said to be a generalized pivotal quantity (GPQ) if it has the following properties:
• The probability distribution of $R$ does not depend on unknown parameters;
• The observed pivotal, $r_{obs} = R(x; x, \nu)$, does not depend on the nuisance parameter $\delta$.

Let $\{(X_i, Y_i), i = 1, \dots, n\}$ be a random sample from a bivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. An estimator for $(\mu, \Sigma)$ is

$$(\hat\mu, \hat\Sigma) = \left(\begin{pmatrix}\bar X \\ \bar Y\end{pmatrix}, \frac{1}{n-1}\begin{pmatrix} SSX & SSXY \\ SSXY & SSY \end{pmatrix}\right),$$

where $SSX$ and $SSY$ are the (centered) sums of squares of $X$ and $Y$, and $SSXY$ is the sum of cross-products of $X$ and $Y$. Under the normality assumption, $\rho$ is a function of the parameters $(\sigma_x, \sigma_y, \sigma_{xy})$. Define the function of the parameters of interest as follows:

$$(\mu, \beta, \sigma_{2|1}^2) \equiv \left(\mu, \frac{\sigma_{xy}}{\sigma_x^2}, \sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}\right). \quad (1)$$

The parameters given in (1) can be estimated by

$$(\hat\mu, B, SSX_{2|1}) \equiv \left(\hat\mu, \frac{SSXY}{SSX}, SSY - \frac{SSXY^2}{SSX}\right),$$

respectively. We can easily verify that

$$U = \frac{SSX}{\sigma_x^2} \sim \chi^2_{n-1}, \quad (2)$$

$$U_{2|1} = \frac{SSX_{2|1}}{\sigma_{2|1}^2} \sim \chi^2_{n-2}, \quad (3)$$

$$Z_B = \sqrt{\frac{SSX}{\sigma_{2|1}^2}}\,(B - \beta) \sim N(0,1). \quad (4)$$

Let $b$, $ssx_{2|1}$, $ssx$ and $ssy$ be the observed values of $B$, $SSX_{2|1}$, $SSX$ and $SSY$, respectively. Then the GPQ for $(\sigma_x^2, \sigma_{2|1}^2, \beta)$ is

$$(R_{\sigma_x^2}, R_{\sigma_{2|1}^2}, R_\beta) = \left(\frac{ssx}{U}, \frac{ssx_{2|1}}{U_{2|1}}, b - Z_B\sqrt{\frac{ssx_{2|1}}{U_{2|1}\, ssx}}\right), \quad (5)$$

and the GPQ for $(\sigma_y^2, \sigma_{xy})$ is given by

$$(R_{\sigma_y^2}, R_{\sigma_{xy}}) = \left(R_\beta^2 R_{\sigma_x^2} + R_{\sigma_{2|1}^2},\; R_\beta R_{\sigma_x^2}\right).$$

Therefore, the GPQ for $\Sigma$ is

$$R_\Sigma = \begin{pmatrix} R_{\sigma_x^2} & R_{\sigma_{xy}} \\ R_{\sigma_{xy}} & R_{\sigma_y^2} \end{pmatrix},$$

and the GPQ for the CC $\rho$ is

$$R_\rho = \frac{R_{\sigma_{xy}}}{\sqrt{R_{\sigma_x^2}\, R_{\sigma_y^2}}}. \quad (6)$$

The following Monte Carlo algorithm is used to generate values of $R_\rho$:
STEP 1 Calculate the sample mean and covariance matrix from the original sample $(X_i, Y_i)$, $i = 1, \dots, n$.
STEP 2 Generate $U$, $U_{2|1}$ and $Z_B$ using (2), (3) and (4).
STEP 3 Calculate $R_\rho$ using (5) and (6).
STEP 4 Repeat STEP 2 and STEP 3 a large number $K$ of times. Here we recommend $K = 10{,}000$, which yields 10,000 values of $R_\rho$.

The generated values of $R_\rho$ can be used to estimate the distribution of $R_\rho$. Therefore, a $(1-\alpha)$ level GPQ-based confidence interval for the CC can be constructed as $(R_{\alpha/2}, R_{1-\alpha/2})$, where $R_\gamma$ denotes the $\gamma$-th quantile of the generated values.

Empirical Likelihood-based Intervals

Empirical likelihood (EL) was introduced by Owen (1988, 1990). EL is a powerful nonparametric method for constructing confidence intervals/regions for unknown parameters. The main advantages of EL-based methods are that they do not need a pivotal statistic and they do not require prior constraints on the shape of confidence intervals. To our knowledge, no EL-based confidence intervals have been developed for the CC.

Note that

$$\rho = E\left(\frac{X-\mu_x}{\sigma_x}\cdot\frac{Y-\mu_y}{\sigma_y}\right) = \frac{E(XY) - E(X)E(Y)}{\sqrt{[E(X^2) - E(X)^2]\,[E(Y^2) - E(Y)^2]}}$$

is a smooth function of the mean vector $m = (E(X), E(Y), E(X^2), E(Y^2), E(XY))$. Minus twice the empirical log-likelihood ratio for $m$ asymptotically follows a chi-square distribution with 5 degrees of freedom. Therefore, one can use this $\chi^2_5$ distribution to obtain an EL-based confidence region for $m$, and then find an EL-based confidence interval for the CC. Here we propose a plug-in EL confidence interval for the CC which can be easily implemented in practice.

Let $W_i = (X_i, Y_i)$, $i = 1, \dots, n$. Since the CC $\rho$ satisfies the equation

$$E\left(\frac{X-\mu_x}{\sigma_x}\cdot\frac{Y-\mu_y}{\sigma_y} - \rho\right) = 0,$$

the EL for $\rho$ can be defined as follows:

$$L_0(\rho) = \sup\left\{\prod_{i=1}^{n} p_i : p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\,\big(V(W_i) - \rho\big) = 0\right\}, \quad (7)$$

where $p = (p_1, \dots, p_n)$ is a probability vector and

$$V(W_i) = \frac{X_i - \mu_x}{\sigma_x}\cdot\frac{Y_i - \mu_y}{\sigma_y}, \quad i = 1, \dots, n.$$

In the plug-in version, the unknown $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ in $V(W_i)$ are replaced by their sample estimates.
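
The GPQ algorithm (STEP 1 to STEP 4) can be sketched with NumPy as below. This is a minimal illustration, not the authors' code: the function name `gpq_ci_correlation` and the `seed` argument are hypothetical, $U$, $U_{2|1}$ and $Z_B$ are assumed to be drawn independently (as under the bivariate normal model), and the loop of STEP 4 is replaced by drawing all $K$ values at once.

```python
import numpy as np


def gpq_ci_correlation(x, y, alpha=0.05, K=10_000, seed=None):
    """Sketch of the (1 - alpha) GPQ-based interval for rho (hypothetical name)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)

    # STEP 1: observed centered sums of squares and cross-products.
    ssx = np.sum((x - x.mean()) ** 2)
    ssy = np.sum((y - y.mean()) ** 2)
    ssxy = np.sum((x - x.mean()) * (y - y.mean()))
    b = ssxy / ssx                    # observed value of B
    ssx21 = ssy - ssxy ** 2 / ssx     # observed value of SSX_{2|1}

    # STEP 2: K independent draws of U, U_{2|1} and Z_B from (2)-(4).
    u = rng.chisquare(n - 1, size=K)
    u21 = rng.chisquare(n - 2, size=K)
    zb = rng.standard_normal(K)

    # STEP 3: GPQs from (5), then (sigma_y^2, sigma_xy), then R_rho from (6).
    r_sx2 = ssx / u
    r_s21 = ssx21 / u21
    r_beta = b - zb * np.sqrt(ssx21 / (u21 * ssx))
    r_sy2 = r_beta ** 2 * r_sx2 + r_s21
    r_sxy = r_beta * r_sx2
    r_rho = r_sxy / np.sqrt(r_sx2 * r_sy2)

    # STEP 4 (vectorized): empirical alpha/2 and 1 - alpha/2 quantiles.
    lo, hi = np.quantile(r_rho, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Illustrative usage on simulated bivariate normal data (not the poster's data).
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=40)
print(gpq_ci_correlation(xy[:, 0], xy[:, 1], seed=1))
```

Because $R_\rho$ is scale invariant, the interval depends on the data only through $n$ and the sample correlation, so the simulated means and variances above do not matter.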

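The EL in (7) is an empirical likelihood for a mean, so for a given $\rho$ it can be evaluated with the standard Lagrange-multiplier argument: the optimal weights are $p_i = 1/\{n(1 + \lambda z_i)\}$ with $z_i = V(W_i) - \rho$, where $\lambda$ solves $\sum_i z_i/(1+\lambda z_i) = 0$. The sketch below, with hypothetical function names, computes $-2\log\{L_0(\rho)/n^{-n}\}$ using sample means and $n-1$ divisor standard deviations as the plug-in estimates (the divisor is an assumption). Because the means and standard deviations are estimated, the limiting distribution of this statistic need not be $\chi^2_1$, so the sketch stops at the statistic and leaves the calibration cutoff open.

```python
import numpy as np


def plugin_v(x, y):
    """Plug-in V(W_i): sample means and ddof=1 standard deviations (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return zx * zy


def neg2_log_el_ratio(v, rho, max_iter=200):
    """-2 log(L0(rho) / n^-n) for the constraint sum p_i (v_i - rho) = 0."""
    z = np.asarray(v, dtype=float) - rho
    if z.min() >= 0.0 or z.max() <= 0.0:
        # rho is outside the convex hull of the v_i, so L0(rho) = 0.
        return np.inf
    # lambda must keep every 1 + lambda * z_i > 0; on that interval the
    # estimating function below decreases from +inf to -inf, so bisect.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    # With p_i = 1 / (n (1 + lam z_i)), -2 log ratio = 2 sum log(1 + lam z_i).
    return float(2.0 * np.sum(np.log1p(lam * z)))


# Illustrative usage on simulated data (not the poster's data).
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=50)
v = plugin_v(xy[:, 0], xy[:, 1])
for rho in (0.3, 0.5, 0.7):
    print(rho, neg2_log_el_ratio(v, rho))
```

An interval would then collect all $\rho$ whose statistic falls below a suitable cutoff; bisection is used here only because the one-dimensional equation for $\lambda$ is monotone on its admissible range.
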
Real Data Analysis

Example 1. Brain Size and Intelligence

Willerman et al. (1991) studied the association between brain size and mental capacity. In the study, 40 right-handed introductory psychology students at a southern university participated in an experiment. The test measured full-scale IQ using a Verbal IQ and a Performance IQ score. To measure the brain size of the subjects, Magnetic Resonance Imaging (MRI) was used.

The sample correlation coefficient between brain size and full-scale IQ is r = 0.3337. Since the Shapiro-Wilk multivariate normality test (Shapiro and Wilk, 1965) gives a p-value of 0.9851, we do not reject the normality assumption for the data. We calculate 95% NAI, GPQ and IFEL intervals for ρ. The results are displayed in Table 1.

Table 1.
Method   Confidence Interval   Length
NAI      (0.0157, 0.5904)      0.5747
GPQ      (0.0088, 0.5845)      0.5757
IFEL     (0.0088, 0.5697)      0.5609

Example 2. Reading and Mathematics

There are many studies about the association between language proficiency and mathematical performance. MacGregor and Price (1999) made a short review of the literature on the connection between reading comprehension and mathematics performance. They indicated that the ability to read and comprehend word problems is a very important factor in determining success in mathematics.

The sample correlation coefficient is r = 0.9631, which indicates a strong association between the two variables. The Shapiro-Wilk multivariate normality test gives a p-value of 0.0266. This small p-value suggests that the underlying distribution is not normal.

Method   Confidence Interval   Length
NAI      (0.8038, 0.9555)      0.1517
IFEL     (0.7808, 0.9680)      0.1872
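
The NAI rows above use the closed-form Z-transformation interval, which depends on the data only through $r$ and $n$. A minimal standard-library sketch of that formula follows; the function name `nai_interval` is hypothetical, and the usage values ($r = 0.5$, $n = 30$) are illustrative rather than taken from either example.

```python
from math import exp, sqrt
from statistics import NormalDist


def nai_interval(r, n, alpha=0.05):
    """(1 - alpha) Fisher Z-transformation (NAI) interval for rho."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2}
    base = (1.0 + r) / (1.0 - r)
    b1 = base * exp(-2.0 * z / sqrt(n - 3))      # B_1
    b2 = base * exp(2.0 * z / sqrt(n - 3))       # B_2
    return (b1 - 1.0) / (b1 + 1.0), (b2 - 1.0) / (b2 + 1.0)


lo, hi = nai_interval(0.5, 30)
print(f"({lo:.4f}, {hi:.4f})")
```

Since $(B-1)/(B+1) = \tanh(\tfrac{1}{2}\log B)$, the bounds equal $\tanh\{Z(r) \mp Z_{\alpha/2}/\sqrt{n-3}\}$, which is the usual back-transformed Fisher interval.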

References: Xinjie Hu, Aekyung Jung & Gengsheng Qin (2018): Interval Estimation for the Correlation Coefficient, The American Statistician.
