Gini SKK Rev Final Corrected 2021
Sreelakshmi Narayan
Indian Institute of Technology Madras
All content following this page was uploaded by Sudheesh k k on 27 May 2021.
1. Introduction
Langel and Tille (2013) carried out an exhaustive literature survey of the
topic and showed that the same results, as well as the same errors, have
been republished several times, often with a clear lack of reference to
previous works. Yitzhaki and Schechtman (2013) discussed the developments
of the GMD, the concentration ratio, and Gini parameters of correlation and
regression coefficients.
The Gini index finds applications not only in economics, but also in
survival analysis. Tse (2006) developed non-parametric estimators of the
Lorenz curve and the Gini index when observations are left truncated and
right censored. Bonetti, Gigliarano and Muliere (2009) used a restricted
estimator (under right censoring) of a version of the Gini index to compare
the distributions of survival times across groups of patients in clinical
studies. They pointed out that their test is a competitor to the classical
log-rank test, the Wilcoxon test and the Gray and Tsiatis test [see
Harrington and Fleming (1982) and Gray and Tsiatis (1989)] for testing the
equality of the distributions of survival times between groups of patients
under the assumption of a positive cure rate. Jenkins, Burkhauser, Feng and
Larrimore (2011) used a multiple imputation method to estimate income
inequality measures in the right censored setup. Recently, Lv, Zhang and
Ren (2017) proposed an estimator of the Gini index when right censored
observations are present in the sample. The estimator obtained by Lv et al.
(2017) is computationally complex and has a complicated variance
expression. Motivated by this, we obtain a simple estimator of the Gini
index when the sample contains right censored observations. The variance of
the proposed estimator has a simple expression which can be estimated
consistently.
The rest of the article is organised as follows. In Section 2, we review
estimators of the Gini index for complete data. In Section 3, we obtain
a simple non-parametric estimator of the Gini index when right censored
In the past two decades, several estimators were proposed for the Gini index.
This was achieved by expressing the Gini index in different forms and then
proposing a natural plug-in estimator or an estimator based on U-statistics.
The main objective of these studies was to obtain a simple, reliable
estimator with small bias and standard error. Davidson (2009) discussed
several aspects of this problem rigorously. We use the bias corrected
version of Davidson’s estimator to construct the estimator of the Gini
index when the sample contains right censored observations. Hence, we
first discuss Davidson’s estimator of the Gini index for complete data.
Let $X$ be a non-negative continuous random variable with distribution
function $F$. Assume that $X$ has finite mean $\mu$. Let $X_1$ and $X_2$ be two
independent random variables with the same distribution function $F$. The GMD
is defined as
$$GMD = E|X_1 - X_2|,$$
and the Gini index is given by
$$G = \frac{GMD}{2\mu}. \qquad (1)$$
where $X_{(i)}$, $i = 1, \ldots, n$, are the order statistics based on a random sample
$X_1, \ldots, X_n$ from $F$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. By simple algebraic manipulation,
As pointed out earlier, the main objective of Davidson’s work was to obtain
a simple, reliable estimator with smaller bias and standard error than its
competitors. Accordingly, Davidson (2009) proposed a bias corrected
version of G̃, which is given by
$$\widehat{G} = \frac{n\tilde{G}}{n-1} = \frac{\sum_{i=1}^{n}(2i - n - 1)X_{(i)}}{(n-1)\sum_{i=1}^{n}X_i}. \qquad (4)$$
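As a minimal sketch of expression (4), the following Python function computes the bias corrected estimator from a complete sample. The function name `gini_bias_corrected` and the input checks are illustrative choices, not part of the original paper.

```python
def gini_bias_corrected(sample):
    """Bias corrected Gini estimator of expression (4) for complete data."""
    x = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    if total <= 0:
        raise ValueError("sample must have a positive sum")
    # Numerator: sum over i = 1..n of (2i - n - 1) * X_(i).
    numerator = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return numerator / ((n - 1) * total)


print(gini_bias_corrected([1.0, 2.0, 3.0, 4.0]))
```

For the sample (1, 2, 3, 4) the numerator is 10 and the denominator is 3 × 10, so the estimate is 1/3, which agrees with the average absolute pairwise difference (5/3) divided by twice the sample mean (5).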
the limit theorems of U-statistics (Lee, 1990) and Slutsky’s theorem. The
results are stated for completeness.
Theorem 1. As n → ∞, $\widehat{G}$ converges in probability to G.
Theorem 2. As n → ∞, the distribution of $\sqrt{n}(\widehat{G} - G)$ is Gaussian with
mean zero and variance $\sigma_1^2/\mu^2$, where
$$\sigma_1^2 = V\left( X\big(2\bar{F}(X) - 1\big) + 2\int_0^X y\, dF(y) \right). \qquad (6)$$
In this section, we propose a simple estimator for the Gini index in the
presence of right censored observations. Suppose we have randomly right
censored observations where the censoring times are independent of the
lifetimes. The observed data consist of n independent and identically
distributed copies of (Y, δ), with Y = min(X, C), where C is the censoring
random variable and δ = I(X ≤ C) is the censoring indicator. Thus δi = 1
means that the i-th observation is not censored, whereas δi = 0 indicates
that the i-th observation is censored on the right by Ci. We are interested
in finding an estimator of the Gini index based on the n independent and
identically distributed observations {(Yi, δi), 1 ≤ i ≤ n}. We obtain an
estimator of the Gini index in the right censored case by modifying the
estimator given in expression (5). Let K̄ denote the survival function of C.
We use the inverse probability of censoring weighting approach to find an
estimator of G. Following Datta et al. (2010), a version of $\widehat{\Delta}$
appropriate for censored data is given by
$$\widehat{\Delta}_c = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\widehat{K}(Y_i-)\,\widehat{K}(Y_j-)}, \qquad (7)$$
$$\bar{X}_c = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_i \delta_i}{\widehat{K}(Y_i-)}. \qquad (8)$$
Hence an estimator of the Gini index with right censored observations is
given by
$$\widehat{G}_c = \frac{\widehat{\Delta}_c}{2\bar{X}_c}. \qquad (9)$$
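Expressions (7) to (9) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' program: the weight $\widehat{K}$ is taken to be the Kaplan-Meier estimator of the censoring survival function evaluated just before each observed time, the kernel is $h(y_1, y_2) = |y_1 - y_2|$, the observed times are assumed to have no ties, and the function names `km_censoring_left_limit` and `gini_censored` are hypothetical.

```python
def km_censoring_left_limit(y, delta):
    """Kaplan-Meier estimate of the censoring survival function at each y_i-.

    Censoring events are the observations with delta == 0.
    Assumes no ties among the observed times (an illustrative simplification).
    """
    # For every censored time s, record the number still at risk at s.
    censored = [(s, sum(1 for v in y if v >= s))
                for s, d in zip(y, delta) if d == 0]
    k_left = []
    for t in y:
        k = 1.0
        for s, at_risk in censored:
            if s < t:  # strict inequality gives the left limit K(t-)
                k *= 1.0 - 1.0 / at_risk
        k_left.append(k)
    return k_left


def gini_censored(y, delta):
    """IPCW Gini estimator: Delta_c / (2 * Xbar_c), as in expressions (7)-(9)."""
    n = len(y)
    k_left = km_censoring_left_limit(y, delta)
    # Inverse probability of censoring weights; censored observations get 0.
    w = [1.0 / k if d == 1 else 0.0 for d, k in zip(delta, k_left)]
    mean_c = sum(wi * yi for wi, yi in zip(w, y)) / n
    pair_sum = 0.0
    for i in range(n):
        for j in range(i):
            pair_sum += abs(y[i] - y[j]) * w[i] * w[j]
    delta_c = 2.0 * pair_sum / (n * (n - 1))
    return delta_c / (2.0 * mean_c)


print(gini_censored([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```

When no observation is censored, every weight equals one and the function reduces to the complete-data estimator in expression (4).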
Now we study the asymptotic properties of $\widehat{G}_c$. In the next theorem, we
Theorem 3. As n → ∞, $\widehat{G}_c$ converges in probability to G.
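Proof: Consider
$$
\begin{aligned}
\widehat{\Delta}_c
&= \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\widehat{K}(Y_i-)\,\widehat{K}(Y_j-)} \\
&= \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j \big(\widehat{K}(Y_i-) - \bar{K}(Y_i)\big)\big(\widehat{K}(Y_j-) - \bar{K}(Y_j)\big)}{\widehat{K}(Y_i-)\,\widehat{K}(Y_j-)\,\bar{K}(Y_i)\,\bar{K}(Y_j)} \\
&\quad + \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\widehat{K}(Y_i-)\,\bar{K}(Y_j)}
 + \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\bar{K}(Y_i)\,\widehat{K}(Y_j-)} \\
&\quad - \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\bar{K}(Y_i)\,\bar{K}(Y_j)} \\
&= \widehat{\Delta}_{1c} + \widehat{\Delta}_{2c} + \widehat{\Delta}_{3c} - \widehat{\Delta}_{4c}.
\end{aligned}
\qquad (10)
$$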
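$$
|\widehat{\Delta}_{1c}| \le \sup_{Y_i} \big|\widehat{K}(Y_i-) - \bar{K}(Y_i)\big| \; \sup_{Y_j} \big|\widehat{K}(Y_j-) - \bar{K}(Y_j)\big| \;
\left| \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\widehat{K}(Y_i-)\,\widehat{K}(Y_j-)\,\bar{K}(Y_i)\,\bar{K}(Y_j)} \right|
$$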
Observe that the third term in Eq. (12) is a U-statistic with kernel
$\frac{h(Y_i, Y_j)\,\delta_i \delta_j}{\bar{K}^2(Y_i)\,\bar{K}^2(Y_j)}$; hence it is $O_p(1)$ for large n. Along similar lines, we can
show that
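$$
\begin{aligned}
\widehat{\Delta}_{2c} &= \widehat{\Delta}_{4c} + \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j \big(\bar{K}(Y_i) - \widehat{K}(Y_i-)\big)}{\bar{K}(Y_i)\,\widehat{K}(Y_i-)\,\bar{K}(Y_j)} \\
&= \widehat{\Delta}_{4c} + o_p(1).
\end{aligned}
\qquad (13)
$$
Also
$$
\begin{aligned}
\widehat{\Delta}_{3c} &= \widehat{\Delta}_{4c} + \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j<i;\,j=1}^{n} \frac{h(Y_i, Y_j)\,\delta_i \delta_j \big(\bar{K}(Y_j) - \widehat{K}(Y_j-)\big)}{\bar{K}(Y_i)\,\widehat{K}(Y_j-)\,\bar{K}(Y_j)} \\
&= \widehat{\Delta}_{4c} + o_p(1).
\end{aligned}
\qquad (14)
$$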
$$\widehat{\Delta}_c = \widehat{\Delta}_{4c} + o_p(1).$$
$$\widehat{G}_c = \frac{\widehat{\Delta}_c}{\Delta} \cdot \frac{2\mu}{2\bar{X}_c} \cdot \frac{\Delta}{2\mu}.$$
where $h_1(y) = E(h(Y_1, Y_2) \mid Y_1 = y)$. The proof of the next theorem follows
from Datta et al. (2010) with the choice of kernel
$$h(Y_1, Y_2) = 2\max(Y_1, Y_2) - Y_1 - Y_2.$$
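A short check, added here for clarity, shows that this kernel is simply the absolute difference written without the modulus sign. It uses only the identity $y_1 + y_2 = \max(y_1, y_2) + \min(y_1, y_2)$:

```latex
\begin{aligned}
2\max(y_1, y_2) - y_1 - y_2
  &= 2\max(y_1, y_2) - \max(y_1, y_2) - \min(y_1, y_2) \\
  &= \max(y_1, y_2) - \min(y_1, y_2) \\
  &= |y_1 - y_2| .
\end{aligned}
```

Consequently, for uncensored lifetimes, $E\,h(X_1, X_2) = E|X_1 - X_2|$, which is the GMD defined earlier.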
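Theorem 4. Assume $E(h^2(Y_1, Y_2)) < \infty$, $\int \frac{h_1^2(x)}{\widehat{K}(x-)}\, dH_c(x) < \infty$ and
$\int_0^\infty w^2(t)\lambda_c(t)\,dt < \infty$. As n → ∞, the distribution of $\sqrt{n}(\widehat{\Delta}_c - \Delta)$ is
Gaussian with mean zero and variance $4\sigma_{1c}^2$, where $\sigma_{1c}^2$ is given by
$$\sigma_{1c}^2 = Var\left( \frac{h_1(X)\,\delta}{\widehat{K}(Y-)} + \int w(t)\, dM_{1c}(t) \right). \qquad (17)$$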
Next, we find a consistent estimator of $\sigma_{1c}^2$. We can estimate $\sigma_{1c}^2$ by
$$\widehat{\sigma}_{1c}^2 = \frac{4}{n-1} \sum_{i=1}^{n} (V_i - \bar{V})^2, \qquad (18)$$
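where
$$V_i = \frac{\widehat{h}_1(X_i)\,\delta_i}{\widehat{K}(X_i)} + \widehat{w}(X_i)(1 - \delta_i) - \sum_{j=1}^{n} \frac{\widehat{w}(X_i)\, I(X_i > X_j)(1 - \delta_i)}{\sum_{i=1}^{n} I(X_i > X_j)},$$
$$\bar{V} = \frac{1}{n} \sum_{i=1}^{n} V_i, \qquad \widehat{h}_1(X) = \frac{1}{n} \sum_{i=1}^{n} \frac{h(X, Y_i)\,\delta_i}{\widehat{K}(Y_i-)}, \qquad Y(t) = \frac{1}{n} \sum_{i=1}^{n} I(Y_i > t)$$
and
$$\widehat{w}(t) = \frac{1}{Y(t)} \sum_{i=1}^{n} \frac{\widehat{h}_1(X_i)\,\delta_i}{\widehat{K}(X_i)}\, I(X_i > t).$$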
$$\big(\widehat{G}_c - Z_{\alpha/2}\,\widehat{\sigma}_c,\; \widehat{G}_c + Z_{\alpha/2}\,\widehat{\sigma}_c\big),$$
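The interval above has the usual normal-approximation form, which can be sketched as follows. This is a minimal illustration: the function name `normal_confidence_interval` is hypothetical, and the standard error $\widehat{\sigma}_c$ is taken as an input because its expression is not reproduced here. `NormalDist().inv_cdf` from the Python standard library supplies the quantile $Z_{\alpha/2}$.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval: estimate -/+ Z_{alpha/2} * std_error."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if std_error < 0.0:
        raise ValueError("std_error must be non-negative")
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Illustrative inputs only; these are not results from the paper.
lower, upper = normal_confidence_interval(0.5, 0.01)
print(lower, upper)
```

The interval is symmetric about the point estimate, and its width is proportional to the supplied standard error.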
4. Simulation study
out using R and is repeated ten thousand times. We consider the situation
with approximately 20% of the observations as censored. We compare the
bias and MSE of $\widehat{G}_c$ with those of the estimators proposed by Bonetti et al.
(2009) and Lv et al. (2017). Here we use the unrestricted version of the
estimator of the Gini index given by Bonetti et al. (2009). We also evaluate
the coverage probability and the average width of the proposed confidence
intervals for G.
We first generate a random sample from the exponential distribution with
cumulative distribution function F(x) = 1 − exp(−x), x ≥ 0. Censoring
times are generated from the exponential distribution with parameter
γ = 0.25 so that the sample contains approximately 20% censored
observations. Note that the value of the Gini index for the standard
exponential distribution is 0.5. We find the bias and the MSE of $\widehat{G}_c$
for different sample sizes n = 50, 75, 100, 150, 200. The results of the
simulation study are reported in Table 1. In all the tables, we report
MSE×10 as the MSE is very low. From Table 1, we observe that the bias and
MSE of $\widehat{G}_c$ are very small in comparison
              $\widehat{G}_c$        Lv et al. (2017)     Bonetti et al. (2009)
Sample size   Bias     MSE       Bias     MSE         Bias     MSE
50            0.0173   0.0030    0.0697   0.0486      0.0286   0.0081
75            0.0146   0.0021    0.0500   0.0250      0.0254   0.0064
100           0.0117   0.0013    0.0394   0.0155      0.0213   0.0045
150           0.0078   0.0006    0.0262   0.0068      0.0156   0.0024
200           0.0073   0.0005    0.0216   0.0046      0.0133   0.0017
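The exponential design described above can be checked with a short sketch. For independent exponential lifetimes and censoring times, the probability that an observation is censored is the censoring rate divided by the sum of the two rates, so a lifetime rate of 1 and a censoring rate of 0.25 give 0.25/1.25 = 0.2. The code assumes that the parameter γ is a rate (the reading consistent with 20% censoring). The function names are hypothetical, and Python's `random` module is used here in place of the R program mentioned in the text.

```python
import random


def expected_censoring_fraction(lifetime_rate, censoring_rate):
    """P(C < X) for independent exponential lifetime X and censoring time C."""
    return censoring_rate / (lifetime_rate + censoring_rate)


def simulate_censored_exponential(n, lifetime_rate=1.0, censoring_rate=0.25,
                                  seed=None):
    """Return observed times Y = min(X, C) and indicators delta = I(X <= C)."""
    rng = random.Random(seed)
    y, delta = [], []
    for _ in range(n):
        x = rng.expovariate(lifetime_rate)   # lifetime
        c = rng.expovariate(censoring_rate)  # censoring time
        y.append(min(x, c))
        delta.append(1 if x <= c else 0)
    return y, delta


y, delta = simulate_censored_exponential(20000, seed=1)
observed_fraction = 1.0 - sum(delta) / len(delta)
print(expected_censoring_fraction(1.0, 0.25), observed_fraction)
```

The simulated censoring fraction should be close to the theoretical value of 0.2 for a sample of this size.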
Finally we generate data from the log normal distribution with log-scale mean zero and
standard deviation 0.5. The true value of the Gini index in this case is 0.2763. Censored
observations are simulated from exponential distribution with parameter
λ = 0.20 to obtain approximately 20% censored observations. The results
of the simulation study are presented in Table 3. In this case also, our
estimator performs well compared with those of Bonetti et al. (2009) and
Lv et al. (2017). From Tables 1-3, we observe that the bias and MSE of
the estimator proposed by Bonetti et al. (2009) are smaller than those of
the estimator obtained by Lv et al. (2017).
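The true value quoted for the log normal case can be verified from the standard closed form for the Gini index of a log normal distribution, $G = 2\Phi(\sigma/\sqrt{2}) - 1 = \mathrm{erf}(\sigma/2)$, where $\sigma$ is the standard deviation on the log scale. The sketch below is an added check with a hypothetical function name. Under this closed form, 0.2763 corresponds to a log-scale standard deviation of 0.5.

```python
import math


def gini_lognormal(sigma):
    """Gini index of a log normal distribution with log-scale std. dev. sigma."""
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    # 2 * Phi(sigma / sqrt(2)) - 1 simplifies to erf(sigma / 2).
    return math.erf(sigma / 2.0)


print(round(gini_lognormal(0.5), 4))             # 0.2763
print(round(gini_lognormal(math.sqrt(0.5)), 4))  # 0.3829
```

The second line shows the value that a log-scale variance of 0.5 would give, for comparison.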
              $\widehat{G}_c$        Lv et al. (2017)     Bonetti et al. (2009)
Sample size   Bias     MSE       Bias     MSE         Bias     MSE
50            0.0675   0.0456    0.1138   0.1296      0.0793   0.0629
75            0.0641   0.0411    0.0970   0.0941      0.0706   0.0498
100           0.0590   0.0348    0.0844   0.0713      0.0637   0.0405
150           0.0519   0.0270    0.0707   0.0500      0.0534   0.0285
200           0.0477   0.0228    0.0631   0.0399      0.0473   0.0223
              $\widehat{G}_c$        Lv et al. (2017)     Bonetti et al. (2009)
Sample size   Bias     MSE       Bias     MSE         Bias     MSE
50            0.0062   0.0004    0.0430   0.0185      0.0186   0.0034
75            0.0047   0.0002    0.0281   0.0079      0.0167   0.0028
100           0.0043   0.0002    0.0230   0.0053      0.0146   0.0021
150           0.0029   0.0001    0.0153   0.0023      0.0104   0.0010
200           0.0026   0.0001    0.0122   0.0015      0.0095   0.0009
have considered. Apart from this empirical evidence we were not able to
identify any reason for the difference in performance between the proposed
estimator and the estimator in Bonetti et al. (2009) and in Lv et al. (2017).
The program for the simulation study is available online as supplementary
material.
5. Concluding remarks
Acknowledgement
The authors thank the referees for their constructive suggestions which have
resulted in an improved version of the paper.
References
[1] Bonetti, M., Gigliarano, C. and Muliere, P. (2009). The Gini concentration test for
survival data. Lifetime Data Analysis, 15, 493–518.
[2] Ceriani, L. and Verme, P. (2012). The origins of the Gini index: extracts from
Variabilità e Mutabilità (1912) by Corrado Gini. Journal of Economic Inequality, 10,
421–443.
[3] Datta, S., Bandyopadhyay, D. and Satten, G. A. (2010). Inverse probability of
censoring weighted U-statistics for right-censored data with an application to testing
hypotheses. Scandinavian Journal of Statistics, 37, 680–700.
[4] Davidson, R. (2009). Reliable inference for the Gini index. Journal of Econometrics,
150, 30–40.
[5] Gray, R. J. and Tsiatis, A. A. (1989). A linear rank test for use when the main
interest is in differences in cure rates. Biometrics, 45, 899–904.
[6] Harrington, D. P., and Fleming, T. R. (1982). A class of rank test procedures for
censored survival data. Biometrika, 69, 553–566.
[7] Jenkins, S. P., Burkhauser, R. V., Feng, S. and Larrimore, J. (2011). Measuring
inequality using censored data: a multiple-imputation approach to estimation and
inference. Journal of the Royal Statistical Society: Series A, 174, 63–81.
[8] Langel, M. and Tille, Y. (2013). Variance estimation of the Gini index: revisiting a
result several times published. Journal of the Royal Statistical Society: Series A, 176,
521–540.
[9] Lee, A. J. (1990). U-statistics: Theory and Practice, CRC press, Boca Raton.
[10] Lehmann, E. L. (1951). Consistency and unbiasedness of certain non-parametric tests.
Annals of Mathematical Statistics, 22, 165-179.
[11] Lv, X., Zhang, G. and Ren, G. (2017). Gini index estimation for lifetime data.
Lifetime Data Analysis, 23, 275–304.
[12] Peng, L. (2011). Empirical likelihood methods for the Gini index. Australian and New
Zealand Journal of Statistics, 53, 131–139.
[13] Stute, W. and Wang, J. L. (1993). The strong law under random censorship. Annals
of Statistics, 21, 1591–1607.
[14] Tse, S. M. (2006). Lorenz curve for truncated and censored data. Annals of the
Institute of Statistical Mathematics, 58, 675–686.
[15] Wang, D., Zhao, Y. and Gilmore, D. W. (2016). Jackknife empirical likelihood
confidence interval for the Gini index. Statistics & Probability Letters, 110, 289–295.
[16] Yitzhaki, S. and Schechtman, E. (2013). The Gini Methodology, Springer, New York.