01_ecdf_plugin

Empirical cdfs and the plugin principle
Advanced Statistics II
Prof. Dr. Matei Demetrescu
Statistics and Econometrics (CAU Kiel) Summer 2021 1 / 29

Today’s outline
Empirical cdfs and the plugin principle
1 The empirical cdf
2 Plug-in I: sample moments
3 Plug-in II: sample quantiles and order statistics
4 Up next

The empirical cdf

Visualizing data
[Figure: three panels for one sample: "Histogram of data" (Frequency against x), "Empirical cdf" (Fn(x) against x) and "True cdf" (pnorm against x), each over x from −3 to 3.]

Getting more formal
Definition ((Univariate) Empirical cdf)

Let $X_1, \dots, X_n$ denote a random sample from a population $X \sim F$. The ecdf is the following statistic,
$$\hat F_n(t) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty, t]}(X_i), \qquad t \in (-\infty, \infty).$$
For each t, this is a random variable!
For a given sample, the ecdf is the cdf of the distribution that puts probability mass 1/n on each data point $x_i$ in the sample.¹
If the $x_i$ are distinct, the corresponding pmf (discrete pdf) is that of the discrete uniform distribution on the set $\{x_1, \dots, x_n\}$; a value occurring k times in the sample receives mass k/n.
¹ So there may be some (other) population out there having exactly this distribution.
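The definition translates directly into code. Below is a minimal Python sketch (the function name `ecdf` is our own, not from the course) that evaluates $\hat F_n(t)$ by counting, via binary search on the sorted sample, how many observations are at most t. The made-up sample contains a tie, so the jump at 0.3 has height 2/n.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical cdf t -> (1/n) * #{i : X_i <= t} of a sample."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(t):
        # bisect_right gives the number of sorted observations that are <= t
        return bisect_right(xs, t) / n

    return F_hat

F_hat = ecdf([0.3, -1.2, 0.3, 2.5])            # made-up data with a tie at 0.3
print(F_hat(-2.0), F_hat(0.3), F_hat(2.5))     # 0.0 0.75 1.0
```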
Construction of the cdf
[Figure: "Fraction of sample with values less than or equal to the argument": the step function Fn(x) for x from −2.0 to 0.0.]

Sampling properties
To interpret the observed ecdf (which is a sample outcome), we need to know the sampling distribution of the corresponding estimator.
Theorem (2.1)
Let $X \sim F$ and $\{X_1, \dots, X_n\}$ be an iid sample from the population X. The pmf (discrete pdf) of $\hat F_n(t)$ is
$$P\left(\hat F_n(t) = \frac{j}{n}\right) = \begin{cases} \binom{n}{j} [F(t)]^j [1 - F(t)]^{n-j} & \text{for } j \in \{0, 1, 2, \dots, n\}, \\ 0 & \text{otherwise,} \end{cases}$$
i.e., $n \hat F_n(t) \sim \text{Binomial}(n, F(t))$. Then, at each fixed value of t,
$$E\left[\hat F_n(t)\right] = F(t) \quad \text{and} \quad \operatorname{Var}\left[\hat F_n(t)\right] = \frac{F(t)(1 - F(t))}{n}.$$
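Theorem 2.1 can be checked numerically. The sketch below assumes a standard normal population and t = 0.5 (both arbitrary choices); the helper names `ecdf_at` and `ecdf_pmf` are hypothetical. It computes the mean and variance of the exact binomial pmf and compares them, together with simulated values of $\hat F_n(t)$, with $F(t)$ and $F(t)(1 - F(t))/n$.

```python
import random
from math import comb
from statistics import NormalDist, fmean, pvariance

def ecdf_at(sample, t):
    """Value of the ecdf of `sample` at the point t."""
    return sum(x <= t for x in sample) / len(sample)

def ecdf_pmf(n, Ft):
    """Exact pmf of F_hat_n(t): P(F_hat_n(t) = j/n) = C(n, j) F(t)^j (1 - F(t))^(n - j)."""
    return {j / n: comb(n, j) * Ft**j * (1 - Ft) ** (n - j) for j in range(n + 1)}

n, t, reps = 20, 0.5, 20000
Ft = NormalDist().cdf(t)                 # F(t) for the assumed N(0, 1) population
pmf = ecdf_pmf(n, Ft)
exact_mean = sum(v * p for v, p in pmf.items())
exact_var = sum((v - exact_mean) ** 2 * p for v, p in pmf.items())

rng = random.Random(1)
draws = [ecdf_at([rng.gauss(0, 1) for _ in range(n)], t) for _ in range(reps)]

print(exact_mean, fmean(draws), Ft)                    # should all be close to F(t)
print(exact_var, pvariance(draws), Ft * (1 - Ft) / n)  # should all be close to F(t)(1-F(t))/n
```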
Uniform convergence
It is easily shown that the ecdf $\hat F_n(t)$ converges in probability to the cdf $F(t)$ for each value of t (e.g., by Theorem 2.1 it is unbiased and its variance $F(t)(1 - F(t))/n$ tends to zero). But there's more...
Theorem (2.2 (Glivenko-Cantelli Theorem))

Let $D_n = \sup_{-\infty < t < \infty} \left| \hat F_n(t) - F(t) \right|$. Then,
$$P\left(\lim_{n \to \infty} D_n = 0\right) = 1.$$
For large enough n, the ecdf provides a good approximation of the cdf over its entire domain (not only at individual points).
Provided that X is a continuous random variable, one may even show that the distribution of $D_n$ does not depend on the true F.
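$D_n$ is the Kolmogorov-Smirnov distance, and for a continuous F it can be computed exactly from the order statistics. A minimal sketch, assuming a standard normal population (our choice); `sup_distance` is a hypothetical helper name. As n grows, the printed values should shrink towards zero, in line with the Glivenko-Cantelli theorem.

```python
import random
from statistics import NormalDist

def sup_distance(sample, cdf):
    """D_n = sup_t |F_hat_n(t) - F(t)| for a continuous cdf F.

    F_hat_n is constant between order statistics while F increases, so the
    supremum is reached at an order statistic x_[i], either from the right
    (F_hat_n = i/n) or from the left (F_hat_n = (i - 1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

F = NormalDist().cdf                      # assumed N(0, 1) population
rng = random.Random(2)
for n in (10, 100, 1000, 10000):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    print(n, round(sup_distance(sample, F), 4))
```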

Uniform convergence
[Figure: "Empirical vs true cdf": Fn(x) plotted together with the true cdf for x from −3 to 3.]

Recall
Definition
A statistical functional τ(F) is any function of F.
Simplest examples:
the mean, $\mu = \int x F'(x)\,dx = \int x\,dF(x)$,
the variance $\sigma^2 = \int (x - \mu)^2\,dF(x)$, or
the quantiles $q_p = F^{-1}(p)$ (special case: the median, $q_{1/2}$).²
This gives us ideas...
² This is assuming uniqueness; otherwise use $q_p = \inf\{x : F(x) \ge p\}$.
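The plug-in principle
Definition
The plug-in estimator of $\theta = \tau(F)$ is defined by
$$\hat\theta = \tau\left(\hat F_n\right).$$
Sample moments: $M'_r = \int x^r\,d\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} X_i^r$.
Sample quantiles: $\hat q_p = \hat F_n^{-1}(p)$.
However, $\hat F_n$ is not invertible (not even if F is!), so we take $\hat q_p = \inf\{x : \hat F_n(x) \ge p\}$.
This amounts to the r-th smallest observation with $r = \lceil np \rceil$, the smallest integer $\ge np$; a common variant uses $r = \lfloor np + 0.5 \rfloor$ instead, where $\lfloor \cdot \rfloor$ denotes the integer part.³
Plugging in does not work for the pdf, however: $\hat F_n$ is a step function, so a sample pdf is meaningless when F is differentiable! See Advanced Statistics III for nonparametric pdf estimation.
³ And therefore $\lfloor x + 0.5 \rfloor$ rounds x to the nearest integer.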
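A minimal sketch of the plug-in quantile, implemented straight from the infimum definition, next to the rounding variant. Function names and data are hypothetical; clamping r to {1, ..., n} in the variant is our own choice so that it stays defined for very small or very large p. The second call shows that the two conventions can pick adjacent order statistics.

```python
from math import floor

def plugin_quantile(sample, p):
    """Plug-in quantile q_hat_p = inf{x : F_hat_n(x) >= p}, for 0 < p <= 1.

    The infimum is the k-th order statistic for the smallest k with k/n >= p,
    i.e. k = ceil(n p).
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    xs = sorted(sample)
    n = len(xs)
    for k, x in enumerate(xs, start=1):
        if k / n >= p:
            return x
    return xs[-1]  # not reached for p <= 1, since n/n = 1 >= p

def rounded_rank_quantile(sample, p):
    """Variant: r-th order statistic with r = floor(n p + 0.5), clamped to 1..n (our choice)."""
    xs = sorted(sample)
    n = len(xs)
    r = min(n, max(1, floor(n * p + 0.5)))
    return xs[r - 1]

sample = [3.1, -0.4, 1.7, 0.2, 2.2, -1.5, 0.9, 1.1, -0.8, 0.5]   # made-up data
print(plugin_quantile(sample, 0.5), rounded_rank_quantile(sample, 0.5))    # 0.5 0.5
print(plugin_quantile(sample, 0.21), rounded_rank_quantile(sample, 0.21))  # -0.4 -0.8
```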
Plug-in I: sample moments

Sample counterparts of the population moments
Definition (Sample Moments)

Let $X_1, \dots, X_n$ denote a random sample. Then the rth order non-central sample moment (or moment about the origin) is
$$M'_r = \frac{1}{n} \sum_{i=1}^{n} X_i^r.$$
The rth order central sample moment (or moment about the mean) is
$$M_r = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X_n)^r,$$
where $\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.
(Realizations of $M'_r$ and $M_r$ are denoted by $m'_r$ and $m_r$.)
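Both definitions are one-liners in code. A minimal sketch with hypothetical function names and made-up data; for these data $m'_2 - (m'_1)^2 = 29 - 25 = 4 = m_2$, a convenient check of the two definitions against each other.

```python
def noncentral_moment(sample, r):
    """r-th non-central sample moment M'_r = (1/n) * sum X_i^r."""
    return sum(x**r for x in sample) / len(sample)

def central_moment(sample, r):
    """r-th central sample moment M_r = (1/n) * sum (X_i - Xbar_n)^r."""
    xbar = noncentral_moment(sample, 1)
    return sum((x - xbar) ** r for x in sample) / len(sample)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up data
print(noncentral_moment(data, 1))  # 5.0
print(noncentral_moment(data, 2))  # 29.0
print(central_moment(data, 2))     # 4.0
```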

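Sampling properties
Let $M'_r = \frac{1}{n} \sum_{i=1}^{n} X_i^r$ be the rth order non-central sample moment for a random sample $(X_1, \dots, X_n)$. Assume that $\mu'_{2r}$ is finite, so that $\mu'_r$ and $\operatorname{Var}(X_i^r)$ are finite as well.

For the mean of $M'_r$ we obtain
$$E(M'_r) = \frac{1}{n} \sum_{i=1}^{n} E(X_i^r) = E(X_i^r) = \mu'_r.$$
(Thus $M'_r$ is an unbiased estimator of $\mu'_r$.)
For the variance of $M'_r$ we obtain, using the independence of the $X_i$,
$$\operatorname{Var}(M'_r) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i^r) = \frac{1}{n} \operatorname{Var}(X_i^r) = \frac{1}{n} \left[\mu'_{2r} - (\mu'_r)^2\right].$$
This implies that the variance goes to zero as $n \to \infty$.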

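Asymptotics
Since $E(M'_r) = \mu'_r$ for all n, and $\lim_{n \to \infty} \operatorname{Var}(M'_r) = 0$, we have
$$M'_r \xrightarrow{m} \mu'_r \;\Rightarrow\; \operatorname{plim} M'_r = \mu'_r.$$
(Thus $M'_r$ provides consistent estimates for the value of $\mu'_r$.)

Since $M'_r = \frac{1}{n} \sum_i X_i^r$ is the average of iid variables with mean $E(X_i^r) = \mu'_r$ and variance $\operatorname{Var}(X_i^r) = \mu'_{2r} - (\mu'_r)^2$, we can use the CLT of Lindeberg-Lévy to obtain
$$\frac{\sqrt{n}\left(\frac{1}{n} \sum_i X_i^r - \mu'_r\right)}{\sqrt{\mu'_{2r} - (\mu'_r)^2}} \xrightarrow{d} N(0, 1).$$
Hence, an asymptotic distribution of $M'_r$ is
$$M'_r = \frac{1}{n} \sum_i X_i^r \overset{a}{\sim} N\left(\mu'_r, \frac{1}{n}\left[\mu'_{2r} - (\mu'_r)^2\right]\right).$$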
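The unbiasedness, the variance formula and the normal approximation can all be seen in a small simulation. The sketch assumes a U(0, 1) population (our choice), for which $\mu'_r = 1/(r+1)$; helper names are hypothetical. For r = 2 and n = 50, the simulated mean and variance of $M'_2$ should match $\mu'_2 = 1/3$ and $[\mu'_4 - (\mu'_2)^2]/n$, and roughly 95% of the draws should lie within 1.96 asymptotic standard deviations of $\mu'_2$.

```python
import random
from statistics import fmean, pvariance

def mu_prime_uniform(r):
    """r-th non-central moment of the U(0, 1) distribution: E(X^r) = 1 / (r + 1)."""
    return 1.0 / (r + 1)

def simulate_sample_moment(r, n, reps, seed=0):
    """Draw `reps` realisations of M'_r, each from a fresh U(0, 1) sample of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() ** r for _ in range(n)) / n for _ in range(reps)]

r, n, reps = 2, 50, 20000
draws = simulate_sample_moment(r, n, reps)
mean_theory = mu_prime_uniform(r)                                       # mu'_r
var_theory = (mu_prime_uniform(2 * r) - mu_prime_uniform(r) ** 2) / n   # [mu'_2r - (mu'_r)^2] / n

# Share of draws within 1.96 asymptotic standard deviations (about 0.95 if the CLT approximation is good)
coverage = sum(abs(m - mean_theory) <= 1.96 * var_theory ** 0.5 for m in draws) / reps

print(fmean(draws), mean_theory)
print(pvariance(draws), var_theory)
print(coverage)
```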

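Special cases
Definition (Sample Mean)
Let $X_1, \dots, X_n$ denote a random sample. The sample mean is
$$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i = M'_1.$$
From the discussion of the properties of sample moments, we know that
$$E(\bar X_n) = \mu, \quad \operatorname{Var}(\bar X_n) = \frac{1}{n}\left(\mu'_2 - \mu^2\right) = \frac{\sigma^2}{n}, \quad \operatorname{plim} \bar X_n = \mu, \quad \bar X_n \overset{a}{\sim} N\left(\mu, \frac{1}{n}\sigma^2\right).$$
Definition (Sample Variance)

Let $X_1, \dots, X_n$ denote a random sample. The sample variance is
$$S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X_n)^2 = M_2.$$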
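Note that $S_n^2$ here uses the divisor n. In Python's standard library this corresponds to `statistics.pvariance`, whereas `statistics.variance` divides by n − 1. A minimal sketch with a hypothetical helper `sample_variance` and made-up data:

```python
from statistics import pvariance, variance

def sample_variance(sample):
    """S_n^2 = (1/n) * sum (X_i - Xbar_n)^2, i.e. the central moment M_2 (divisor n)."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up data
print(sample_variance(data))   # 4.0
print(pvariance(data))         # 4.0, statistics.pvariance also divides by n
print(variance(data))          # divides by n - 1 instead, i.e. 32/7
```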

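More on the sample variance
Theorem (2.3)
Let $S_n^2$ be the sample variance of a random sample $X_1, \dots, X_n$ from a population distribution. Assuming that the population moments up to order four exist ($\mu_4$ denotes the fourth central moment),
a. $E(S_n^2) = \frac{n-1}{n} \sigma^2$,
b. $\operatorname{Var}(S_n^2) = \frac{1}{n}\left[\left(\frac{n-1}{n}\right)^2 \mu_4 - \frac{(n-1)(n-3)}{n^2} \sigma^4\right]$,
c. $\operatorname{plim} S_n^2 = \sigma^2$,
d. $\sqrt{n}\left(S_n^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \mu_4 - \sigma^4\right)$,
e. $S_n^2 \overset{a}{\sim} N\left(\sigma^2, \frac{1}{n}(\mu_4 - \sigma^4)\right)$.
One can say more if F is known; e.g., when $X_i \sim N(\mu, \sigma^2)$, we have exactly $\bar X_n \sim N\left(\mu, \frac{1}{n}\sigma^2\right)$ and $n S_n^2 / \sigma^2 \sim \chi^2_{n-1}$, and $\bar X_n$ and $S_n^2$ are independent.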
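Parts a and b of Theorem 2.3 can be checked by simulation. The sketch below assumes a normal population with σ = 2 (our choice), so that $\mu_4 = 3\sigma^4$; with n = 5, part a gives $E(S_n^2) = 0.8\,\sigma^2$, which makes the bias of the divisor-n estimator clearly visible. Helper names are hypothetical.

```python
import random
from statistics import fmean, pvariance

def var_Sn2_theory(n, mu4, sigma2):
    """Theorem 2.3(b): (1/n) [((n-1)/n)^2 mu4 - (n-1)(n-3)/n^2 sigma^4]."""
    return ((n - 1) ** 2 / n**2 * mu4 - (n - 1) * (n - 3) / n**2 * sigma2**2) / n

def sample_variance(sample):
    """S_n^2 with divisor n."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

rng = random.Random(3)
n, reps, sigma = 5, 40000, 2.0
draws = [sample_variance([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)]
sigma2, mu4 = sigma**2, 3 * sigma**4      # normal population: mu4 = 3 sigma^4

print(fmean(draws), (n - 1) / n * sigma2)                # Theorem 2.3(a)
print(pvariance(draws), var_Sn2_theory(n, mu4, sigma2))  # Theorem 2.3(b)
```

For the normal case, part b agrees with the exact result $n S_n^2/\sigma^2 \sim \chi^2_{n-1}$: both give $\operatorname{Var}(S_n^2) = 2(n-1)\sigma^4/n^2$.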

Sample Covariance
For random samples of multivariate variables, the joint sample moments between pairs of variables become relevant.
Definition (Sample Covariance)

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ denote a random sample. Then the sample covariance is
$$S_{XY} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X_n)(Y_i - \bar Y_n) = \frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar X_n \bar Y_n.$$
This is the sample counterpart of the population covariance, of course.

The sample correlation follows accordingly: $R_{XY} = S_{XY} / (S_X S_Y)$, where $S_X^2$ and $S_Y^2$ are the sample variances of the $X_i$ and of the $Y_i$.
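The two expressions for $S_{XY}$ are algebraically identical, as a quick computation confirms. A minimal sketch with hypothetical function names and made-up data; in the correlation the divisors n cancel, so it coincides with the usual Pearson correlation.

```python
def sample_cov(xs, ys):
    """S_XY = (1/n) * sum (X_i - Xbar)(Y_i - Ybar)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n

def sample_cov_shortcut(xs, ys):
    """Equivalent form (1/n) * sum X_i Y_i - Xbar * Ybar."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)

def sample_corr(xs, ys):
    """S_XY / (S_X S_Y), with all moments using the divisor n."""
    return sample_cov(xs, ys) / (sample_cov(xs, xs) * sample_cov(ys, ys)) ** 0.5

xs = [1.0, 2.0, 3.0, 4.0]   # made-up data
ys = [2.0, 4.0, 5.0, 9.0]
print(sample_cov(xs, ys), sample_cov_shortcut(xs, ys))   # 2.75 2.75
print(sample_corr(xs, ys))
```

Numerically, the first (centred) form is usually preferred: the shortcut subtracts two large, nearly equal numbers when the means are large relative to the spread, which can lose precision in floating point.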

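Sampling properties of $S_{XY}$ in short

Let $S_{XY}$ be the sample covariance for an iid sample $(X_i, Y_i)$. Then,
$$E(S_{XY}) = \frac{1}{n} \sum_{i=1}^{n} E\left[(X_i - \bar X_n)(Y_i - \bar Y_n)\right] = \frac{n-1}{n} \sigma_{XY}.$$
The variance of $S_{XY}$ has the form
$$\operatorname{Var}(S_{XY}) = \frac{1}{n}\left[\mu_{2,2} - (\mu_{1,1})^2\right] + o\left(\frac{1}{n}\right),$$
where $\mu_{i,j} = E\left[(X - \mu_X)^i (Y - \mu_Y)^j\right]$ denotes the (i, j)th central cross moment, so that $\mu_{1,1} = \sigma_{XY}$.
This result is obtained from a Taylor series expansion (our best friend).
Since $E(S_{XY}) \to \sigma_{XY}$ and $\lim_{n \to \infty} \operatorname{Var}(S_{XY}) = 0$, we have
$$S_{XY} \xrightarrow{m} \sigma_{XY} \;\Rightarrow\; \operatorname{plim} S_{XY} = \sigma_{XY}.$$
An asymptotic approximation of the distribution of the sample covariance is
$$S_{XY} \overset{a}{\sim} N\left(\sigma_{XY}, \frac{1}{n}\left[\mu_{2,2} - (\mu_{1,1})^2\right]\right).$$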
n
Plug-in II: sample quantiles and order statistics

Order statistics
We may be interested in the largest or smallest value in a random sample rather than in the average value. Say...
the highest tide water level vs. the average;
the smallest portfolio return vs. the average.
The largest and smallest values in a sample are examples of order statistics.
Definition
Let $X_1, X_2, \dots, X_n$ be a random sample. Then $X_{[1]} \le X_{[2]} \le \dots \le X_{[n]}$, where the $X_{[i]}$s are the $X_i$s arranged in order of increasing magnitude, are the order statistics of the sample; $X_{[i]}$ is called the ith order statistic.
Sample quantiles are also order statistics: e.g., for odd n the median is $X_{[(n+1)/2]}$.⁴
⁴ Beware the multiple definitions in the literature.
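Computing order statistics simply means sorting. A minimal sketch with hypothetical helper names and made-up returns; note the shift from the 1-based index $(n+1)/2$ to Python's 0-based indexing.

```python
def order_statistics(sample):
    """Return [X_[1], ..., X_[n]]: the sample sorted in increasing order."""
    return sorted(sample)

def median_odd(sample):
    """Median X_[(n+1)/2] for odd n (1-based rank, hence index (n+1)//2 - 1)."""
    xs = order_statistics(sample)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("X_[(n+1)/2] is only an order statistic for odd n")
    return xs[(n + 1) // 2 - 1]

returns = [0.012, -0.034, 0.005, -0.110, 0.021]   # made-up portfolio returns
xs = order_statistics(returns)
print(xs[0], xs[-1], median_odd(returns))          # -0.11 0.021 0.005
```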
Stock returns...
Example
Let the rv X be the return of a portfolio of risky assets. Then the 1st order statistic $X_{[1]} = \min\{X_1, \dots, X_n\}$ is a critical variable for a risk manager. He or she might be interested in the probability $P(X_{[1]} \le -10\%)$.
Note that we need the sampling distribution of $X_{[1]}$ in order to compute this probability.

Worst-case scenario
[Figure: a series y plotted against Time from 0 to 100, with y ranging from −4 to 4.]

Sample maximum, standard normal vs. t(5) distribution
[Figure: four panels. Top row: "Standard normal cdf and pdf" and "Distribution of sample max for 50 sample elements" (standard normal population). Bottom row: "t( 5 ) cdf and pdf" and "Distribution of sample max for 50 sample elements" for the t population.]

Larger sample
[Figure: "Distribution of sample max for 500 sample elements" and "Distribution of sample max for 5000 sample elements", shown in two rows; the second row is for the t population.]

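Distribution
Theorem (2.10)
Let $(X_1, \dots, X_n)$ be a random sample from a population distribution with cdf F, and let $X_{[k]}$ be the kth order statistic. Then the cdf of $X_{[k]}$ is given by
$$F_{X_{[k]}}(b) = \sum_{j=k}^{n} \binom{n}{j} [F(b)]^j [1 - F(b)]^{n-j}.$$
(The event $X_{[k]} \le b$ occurs if and only if at least k of the $X_i$ are $\le b$, and the number of $X_i \le b$ is Binomial($n, F(b)$).)
Build differences (discrete case) or derivatives (continuous case) to get the pdfs.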
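Theorem 2.10 and its derivative can be coded directly. The sketch assumes a standard normal parent (our choice) and uses the standard result that, for continuous F with density f, differentiating the sum gives $f_{X_{[k]}}(b) = n\binom{n-1}{k-1} F(b)^{k-1}[1 - F(b)]^{n-k} f(b)$. Helper names are hypothetical. The printed pairs reproduce the closed forms for the minimum and maximum.

```python
from math import comb
from statistics import NormalDist

def order_stat_cdf(b, k, n, cdf):
    """Theorem 2.10: P(X_[k] <= b) = sum_{j=k}^{n} C(n, j) F(b)^j (1 - F(b))^(n - j)."""
    Fb = cdf(b)
    return sum(comb(n, j) * Fb**j * (1 - Fb) ** (n - j) for j in range(k, n + 1))

def order_stat_pdf(b, k, n, dist):
    """Density of X_[k] for continuous F: n C(n-1, k-1) F(b)^(k-1) (1 - F(b))^(n-k) f(b)."""
    Fb, fb = dist.cdf(b), dist.pdf(b)
    return n * comb(n - 1, k - 1) * Fb ** (k - 1) * (1 - Fb) ** (n - k) * fb

dist = NormalDist()        # assumed N(0, 1) parent distribution
n, b = 50, 2.0
print(order_stat_cdf(b, n, n, dist.cdf), dist.cdf(b) ** n)             # maximum
print(order_stat_cdf(b, 1, n, dist.cdf), 1 - (1 - dist.cdf(b)) ** n)   # minimum
```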

Back to min and max
Corollary
The cdfs of $X_{[1]}$ and $X_{[n]}$ are given by
$$F_{X_{[1]}}(b) = 1 - [1 - F(b)]^n \quad \text{and} \quad F_{X_{[n]}}(b) = [F(b)]^n.$$
In any case, the distribution of the order statistics $F_{X_{[k]}}(b)$ depends on the particular cdf F of the parent distribution.
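Returning to the risk manager's question, the corollary gives $P(X_{[1]} \le -10\%)$ in closed form once F is specified. The sketch below uses purely hypothetical inputs: iid normal returns with mean 1% and standard deviation 5%, and n = 60 observations. A normal parent is only an assumption; as the corollary stresses, the answer depends on F. A Monte Carlo check draws many samples of size n and records their minimum.

```python
import random
from statistics import NormalDist

# Hypothetical inputs (illustrative only): iid N(mu, sigma^2) returns, n observations.
mu, sigma, n, threshold = 0.01, 0.05, 60, -0.10
F = NormalDist(mu, sigma).cdf

# Corollary: P(X_[1] <= b) = 1 - [1 - F(b)]^n
p_min = 1 - (1 - F(threshold)) ** n

# Monte Carlo check: share of simulated samples whose minimum is at most the threshold
rng = random.Random(4)
reps = 10000
hits = sum(min(rng.gauss(mu, sigma) for _ in range(n)) <= threshold for _ in range(reps))

print(round(p_min, 4), hits / reps)
```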

Up next
Coming up
On the properties of point estimators
