
Sufficient statistics and MVUEs

Advanced Statistics II

Prof. Dr. Matei Demetrescu

Statistics and Econometrics (CAU Kiel) Summer 2021 1 / 30


Focus on the essential

For an identified model, the sample contains all relevant information.


Functions of the sample are supposed to summarize that information.
Some statistics may in principle be more informative than others (for
given parameters);
in fact, some statistics may be totally uninformative:
e.g., the sample variance is shift-invariant and hence carries no
information about the mean.
Thus, in place of the original random-sample outcome, it may be sufficient
to have the outcomes of selected statistics in order to estimate any q(θ).

(MVU) Estimation should then be based on such sufficient statistics!
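The shift-invariance of the sample variance is easy to verify numerically; below is a minimal Python sketch (the sample values are made up for illustration):

```python
# The sample variance is unchanged when every observation is shifted by a
# constant, so it cannot carry any information about the location (mean).
def sample_var(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

data = [2.1, 3.7, 1.4, 5.0, 2.8]        # arbitrary made-up sample
shifted = [x + 100.0 for x in data]     # the mean shifts by 100 ...

# ... but the sample variance is identical (up to float rounding).
v_original, v_shifted = sample_var(data), sample_var(shifted)
```

Any shift-invariant statistic behaves the same way, which is why such statistics cannot help in estimating the mean.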



Today’s outline

Sufficient statistics and MVUEs

1 Sufficient statistics

2 The Blackwell-Rao theorem

3 Completeness and MVUEs

4 Up next



Sufficient statistics

Outline

1 Sufficient statistics

2 The Blackwell-Rao theorem

3 Completeness and MVUEs

4 Up next



Sufficient statistics

Formally

Definition
Let (X1 , . . . , Xn ) ∼ fX (x1 , . . . , xn ; θ) be a random sample, and let
S1 = s1 (X1 , . . . , Xn ), . . . , Sr = sr (X1 , . . . , Xn ) be r statistics. The r
statistics are said to be sufficient statistics for fX (x; θ) iff

fX(x1, . . . , xn; θ | s1, . . . , sr) = h(x1, . . . , xn),

i.e., the conditional density of X, given S = [S1, . . . , Sr]′, does not
depend on the parameter θ.

What remains after knowing the sufficient statistics is unidentified!
Therefore, estimation may safely ignore it.



Sufficient statistics

A necessary and sufficient criterion

Theorem (Neyman’s Factorization Theorem)


Let fX (x; θ) be the pdf of the random sample (X1 , . . . , Xn ). The
statistics S1 , . . . , Sr are sufficient statistics for fX (x; θ) iff fX (x; θ) can
be decomposed as

fX (x; θ) = g (s1 (x), . . . , sr (x); θ) · h(x),

where g is a function of only s1 (x), . . . , sr (x) and θ, and h(x) does not
depend on θ.

The proof is in a sense elementary,
and the intuition is that conditioning on the Sj leaves only randomness
... which is not identifying!



Sufficient statistics

An example
Let (X1 , . . . , Xn ) be a random sample from a Bernoulli population with pdf
f (x; p) = px (1 − p)1−x I{0,1} (x), p ∈ [0, 1].

Note that the joint pdf of the random sample is given by

fX(x1, ..., xn; p) = ∏_{i=1}^n p^{xi} (1 − p)^{1−xi} I_{0,1}(xi)
                   = p^{∑_{i=1}^n xi} (1 − p)^{n − ∑_{i=1}^n xi} · ∏_{i=1}^n I_{0,1}(xi).

Setting S = ∑_{i=1}^n Xi, the first factor corresponds to g(s(x); p); the
second factor is independent of p and corresponds to h(x).

It follows (via Neyman's factorization theorem) that the sum of the sample
outcomes is sufficient and contains all the sample information relevant for
estimating q(p). Suppose that n = 3 and that we observe s = 2. It does not
matter how: x = (1, 1, 0), x = (1, 0, 1), or x = (0, 1, 1).
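That "it does not matter how" can be checked directly: all arrangements with the same sum have identical likelihood for every p, and conditionally on S the arrangements are equally likely regardless of p. A minimal sketch (the p-values are arbitrary choices):

```python
from math import comb

def bernoulli_lik(x, p):
    """Joint Bernoulli likelihood p^(sum x) * (1 - p)^(n - sum x)."""
    s, n = sum(x), len(x)
    return p ** s * (1 - p) ** (n - s)

arrangements = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]   # all have s = 2, n = 3
for p in (0.2, 0.5, 0.9):
    liks = [bernoulli_lik(x, p) for x in arrangements]
    assert max(liks) == min(liks)   # identical likelihood for every arrangement

# Conditionally on S = 2, each arrangement has probability 1 / C(3, 2) = 1/3,
# free of p: the conditional distribution given S carries no information on p.
p = 0.7
cond = bernoulli_lik((1, 1, 0), p) / (comb(3, 2) * p ** 2 * (1 - p))
```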



Sufficient statistics

Example

Let (X1 , . . . , Xn ) be a random N (µ, σ 2 ) sample with θ = (µ, σ 2 )0 and joint pdf
of the random sample
fX(x1, ..., xn; θ) = ∏_{i=1}^n (2πσ²)^{−1/2} exp( −(xi − µ)²/(2σ²) )
                   = (2πσ²)^{−n/2} exp( −(1/(2σ²)) ∑_{i=1}^n (xi − µ)² )
                   = (σ²)^{−n/2} exp( −(1/(2σ²)) (∑_{i=1}^n xi² − 2µ ∑_{i=1}^n xi + nµ²) ) · (2π)^{−n/2}.

Setting S1 = ∑_{i=1}^n Xi and S2 = ∑_{i=1}^n Xi², the first factor corresponds
to g(s1(x), s2(x); θ); the last factor is independent of θ and corresponds to h(x).

So S1 = ∑_{i=1}^n Xi and S2 = ∑_{i=1}^n Xi² are sufficient statistics for θ = (µ, σ²)′.
The sample mean X̄n = (1/n) ∑_{i=1}^n Xi and variance S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄n)² are
invertible functions of S1 and S2. They provide the same information about θ.
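The invertible map between (S1, S2) and (X̄n, S²) can be sketched as follows; the data values are made up:

```python
def to_mean_var(s1, s2, n):
    """Map (S1, S2) = (sum x_i, sum x_i^2) to (sample mean, sample variance)."""
    xbar = s1 / n
    s_sq = (s2 - n * xbar ** 2) / (n - 1)
    return xbar, s_sq

def to_sums(xbar, s_sq, n):
    """Inverse map: recover (S1, S2) from the mean and sample variance."""
    s1 = n * xbar
    s2 = (n - 1) * s_sq + n * xbar ** 2
    return s1, s2

data = [1.0, 2.5, 0.3, 4.2, 3.1]    # arbitrary made-up sample
n = len(data)
s1, s2 = sum(data), sum(x * x for x in data)
xbar, s_sq = to_mean_var(s1, s2, n)  # same information, different packaging
```

Going back and forth between the two pairs loses nothing, which is the sense in which they carry the same sample information.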



Sufficient statistics

How much information compressing?

Definition
A sufficient statistic S = s(X) for fX(x; θ) is said to be a minimal
sufficient statistic if, for every other sufficient statistic T = t(X), ∃ a
function hT(·) such that

s(x) = hT(t(x)) ∀ x ∈ RΩ(X).

Note: The notation for the sample space RΩ(X) indicates that the range of X is
taken over all θs in the parameter space Ω. If the support of the pdf does not
change with θ (e.g., Normal, Gamma, etc.) then RΩ(X) = R(X).

Intuition:
By definition, a function can never have more elements in its range
than in its domain.
So t may simplify to s, but not the other way round.
Sufficient statistics

Lehmann-Scheffé’s Minimal Sufficiency Theorem


The general result depends on whether the sample space varies with the
parameter, so we look at a corollary only.1

Corollary
Let X ∼ fX (x; θ), and suppose that R(X ) does not depend on θ. If the
statistic S = s(X ) is such that
fX(x; θ) / fX(y; θ) does not depend on θ iff (x, y) satisfies s(x) = s(y),

then S = s(X ) is a minimal sufficient statistic.

Proof: See Mittelhammer (1996), p. 396.


We need to find an appropriate function S = s(X );
... but this is often less complicated than it sounds.

1 See Mittelhammer (1996, pp. 395-396) for the full result.
Sufficient statistics

Minimal sufficient statistics in the exponential class

Theorem
Let fX (x; θ) be a member of the exponential class of density functions,
fX(x; θ) = exp[ ∑_{i=1}^k ci(θ) gi(x) + d(θ) + z(x) ].

Then s(X) = [g1(X), . . . , gk(X)]′ is a k-variate sufficient statistic, and if
c1(θ), . . . , ck(θ) are linearly independent, the sufficient statistic is a
minimal sufficient statistic.

The result is established by using Neyman’s factorization theorem,


followed by the Corollary.



Sufficient statistics

Example

Let (X1 , . . . , Xn ) be a random sample from a Gamma population with a joint pdf
which belongs to the exponential class

fX(x; α, β) = ∏_{i=1}^n [1/(β^α Γ(α))] xi^{α−1} e^{−xi/β}
            = exp[ (α − 1) ∑_{i=1}^n ln xi − (1/β) ∑_{i=1}^n xi − n ln(β^α Γ(α)) ],

with c1(θ) = α − 1, g1(x) = ∑_{i=1}^n ln xi, c2(θ) = −1/β and g2(x) = ∑_{i=1}^n xi.

Thus, it follows that [g1(X), g2(X)] = [∑_{i=1}^n ln Xi, ∑_{i=1}^n Xi] is a bivariate
minimal sufficient statistic for (α, β).
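Sufficiency means the Gamma log-likelihood depends on the data only through (∑ ln xi, ∑ xi); a minimal numerical check (sample and parameter values are made up):

```python
from math import log, lgamma

data = [1.2, 0.7, 3.4, 2.2, 0.9]    # arbitrary made-up positive sample
n = len(data)
g1 = sum(log(x) for x in data)       # sum of ln x_i
g2 = sum(data)                       # sum of x_i

def loglik_direct(alpha, beta):
    """Gamma(alpha, beta) log-likelihood, summed observation by observation."""
    return sum((alpha - 1) * log(x) - x / beta
               - alpha * log(beta) - lgamma(alpha) for x in data)

def loglik_sufficient(alpha, beta):
    """The same log-likelihood written in terms of (g1, g2, n) only."""
    return (alpha - 1) * g1 - g2 / beta - n * (alpha * log(beta) + lgamma(alpha))

# Both versions agree for any (alpha, beta), up to float rounding.
```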



Sufficient statistics

Sufficiency and minimality


Sufficient statistics are not unique:
one-to-one transformations of a (minimal) sufficient statistic provide
the same sample information about the unknown parameter as the
initial statistic.

Theorem
Let S = s(X ) be an r-dimensional sufficient statistic for fX (x; θ). If
τ [s(X )] is an r-dimensional invertible function of s(X ), then
a. τ [s(X )] is an r-dimensional sufficient statistic for fX (x; θ);
b. if s(X ) is a minimal sufficient statistic, then τ [s(X )] is a minimal
sufficient statistic.

a. follows with Neyman's factorization theorem, while
b. is proved e.g. in Mittelhammer (1996), p. 405.
The Blackwell-Rao theorem

Outline

1 Sufficient statistics

2 The Blackwell-Rao theorem

3 Completeness and MVUEs

4 Up next



The Blackwell-Rao theorem

Main result
We focus on the scalar case.

Theorem
Let S = (S1, ..., Sr)′ be an r-dimensional sufficient statistic for fX(x; θ),
and let T = t(X) be any unbiased estimator for the scalar q(θ). Define

T∗ = t∗(X) = E[T | S1, ..., Sr].

Then
1. T ∗ is a statistic and it is a function of S1 , ..., Sr ;
2. E(T ∗ ) = q(θ), that is, T ∗ is an unbiased estimator of q(θ);
3. Var(T ∗ ) ≤ Var(T ) ∀ θ ∈ Ω, where the equality is attained only if
P(T ∗ = T ) = 1.

Proof: elementary manipulations.



The Blackwell-Rao theorem

Can’t lose

Given an unbiased estimator,
another unbiased estimator that is a function of a sufficient statistic
can be constructed,
which will never have larger variance, and possibly has smaller variance.
We may thus improve the MSE performance of unbiased estimators.

But: If an unbiased estimator T is already a function of a sufficient
statistic S, then the Rao-Blackwellized estimator T∗ will be identical to T.
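Rao-Blackwellization can be illustrated with the Bernoulli sample: T = X1 is unbiased for p, and conditioning on S = ∑ Xi gives T∗ = E[X1 | S] = S/n by exchangeability. A seeded simulation sketch (n, p and the replication count are arbitrary choices):

```python
import random
from statistics import mean, pvariance

random.seed(42)
n, p, reps = 10, 0.3, 20000

t_vals, t_star_vals = [], []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    t_vals.append(x[0])             # crude unbiased estimator T = X1
    t_star_vals.append(sum(x) / n)  # Rao-Blackwellized T* = E[T | S] = S / n

# Both are (approximately) unbiased, but T* has much smaller variance:
# Var(T) = p(1 - p) = 0.21 versus Var(T*) = p(1 - p)/n = 0.021.
```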



The Blackwell-Rao theorem

Where to stop?

Will a Rao-Blackwellized estimator be the MVUE?

Yes! ... if the sufficient statistic S is complete.



Completeness and MVUEs

Outline

1 Sufficient statistics

2 The Blackwell-Rao theorem

3 Completeness and MVUEs

4 Up next



Completeness and MVUEs

Complete sufficient statistics


Definition
Let S = [S1 , . . . , Sr ]0 be a sufficient statistic for fX (x; θ). The sufficient
statistic S is said to be complete iff for any statistic z(S) with

Eθ [z(S)] = 0 ∀ θ ∈ Ω,

it holds that
Pθ [z(S) = 0] = 1 ∀ θ ∈ Ω.

(This relates – vaguely – to identification.)

Lemma
If a sufficient statistic S is complete, two different functions of S cannot
have the same expected value.

So any unbiased estimator of q(θ) that is a function of a complete
sufficient statistic is unique.
Completeness and MVUEs

The Bernoulli example I

Example
Let (X1 , . . . , Xn ) be a random sample from a Bernoulli population with
P(X = 1) = p, and consider the statistic
S = ∑_{i=1}^n Xi,

which is a sufficient statistic for p – is S also complete?

To determine whether S is a complete sufficient statistic, we need to
show that any function z(S) of S for which E[z(S)] = 0 ∀ p ∈ [0, 1]
necessarily satisfies P[z(S) = 0] = 1 ∀ p ∈ [0, 1].



Completeness and MVUEs

The Bernoulli example II

First note that S ∼ Binomial(n, p), so that, writing C(n, j) for the binomial
coefficient, E[z(S)] = 0 implies

E[z(S)] = ∑_{j=0}^n z(j) C(n, j) p^j (1 − p)^{n−j}
        = (1 − p)^n ∑_{j=0}^n z(j) C(n, j) ω^j = 0, where ω = p/(1 − p).

Hence, E[z(S)] = 0 ∀ p ∈ [0, 1] requires that

∑_{j=0}^n z(j) C(n, j) ω^j = 0 ∀ p ∈ [0, 1].



Completeness and MVUEs

The Bernoulli example III

For that polynomial in ω to be equal to 0 ∀ ω, all coefficients z(j) C(n, j)
need to be equal to 0; since C(n, j) ≠ 0, this means

z(j) = 0 ∀ j ∈ {0, 1, ..., n}.

Hence, E[z(S)] = 0 ∀ p requires that z(j) = 0 ∀ j, so that E[z(S)] = 0
implies P[z(S) = 0] = 1.
Thus, S = ∑_{i=1}^n Xi is a complete sufficient statistic for p.
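The coefficient argument can be verified exactly for a small n: evaluating E[z(S)] at n + 1 distinct values of p yields a linear system whose matrix is nonsingular, so z = 0 is the only solution. A sketch using exact rational arithmetic (n = 3 and the chosen p-values are arbitrary):

```python
from fractions import Fraction
from math import comb

n = 3
ps = [Fraction(k, 5) for k in (1, 2, 3, 4)]   # n + 1 distinct p-values in (0, 1)

# Row for a given p: the binomial pmf weights multiplying z(0), ..., z(n).
M = [[comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
     for p in ps]

def det(a):
    """Determinant by Laplace expansion along the first row (exact Fractions)."""
    if len(a) == 1:
        return a[0][0]
    total = Fraction(0)
    for j in range(len(a)):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

# Nonzero determinant: M z = 0 forces z(j) = 0 for all j, i.e. requiring
# E[z(S)] = 0 on just n + 1 points already pins z down, let alone on all of [0, 1].
d = det(M)
```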



Completeness and MVUEs

Completeness in the exponential class


Theorem
Let the joint density fX(x; θ) of the random sample (X1, . . . , Xn) belong
to the exponential class of densities with pdf

fX(x; θ) = exp[ ∑_{i=1}^k ci(θ) gi(x) + d(θ) + z(x) ].

If the range of [c1(θ), . . . , ck(θ)]′, θ ∈ Ω, contains an open k-dimensional
rectangle, then s(X) = [g1(X), . . . , gk(X)]′ is a complete sufficient
statistic for fX(x; θ), θ ∈ Ω.

Note: The condition that the range of [c1(θ), . . . , ck(θ)]′ contains an open
k-dimensional rectangle excludes cases where the ci(θ)s are linearly dependent.
For a random sample from a N(µ, σ²) distribution with (µ, σ²) ∈ R × R₊, for
example, the range of [c1(·), c2(·)]′ = [µ/σ², −1/(2σ²)]′ is the set R × R₋
and contains an open 2-dimensional rectangle.



Completeness and MVUEs

The Lehmann-Scheffé completeness theorem

If complete sufficient statistics exist for a statistical model
{fX(x; θ), θ ∈ Ω}, then an alternative to the CRLB approach is available
to identify the MVUE of q(θ).

Theorem
Let S = (S1 , . . . , Sr )0 be a complete sufficient statistic for f (x; θ). Let
T = t(S) be an unbiased estimator for the function q(θ). Then T = t(S)
is the MVUE of q(θ).

Proof: Use Blackwell-Rao suitably.



Completeness and MVUEs

Summing up

There are two possible procedures for identifying the MVUE for q(θ):
Find a statistic of the form t(S) such that E(t(S)) = q(θ).
Then t(S) is necessarily the MVUE of q(θ).
Find any unbiased estimator of q(θ), say t∗ (X ).
Then t(S) = E(t∗ (X )|S) is the MVUE of q(θ).

The condition is that S be a complete sufficient statistic.



Completeness and MVUEs

The Poisson example I


Example
Let (X1 , ..., Xn ) be a random sample from a Poisson distribution with pdf

f(x; θ) = e^{−θ} θ^x / x!  for x = 0, 1, 2, . . . ,  with E(X) = Var(X) = θ.
Find the MVUE of q(θ) = θ.

The joint pdf fX(x; θ) is a member of the exponential class of densities,

fX(x; θ) = ∏_{i=1}^n f(xi; θ) = e^{−nθ} θ^{∑_{i=1}^n xi} / ∏_{i=1}^n xi!
         = exp[ ln(θ) ∑_{i=1}^n xi − nθ − ∑_{i=1}^n ln(xi!) ],

with c(θ) = ln(θ) and g(x) = ∑_{i=1}^n xi.



Completeness and MVUEs

The Poisson example II


Hence, the statistic g(X) = ∑_{i=1}^n Xi is a complete sufficient statistic
for θ.
To identify the MVUE for θ,
find a function of the complete sufficient statistic ∑_{i=1}^n Xi
... whose expectation is θ.
Since X̄n = (1/n) ∑_{i=1}^n Xi is an unbiased estimator for E(X) = θ, it is
the obvious choice; so X̄n = (1/n) ∑_{i=1}^n Xi is the MVUE of θ.

But: Does the variance of the MVUE of θ, given by

Var(X̄n) = Var( (1/n) ∑_{i=1}^n Xi ) = θ/n,

attain the CRLB?


Completeness and MVUEs

The Poisson example III


The CRLB for the variance of an unbiased estimator T of q(θ) = θ is

Var(T) ≥ [∂q(θ)/∂θ]² / ( n E[ {∂ ln f(X; θ)/∂θ}² ] ),

where ln f(x; θ) = −θ + x ln(θ) − ln(x!), such that

{∂ ln f(x; θ)/∂θ}² = (−1 + x/θ)² = 1 − 2x/θ + x²/θ²,

E[ {∂ ln f(X; θ)/∂θ}² ] = 1 − 2 + (Var(X) + E(X)²)/θ² = 1/θ,

and
Var(T) ≥ 1 / (n · (1/θ)) = θ/n.

Thus, the variance of the MVUE X̄n = (1/n) ∑_{i=1}^n Xi of θ attains the CRLB.
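That Var(X̄n) = θ/n can be checked by simulation. The Python standard library has no Poisson sampler, so the sketch below draws Poisson variates with Knuth's multiplication method (the seed, θ, n and the replication count are arbitrary choices):

```python
import random
from math import exp
from statistics import mean, pvariance

def poisson_draw(theta, rng):
    """Knuth's algorithm: multiply uniforms until the product drops below e^-theta."""
    limit, k, prod = exp(-theta), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(0)
theta, n, reps = 2.0, 20, 20000
xbars = [mean(poisson_draw(theta, rng) for _ in range(n)) for _ in range(reps)]

# The simulated Var(Xbar_n) should be close to the CRLB theta / n = 0.1.
crlb = theta / n
```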



Up next

Outline

1 Sufficient statistics

2 The Blackwell-Rao theorem

3 Completeness and MVUEs

4 Up next



Up next

Coming up

Point estimation methods: Maximum Likelihood

