Example. Suppose $X_1, \ldots, X_n$ is a random sample (i.i.d.) from Poisson($\lambda$), $\lambda > 0$. Is $T(X) = \sum_{i=1}^n X_i$ sufficient for $\lambda$? Note that
$$f(x_1, \ldots, x_n \mid \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i}}{x_i!} \exp(-\lambda) = \exp(-n\lambda)\,\frac{\lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!} = \exp(-n\lambda)\,\lambda^{\sum_{i=1}^n x_i} \cdot \frac{1}{\prod_{i=1}^n x_i!}.$$
Take $g(t, \lambda) = \exp(-n\lambda)\lambda^t$ and $h(x_1, \ldots, x_n) = \frac{1}{\prod_{i=1}^n x_i!}$ to satisfy the factorization theorem.
Example. Let $X_1, \ldots, X_n$ be a random sample (i.i.d.) from $N(\mu, \sigma^2)$. What statistics are jointly sufficient for $\mu$ and $\sigma^2$? We have
$$f(x_1, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)$$
$$= (2\pi)^{-n/2}\,\sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
$$= (2\pi)^{-n/2}\,\sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^n x_i^2 + n\mu^2 - 2\mu \sum_{i=1}^n x_i\right\}\right)$$
$$= \sigma^{-n} \exp\left(-\frac{n\mu^2}{2\sigma^2} - \frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^n x_i\right)(2\pi)^{-n/2}.$$
Note that $T(X_1, \ldots, X_n) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is sufficient, since we can take $g((t_1, t_2), (\mu, \sigma^2)) = \sigma^{-n} \exp\left(-\frac{n\mu^2}{2\sigma^2} - \frac{1}{2\sigma^2} t_2 + \frac{\mu}{\sigma^2} t_1\right)$ and $h(x) = (2\pi)^{-n/2}$.
Also, the map $\left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right) \to \left(\bar{X}, \sum_{i=1}^n (X_i - \bar{X})^2\right)$ is one-one, so $(\bar{X}, S^2)$ is another set of sufficient statistics.
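To see that this map is one-one (a short worked step added here), write $t_1 = \sum_{i=1}^n X_i$ and $t_2 = \sum_{i=1}^n X_i^2$. Then
$$\bar{X} = \frac{t_1}{n}, \qquad \sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2 = t_2 - \frac{t_1^2}{n},$$
and conversely $t_1 = n\bar{X}$ and $t_2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n\bar{X}^2$, so each pair determines the other.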
Example. Let $X_1, \ldots, X_n$ be a random sample from the population with density $f(x \mid \theta) = \frac{1}{2}\exp(-|x - \theta|)$, $-\infty < \theta < \infty$. Then $f(x_1, \ldots, x_n \mid \theta) = \frac{1}{2^n}\exp\left(-\sum_{i=1}^n |x_i - \theta|\right)$. What is sufficient for $\theta$? Not much reduction of data is possible here, except for noting that the joint density can be written as $f(x_1, \ldots, x_n \mid \theta) = \frac{1}{2^n}\exp\left(-\sum_{i=1}^n |x_{(i)} - \theta|\right)$, where $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ are the ordered observations. Therefore $T(X_1, \ldots, X_n) = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is sufficient. Order statistics provide data reduction whenever the data is a random sample from a continuous distribution. Some models permit further reduction.
Interpretation of sufficiency. Observing $X = (X_1, \ldots, X_n)$ is equivalent to observing $T(X)$ as far as information on $\theta$ is concerned. Given $T(X)$ (whose distribution depends on $\theta$), one can generate $X' = (X_1', \ldots, X_n')$ from the conditional distribution $P(X \mid T)$, which does not involve $\theta$ and hence requires no knowledge of it. Then the probability distributions of $X$ and $X'$ are the same. Note that if two random quantities have the same probability distribution, then they contain the same amount of information.
Example. $X_1, \ldots, X_n$ i.i.d. Poisson($\lambda$). Then $S = \sum_{i=1}^n X_i$ is sufficient. If $S = s$ is given (note that $S \sim$ Poisson($n\lambda$)), then simply generate $(X_1', \ldots, X_n')$ from Multinomial$\left(s; \frac{1}{n}, \ldots, \frac{1}{n}\right)$. It is clear that the joint distribution of $(X_1', \ldots, X_n')$ is the same as that of $(X_1, \ldots, X_n)$, which is i.i.d. Poisson($\lambda$).
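As an illustration (a simulation sketch using numpy, not part of the notes), one can compare direct i.i.d. Poisson sampling with the two-stage scheme of drawing $S$ and then splitting it multinomially:

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam, reps = 5, 2.0, 100_000

    # Direct sampling: n i.i.d. Poisson(lambda) draws per replication.
    direct = rng.poisson(lam, size=(reps, n))

    # Two-stage sampling: draw S ~ Poisson(n*lambda), then split S among
    # the n coordinates with Multinomial(S; 1/n, ..., 1/n).
    s = rng.poisson(n * lam, size=reps)
    regen = np.array([rng.multinomial(si, [1.0 / n] * n) for si in s])

    # The marginal means and variances should all be close to lambda.
    print(direct.mean(axis=0), regen.mean(axis=0))
    print(direct.var(axis=0), regen.var(axis=0))

This only checks marginal moments, but it reflects the exact distributional identity stated above.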
Definition. Two statistics $S_1$ and $S_2$ are said to be equivalent if $S_1(x) = S_1(y)$ iff $S_2(x) = S_2(y)$.
Note that, if $S_1$ and $S_2$ are equivalent, then
(i) they give the same partition of the sample space,
(ii) they provide the same reduction,
(iii) they provide the same information.
Example. $S_1(X_1, \ldots, X_n) = \bar{X}$, $S_2(X_1, \ldots, X_n) = \sum_{i=1}^n X_i$. Then $S_1(x_1, \ldots, x_n) = S_1(y_1, \ldots, y_n)$ iff $\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{i=1}^n y_i$ iff $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$ iff $S_2(x_1, \ldots, x_n) = S_2(y_1, \ldots, y_n)$.
Data is a realization of the random observable $X$; it is a point in the sample space. The values of $X$ form the finest partition of the sample space. Any statistic $T(X)$ also gives a partition of the sample space. For example, $(X_1, \ldots, X_n)$ are points in $\mathbb{R}^n$. $T_1(X_1, \ldots, X_n) = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ partitions $\mathbb{R}^n$ into sets whose points are permutations of each other; this is coarser than the partition of $\mathbb{R}^n$ into individual points. $T_2(X_1, \ldots, X_n) = \frac{1}{n}\sum_{i=1}^n X_i$ partitions $\mathbb{R}^n$ into sets whose points have the same average. This partition is coarser than the one provided by $T_1$, since permutations do not change the average.
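A small concrete case (added here for illustration) makes the nesting visible. For $n = 2$, the points $(1, 3)$ and $(3, 1)$ lie in the same cell of the $T_1$-partition (they are permutations of each other) and hence also in the same cell of the $T_2$-partition (both average to $2$), while $(0, 4)$ shares the $T_2$-cell $\{(x_1, x_2) : \frac{x_1 + x_2}{2} = 2\}$ with them but not the $T_1$-cell, since $(0, 4)$ is not a permutation of $(1, 3)$.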

[Figure: the dashed red partition is coarser than the blue partition.]
A sufficient statistic gives a partition of the sample space which retains all the information about the parameters. Therefore, the maximum possible reduction of the data, i.e., the coarsest possible partition of the sample space, is desirable. How does one choose among sufficient statistics?
Example. $X_1, \ldots, X_n$ i.i.d. $N(0, \sigma^2)$. Then
$$f(x_1, \ldots, x_n \mid \sigma^2) = (2\pi)^{-n/2}\,\sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2\right).$$
Note from this joint density that
(i) $T_1(X_1, \ldots, X_n) = (X_1, \ldots, X_n)$ is sufficient for $\sigma^2$;
(ii) $T_2(X_1, \ldots, X_n) = (X_1^2, \ldots, X_n^2)$ is sufficient for $\sigma^2$;
(iii) $T_3(X_1, \ldots, X_n) = (X_1^2 + \cdots + X_m^2,\ X_{m+1}^2 + \cdots + X_n^2)$ is sufficient for $\sigma^2$;
(iv) $T_4(X_1, \ldots, X_n) = X_1^2 + \cdots + X_n^2$ is sufficient for $\sigma^2$.
Observe that $T_4 = g_1(T_3) = g_1(g_2(T_2)) = g_1(g_2(g_3(T_1)))$. In general, if $T$ is sufficient and $T = H(U)$, then $U$ is also sufficient, since knowledge of $U$ implies knowledge of $T$ and hence retains all the information in the data. Here $T$ provides greater reduction, i.e., a coarser partition, unless $H$ is one-one.
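Explicitly (spelling out maps the notes leave implicit), one may take
$$g_3(x_1, \ldots, x_n) = (x_1^2, \ldots, x_n^2), \qquad g_2(y_1, \ldots, y_n) = \left(\sum_{i=1}^m y_i,\ \sum_{i=m+1}^n y_i\right), \qquad g_1(u, v) = u + v,$$
so that $T_2 = g_3(T_1)$, $T_3 = g_2(T_2)$, and $T_4 = g_1(T_3)$, with the partitions growing coarser at each step.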
Minimal sufficiency. $T = T(X)$ is minimal sufficient if it provides the maximal amount of data reduction; i.e., for any sufficient statistic $U = U(X)$, there exists a function $H$ such that $T = H(U)$.
