
3. The Statistical Setup

Ismaïla Ba

ismaila.ba@umanitoba.ca
STAT 3100 - Winter 2024

1 / 25
Course Outline

1 Introduction

2 Some Additional Results

2 / 25
Introduction

Contents

1 Introduction

2 Some Additional Results

3 / 25
Introduction

θ : parameter (in general unknown).


x = (x1 , . . . , xn ) are realizations of random variables X = (X1 , . . . , Xn )
from a population of interest.
We make assumptions about the distributions of X1 , . . . , Xn (for
instance fX (·; θ)) in order to make inferences about the population
characteristics (mean, standard deviation, etc.).
For example, the Xi are iid N(µ, σ²) random variables, where θ = (µ, σ²) is
unknown and needs to be estimated.
Sometimes, we observe x1 , . . . , xn , where xi = (xi1 , . . . , xip )′ . We
assume that these xi are realizations of multivariate random variables
Xi = (Xi1 , . . . , Xip )′ with some multivariate distribution, such as the
multivariate normal, characterized by a mean vector µ and a
covariance matrix Σp×p . That is, Θ = {µ, Σp×p }.

4 / 25
Introduction

Statistic

Definition 1
A statistic, denoted T = T (X1 , . . . , Xn ), is a function of observable
random variables X1 , . . . , Xn that does not depend on unknown parameters
and that can be calculated once the random variables are observed. Note
that T is a random variable with realization t = T (x1 , . . . , xn ).

Example 1
Let θ (unknown) be a population parameter.
We conduct an experiment, observations are x1 , x2 , . . . , xn .
Based on the experiment results x1 , x2 , . . . , xn , define θ̂ = f (x1 , x2 , . . . , xn ).
The statistic θ̂ = f (X1 , X2 , . . . , Xn ) is a random variable: it varies from
sample to sample, and its values inform the user about the distribution of θ̂.
θ̂ = X̄ , S, ...

5 / 25
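As a quick illustration (not part of the original slides; numpy and the N(10, 2²) population below are arbitrary assumptions), the statistics are computed from the observed data alone, never from the unknown parameters, and recomputing them on a second sample gives different realizations:

# Sketch: a statistic is a function of the observed sample only.
# mu and sigma are made-up values used solely to simulate data --
# the statistics themselves never reference them.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 30          # hypothetical population parameters

x = rng.normal(mu, sigma, size=n)     # one observed sample x1, ..., xn
x_bar = x.mean()                      # statistic T1 = sample mean
s = x.std(ddof=1)                     # statistic T2 = sample standard deviation
print(f"x_bar = {x_bar:.3f}, s = {s:.3f}")

# Repeating the experiment gives different realizations of the same statistics,
# which is why T = T(X1, ..., Xn) is itself a random variable.
x2 = rng.normal(mu, sigma, size=n)
print(f"second sample: x_bar = {x2.mean():.3f}, s = {x2.std(ddof=1):.3f}")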
Introduction

Statistic
Example 2

Let X1 , . . . , Xn be a random sample from a population with some
distribution function F . The sample mean is defined as

X̄ = (1/n) ∑_{i=1}^n Xi .

This is a random variable. Once X1 , . . . , Xn have been observed and we
have x1 , . . . , xn , the observed sample mean is x̄ = (1/n) ∑_{i=1}^n xi . If E(Xi ) = µ
and V(Xi ) = σ², then we have

E(X̄ ) = E[(1/n) ∑_{i=1}^n Xi ] = (1/n) ∑_{i=1}^n E(Xi ) = nµ/n = µ,

V(X̄ ) = V[(1/n) ∑_{i=1}^n Xi ] = (1/n²) ∑_{i=1}^n V(Xi ) = nσ²/n² = σ²/n (by independence).
6 / 25
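A short Monte Carlo sketch of these two identities (numpy assumed; the exponential population with mean β = 3 is an arbitrary choice, for which µ = β and σ² = β²):

# Monte Carlo check that E(X_bar) = mu and V(X_bar) = sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
beta, n, reps = 3.0, 25, 200_000      # Exp(mean beta): mu = beta, sigma^2 = beta^2

samples = rng.exponential(beta, size=(reps, n))
x_bars = samples.mean(axis=1)         # one X_bar per simulated sample

print("E(X_bar) ~", x_bars.mean(), " (theory:", beta, ")")
print("V(X_bar) ~", x_bars.var(),  " (theory:", beta**2 / n, ")")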
Introduction

Statistic
Remark : If T = (X1 + . . . + Xn ) × σ², where σ² is the variance of the Xi ,
then T is not observable (because σ² is unknown), even though the Xi are.
Example 3
The sample variance is defined as

S² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄ )² = (1/(n − 1)) (∑_{i=1}^n Xi² − nX̄²),   n > 1.

Since E(Xi²) = V(Xi ) + E²(Xi ) = σ² + µ² and E(X̄²) = µ² + σ²/n, we have

E(S²) = E[(1/(n − 1)) (∑_{i=1}^n Xi² − nX̄²)] = (1/(n − 1)) (n(σ² + µ²) − n(µ² + σ²/n)) = σ².

V(S²) = (1/n) (µ4 − ((n − 3)/(n − 1)) σ⁴),   n > 1.
7 / 25
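A small simulation sketch of the unbiasedness of S² (numpy assumed; the N(0, 2²) population and sample size are arbitrary choices), contrasting the n − 1 divisor with the naive n divisor:

# Empirical check that S^2 with the (n - 1) divisor is unbiased for sigma^2,
# while dividing by n underestimates it on average.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_naive    = samples.var(axis=1, ddof=0)   # divide by n

print("E(S^2), ddof=1 ~", s2_unbiased.mean(), " (theory:", sigma**2, ")")
print("E(S^2), ddof=0 ~", s2_naive.mean(),    " (theory:", (n - 1) / n * sigma**2, ")")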
Introduction

Sampling distribution

Recall that in the previous slide, µ4 = E[(Xi − µ)⁴] is the fourth central
moment of X .

Definition 2
Suppose that we draw all possible samples of size n from a given
population.
Suppose further that we compute a statistic (e.g., a mean, standard
deviation) for each sample.
A sampling distribution is the probability distribution of this statistic.

8 / 25
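The definition can be made concrete with a short sketch (not from the slides; numpy and the N(5, 3²) population are assumptions): draw many samples of size n, compute the same statistic on each, here the sample standard deviation S, and the collection of values approximates its sampling distribution.

# Approximate the sampling distribution of the sample standard deviation S
# by repeated sampling from an arbitrary N(5, 3^2) population.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 100_000
samples = rng.normal(5.0, 3.0, size=(reps, n))

s_values = samples.std(axis=1, ddof=1)      # one realization of S per sample

# Summaries of the (approximate) sampling distribution of S.
print("mean of S:", s_values.mean())
print("sd of S:  ", s_values.std())
print("quartiles:", np.percentile(s_values, [25, 50, 75]))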
Introduction

Estimator

Definition 3
An estimator of a population parameter θ is a statistic that is
thought to produce values close to θ in some sense and is denoted by
θ̂(X). That is, a statistic that is used to estimate θ.
The realized value of θ̂(X) is also called the estimate and is denoted by
θ̂(x).

Remark : The estimator is a random variable and the estimate is a


constant !

9 / 25
Introduction

Estimator

Example 4
1 Let X1 , . . . , Xn be n random variables with mean θ = µ; an estimator
of µ is, for example, given by

µ̂(X) = (1/n) ∑_{i=1}^n Xi .

2 If, for example, we observe x = (2, 1.4, 4.2, 5.6), then

µ̂(x) = (1/4)(2 + 1.4 + 4.2 + 5.6) = 3.3
is the estimate of µ based on x.

10 / 25
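A tiny sketch (numpy assumed) reproducing the estimate from Example 4 and then applying the same estimator to a hypothetical second sample, to stress that the estimator is a rule while the estimate is a number:

# The estimator mu_hat(X) = X_bar is a rule; the estimate mu_hat(x) is a number
# obtained by applying that rule to the observed data from Example 4.
import numpy as np

x = np.array([2.0, 1.4, 4.2, 5.6])    # observed data from the slide
print(x.mean())                       # estimate of mu based on x: 3.3

# Applying the same rule to a different sample gives a different estimate,
# which is the sense in which the estimator itself is a random variable.
rng = np.random.default_rng(4)
x_new = rng.normal(3.3, 1.0, size=4)  # hypothetical second sample
print(x_new.mean())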
Introduction

Properties of an estimator

Definition 4
Let X1 , . . . , Xn be a random sample from some population with a
parameter θ and suppose that θ̂ = θ̂(X1 , . . . , Xn ) is an estimator of θ. The
bias of θ̂ is defined to be

Bias(θ̂) = B(θ̂) = E(θ̂) − θ.

If B(θ̂) = 0, we say that θ̂ is unbiased for θ. The mean squared error of


θ̂ is defined to be
MSE(θ̂) = E[(θ̂ − θ)²] = V(θ̂) + B²(θ̂).

Remark : It is clear that, if B(θ̂) = 0 then MSE(θ̂) = V(θ̂).


11 / 25
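The decomposition in Definition 4 follows by adding and subtracting E(θ̂) inside the square; a standard derivation (not spelled out on the slide), written here in LaTeX:

\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big]
   = E\big[(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta)^2\big] \\
  &= E\big[(\hat\theta - E(\hat\theta))^2\big]
     + 2\, E\big[\hat\theta - E(\hat\theta)\big]\,(E(\hat\theta) - \theta)
     + (E(\hat\theta) - \theta)^2 \\
  &= V(\hat\theta) + 0 + B^2(\hat\theta),
\end{aligned}

since E(θ̂) − θ is a constant and E[θ̂ − E(θ̂)] = 0.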
Introduction

Properties of an estimator

We assume that we have a sequence of estimators, say θ̂1 , θ̂2 , . . ., of θ,
usually indexed by increasing sample size, so that θ̂n is based on a sample
of size n.

We say that the sequence {θ̂n } is asymptotically unbiased for θ if


lim_{n→∞} E(θ̂n ) = θ.

We say that the sequence {θ̂n } is consistent (or weakly consistent) for θ
if, for every ϵ > 0, lim_{n→∞} P(|θ̂n − θ| > ϵ) = 0.
In a probability class, you would say that θ̂n converges to θ in probability.
In statistics classes, we say that θ̂n is consistent for θ. This says that, for
large n, the distribution of θ̂n is concentrated around θ.

12 / 25
Introduction

Markov inequality

If X is a random variable and u(x) is a non-negative real-valued function
then, for any real c > 0,

P(u(X ) ≥ c) ≤ E[u(X )]/c.

In particular, taking u(x) = |x|,

P(|X | ≥ c) ≤ E[|X |]/c,

which is the Markov inequality.

13 / 25
Introduction

Chebyshev’s inequality

If we let µ = E(X ), σ² = V(X ) < ∞ and take u(x) = (x − µ)² with threshold c², we obtain

P(|X − µ| ≥ c) = P(|X − µ|² ≥ c²) ≤ E[|X − µ|²]/c² = σ²/c².

Taking c = kσ yields Chebyshev’s inequality,

P(|X − µ| ≥ kσ) = P(|X − µ|² ≥ k²σ²) ≤ E[|X − µ|²]/(k²σ²) = 1/k².

For example, the probability that X is more than k = 2 standard deviations
from µ is bounded above by 1/2² = 1/4.

14 / 25
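A quick simulation sketch of the k = 2 case (numpy assumed; the Exp(1) population is an arbitrary choice): the observed exceedance frequency sits well below the 1/4 bound.

# Check Chebyshev's bound P(|X - mu| >= 2*sigma) <= 1/4 on an Exp(mean 1) population.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=1_000_000)   # Exp(mean 1): mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

freq = np.mean(np.abs(x - mu) >= 2 * sigma)
print("empirical P(|X - mu| >= 2 sigma):", freq, " (bound: 0.25)")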
Introduction

More on inequalities

Let {θ̂n } be a sequence of estimators of θ. We have

P(|θ̂n − θ| ≥ c) ≤ E[|θ̂n − θ|²]/c² = MSE(θ̂n )/c².

Thus, if MSE(θ̂n ) → 0 as n → ∞, then θ̂n is consistent for θ. This is


commonly referred to as mean squared error consistency.
If a sequence of estimators is MSE consistent, then it is consistent.
It is not true that all consistent estimators are MSE consistent.

15 / 25
Introduction

Example

Example 5
Going back to Example 2, if V(Xi ) = σ² < ∞, we have that

E[X̄n ] = E[(X1 + . . . + Xn )/n] = µ and V(X̄n ) = σ²/n.

Thus, X̄n is unbiased and MSE(X̄n ) = V(X̄n ) = σ²/n. Hence,

P(|X̄n − µ| ≥ ϵ) ≤ σ²/(nϵ²) → 0 as n → ∞.
Since this is true for all ϵ > 0, we have that X̄n is consistent for µ.

16 / 25
Introduction

Exercises

Exercise 1
For a random sample X1 , . . . , Xn such that E(Xi⁴) < ∞, show that S² is
consistent for σ².

Exercise 2
Suppose that X1 , X2 , . . . are iid Exp(β) random variables and define
X̄n = (1/n) ∑_{i=1}^n Xi   and   Sn² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄n )².

1 Show that X̄n is unbiased and consistent for β and that Sn² is unbiased
and consistent for β².
2 Since X̄n is unbiased for β, we could consider using X̄n² as an estimator
of β² instead of Sn². Show that X̄n² is asymptotically unbiased for β².
What is the bias ?
17 / 25
Some Additional Results

Contents

1 Introduction

2 Some Additional Results

18 / 25
Some Additional Results

Convergence in Probability to a Constant

Let {Yn }_{n=1}^∞ be a sequence of random variables. We say that Yn
converges to a constant c in probability if

lim_{n→∞} P(|Yn − c| > ϵ) = 0, for all ϵ > 0.

Remark
To say that Yn converges to c in probability, we write Yn →ᴾ c.
If the Yn are all estimators of c, we will say that {Yn } is consistent for c.

Example 6
Let Yn ∼ Exp(1/n), show that Yn →ᴾ 0. We have

lim_{n→∞} P(|Yn − 0| > ϵ) = lim_{n→∞} P(Yn > ϵ) = lim_{n→∞} e^{−nϵ} = 0, for all ϵ > 0.
19 / 25
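To see Example 6 numerically (a sketch with numpy; Exp(1/n) is read as exponential with mean 1/n, matching the e^{−nϵ} tail used above), one can estimate P(Yn > ϵ) for increasing n:

# Empirical P(Y_n > eps) for Y_n ~ Exp(mean 1/n); it decays like exp(-n*eps).
import numpy as np

rng = np.random.default_rng(6)
eps, reps = 0.5, 200_000

for n in (1, 2, 5, 10):
    y = rng.exponential(1.0 / n, size=reps)      # mean 1/n, i.e. rate n
    print(n, np.mean(y > eps), "theory:", np.exp(-n * eps))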
Some Additional Results

Convergence in Probability to a Constant

Proposition
Let a, b, c, d be real constants. Let {Xn } and {Yn } be sequences of random
variables such that Xn →ᴾ c and Yn →ᴾ d. Then,
1 aXn + bYn →ᴾ ac + bd.
2 Xn Yn →ᴾ cd.
3 If g (·) is continuous at c, then g (Xn ) →ᴾ g (c).

Example 7
Let Xn →ᴾ c ∈ R and let g (·) be a real-valued function, continuous at c.
1 Provided c ≠ 0, 1/Xn →ᴾ 1/c (g (x) = 1/x).
2 Provided c > 0, √Xn →ᴾ √c (g (x) = √x).
20 / 25
Some Additional Results

Law of Large Numbers (LLN)


LLN (Khinchin’s version)
If X1 , X2 , . . . are iid random variables such that µ = E(Xi ) < ∞, then

lim_{n→∞} P(|X̄n − µ| > ϵ) = 0, for all ϵ > 0.

That is, X̄n is consistent for µ or X̄n →ᴾ µ.

Let Yi = Xiᵏ for some k ∈ N and all i ∈ N, and define
Sn = Y1 + Y2 + . . . + Yn such that E(Yi ) < ∞. Then

(1/n) ∑_{i=1}^n Xiᵏ is consistent for E(Xiᵏ ) = µ′k .

Remark : For a random variable X , if E(Xᵏ ) < ∞ then E(Xʲ ) < ∞ (and so
E[(X − µ)ʲ ] < ∞) for j = 1, . . . , k.
21 / 25
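A small sketch of the LLN and of the consistency of sample moments (numpy assumed; the Gamma population and its moments are arbitrary choices): running averages of Xi and Xi² settle near their population values as n grows.

# Running sample mean and second sample moment for iid Gamma(2, scale=3) data:
# both settle near their population counterparts as n grows (LLN).
import numpy as np

rng = np.random.default_rng(7)
shape, scale = 2.0, 3.0
mu1 = shape * scale                   # E(X)   = 6
mu2 = shape * (shape + 1) * scale**2  # E(X^2) = 54

x = rng.gamma(shape, scale, size=100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, x[:n].mean(), (x[:n] ** 2).mean())
print("targets:", mu1, mu2)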
Some Additional Results

We define the sample moments for a random sample {Xi }_{i=1}^n as
follows

Mk = (1/n) ∑_{i=1}^n Xiᵏ ,   k = 1, 2, . . .

Since n and the Xi are finite (observable), Mk will always exist.

When µ′k exists, Mk is consistent for µ′k .

Note that we have lost the sample size in this notation.

22 / 25
Some Additional Results

Convergence in Distribution

A sequence of random variables X1 , X2 , . . . with respective CDFs FXn
converges to X (with CDF FX ) in distribution, written Xn →ᵈ X , if

lim_{n→∞} FXn (t) = FX (t)

at all t for which FX is continuous.

Remark
Convergence in distribution is weaker than convergence in
probability.

23 / 25
Some Additional Results

Slutsky’s Theorem

Proposition
Let {Xn }_{n=1}^∞ and {Yn }_{n=1}^∞ be sequences of random variables such that
Xn →ᵈ X and Yn →ᴾ c, where X is a random variable and c is a constant.
Then
1 Xn + Yn →ᵈ X + c.
2 Xn Yn →ᵈ cX .
3 Provided c ≠ 0, Xn /Yn →ᵈ X /c.

24 / 25
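A simulation sketch of part 2 of the proposition (numpy assumed; the particular sequences Xn ∼ N(0, 1 + 1/n) →ᵈ N(0, 1) and Yn = c + Zn/n →ᴾ c are illustrative choices, not from the slide): the product Xn Yn behaves like c · N(0, 1) for large n.

# Slutsky illustration: X_n ->d N(0,1), Y_n ->P c  =>  X_n * Y_n ->d c * N(0,1).
import numpy as np

rng = np.random.default_rng(8)
c, n, reps = 2.0, 1_000, 200_000

x_n = rng.normal(0.0, np.sqrt(1.0 + 1.0 / n), size=reps)   # ->d N(0, 1)
y_n = c + rng.normal(0.0, 1.0, size=reps) / n              # ->P c

prod = x_n * y_n
print("sd of X_n * Y_n:", prod.std(), " (limit sd:", c, ")")
print("5% / 95% quantiles:", np.percentile(prod, [5, 95]),
      " vs c*N(0,1):", c * np.array([-1.6449, 1.6449]))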
Some Additional Results

Univariate Delta Method

Proposition
Let {Xn }_{n=1}^∞ be a sequence of random variables. Let θ, σ² ∈ R be constants
with σ² > 0, and let g (·) be a real-valued function such that g ′ (θ) exists and
g ′ (θ) ≠ 0. If

√n (Xn − θ) →ᵈ N(0, σ²),

then

√n (g (Xn ) − g (θ)) →ᵈ N(0, σ² [g ′ (θ)]²).

25 / 25
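A simulation sketch of the delta method with g (x) = x² (an illustrative choice of g and of the Exp(mean β) population, not from the slide): by the CLT, √n (X̄n − β) →ᵈ N(0, β²), so √n (X̄n² − β²) should be approximately N(0, β²[2β]²) = N(0, 4β⁴) for large n.

# Delta method check: g(x) = x^2, X_i ~ Exp(mean beta), theta = beta, sigma^2 = beta^2.
# Predicted limit variance of sqrt(n)*(g(X_bar) - g(beta)) is sigma^2 * g'(beta)^2 = 4*beta^4.
import numpy as np

rng = np.random.default_rng(9)
beta, n, reps = 2.0, 400, 20_000

x_bar = rng.exponential(beta, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (x_bar**2 - beta**2)

print("empirical variance:", z.var(), " (theory:", 4 * beta**4, ")")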
