
3. The Statistical Setup

Ismaïla Ba

ismaila.ba@umanitoba.ca
STAT 3100 - Winter 2024

1 / 25
Course Outline

1 Introduction

2 Some Additional Results

2 / 25
Introduction

Contents

1 Introduction

2 Some Additional Results

3 / 25
Introduction

θ : parameter (in general unknown).


x = (x1 , . . . , xn ) are realizations of random variables X = (X1 , . . . , Xn )
from a population of interest.
We make assumptions about the distributions of X1 , . . . , Xn (for
instance fX (·; θ)) in order to make inferences about the population
characteristics (mean, standard deviation, etc.).
For example, the Xi are iid N(µ, σ²) random variables, where θ = (µ, σ²) is
unknown and needs to be estimated.
Sometimes, we observe x1 , . . . , xn , where xi = (xi1 , . . . , xip )′ . We
assume that these xi are realizations of multivariate random variables
Xi = (Xi1 , . . . , Xip )′ with some multivariate distribution, such as the
multivariate normal, characterized by a mean vector µ and a
covariance matrix Σp×p . That is, Θ = {µ, Σp×p }.

4 / 25
Introduction

Statistic

Definition 1
A statistic, denoted T = T (X1 , . . . , Xn ), is a function of observable
random variables X1 , . . . , Xn that does not depend on unknown parameters
and that can be calculated once the random variables are observed. Note
that T is a random variable with realization t = T (x1 , . . . , xn ).

Example 1
Let θ (unknown) be a population parameter.
We conduct an experiment, observations are x1 , x2 , . . . , xn .
Based on the experiment results x1 , x2 , . . . , xn , define θ̂ = f (x1 , x2 , . . . , xn ).
The statistic θ̂ = f (X1 , X2 , . . . , Xn ) is a random variable: it varies from
sample to sample, and its values inform the user about the distribution of θ̂.
θ̂ = X̄ , S, ...

5 / 25
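As a quick illustration (not part of the original slides; numpy and the N(10, 2²) population below are arbitrary assumptions), the statistics are computed from the observed data alone, never from the unknown parameters, and recomputing them on a second sample gives different realizations:

# Sketch: a statistic is a function of the observed sample only.
# mu and sigma are made-up values used solely to simulate data --
# the statistics themselves never reference them.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 30          # hypothetical population parameters

x = rng.normal(mu, sigma, size=n)     # one observed sample x1, ..., xn
x_bar = x.mean()                      # statistic T1 = sample mean
s = x.std(ddof=1)                     # statistic T2 = sample standard deviation
print(f"x_bar = {x_bar:.3f}, s = {s:.3f}")

# Repeating the experiment gives different realizations of the same statistics,
# which is why T = T(X1, ..., Xn) is itself a random variable.
x2 = rng.normal(mu, sigma, size=n)
print(f"second sample: x_bar = {x2.mean():.3f}, s = {x2.std(ddof=1):.3f}")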
Introduction

Statistic
Example 2

Let X1 , . . . , Xn be a random sample from a population with some
distribution function F . The sample mean is defined as

X̄ = (1/n) ∑_{i=1}^n Xi .

This is a random variable. Once X1 , . . . , Xn have been observed and we
have x1 , . . . , xn , the observed sample mean is x̄ = (1/n) ∑_{i=1}^n xi . If E(Xi ) = µ
and V(Xi ) = σ², then we have

E(X̄ ) = E[(1/n) ∑_{i=1}^n Xi ] = (1/n) ∑_{i=1}^n E(Xi ) = nµ/n = µ,

V(X̄ ) = V[(1/n) ∑_{i=1}^n Xi ] = (1/n²) ∑_{i=1}^n V(Xi ) = nσ²/n² = σ²/n (by independence).
6 / 25
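A short Monte Carlo sketch of these two identities (numpy assumed; the exponential population with mean β = 3 is an arbitrary choice, for which µ = β and σ² = β²):

# Monte Carlo check that E(X_bar) = mu and V(X_bar) = sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
beta, n, reps = 3.0, 25, 200_000      # Exp(mean beta): mu = beta, sigma^2 = beta^2

samples = rng.exponential(beta, size=(reps, n))
x_bars = samples.mean(axis=1)         # one X_bar per simulated sample

print("E(X_bar) ~", x_bars.mean(), " (theory:", beta, ")")
print("V(X_bar) ~", x_bars.var(),  " (theory:", beta**2 / n, ")")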
Introduction

Statistic
Remark : If T = (X1 + . . . + Xn ) × σ², where σ² is the variance of the Xi ,
then T is not observable (because σ² is unknown), even though the Xi are.
Example 3
The sample variance is defined as

S² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄ )² = (1/(n − 1)) (∑_{i=1}^n Xi² − nX̄²),   n > 1.

Since E(Xi²) = V(Xi ) + E²(Xi ) = σ² + µ² and E(X̄²) = µ² + σ²/n, we have

E(S²) = E[(1/(n − 1)) (∑_{i=1}^n Xi² − nX̄²)] = (1/(n − 1)) (n(σ² + µ²) − n(µ² + σ²/n)) = σ².

V(S²) = (1/n) (µ4 − ((n − 3)/(n − 1)) σ⁴),   n > 1.
7 / 25
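A small simulation sketch of the unbiasedness of S² (numpy assumed; the N(0, 2²) population and sample size are arbitrary choices), contrasting the n − 1 divisor with the naive n divisor:

# Empirical check that S^2 with the (n - 1) divisor is unbiased for sigma^2,
# while dividing by n underestimates it on average.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_naive    = samples.var(axis=1, ddof=0)   # divide by n

print("E(S^2), ddof=1 ~", s2_unbiased.mean(), " (theory:", sigma**2, ")")
print("E(S^2), ddof=0 ~", s2_naive.mean(),    " (theory:", (n - 1) / n * sigma**2, ")")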
Introduction

Sampling distribution

Recall that in the previous slide, µ4 = E[(Xi − µ)⁴] is the fourth central
moment of X .

Definition 2
Suppose that we draw all possible samples of size n from a given
population.
Suppose further that we compute a statistic (e.g., a mean, standard
deviation) for each sample.
A sampling distribution is the probability distribution of this statistic.

8 / 25
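The definition can be made concrete with a short sketch (not from the slides; numpy and the N(5, 3²) population are assumptions): draw many samples of size n, compute the same statistic on each, here the sample standard deviation S, and the collection of values approximates its sampling distribution.

# Approximate the sampling distribution of the sample standard deviation S
# by repeated sampling from an arbitrary N(5, 3^2) population.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 100_000
samples = rng.normal(5.0, 3.0, size=(reps, n))

s_values = samples.std(axis=1, ddof=1)      # one realization of S per sample

# Summaries of the (approximate) sampling distribution of S.
print("mean of S:", s_values.mean())
print("sd of S:  ", s_values.std())
print("quartiles:", np.percentile(s_values, [25, 50, 75]))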
Introduction

Estimator

Definition 3
An estimator of a population parameter θ is a statistic that is
thought to produce values close to θ in some sense and is denoted by
θ̂(X). That is, a statistic that is used to estimate θ.
The realized value of θ̂(X) is also called the estimate and is denoted by
θ̂(x).

Remark : The estimator is a random variable and the estimate is a


constant !

9 / 25
Introduction

Estimator

Example 4
1 Let X1 , . . . , Xn be n random variables with mean θ = µ; an estimator
of µ is, for example, given by

µ̂(X) = (1/n) ∑_{i=1}^n Xi .

2 If, for example, we observe x = (2, 1.4, 4.2, 5.6), then

µ̂(x) = (1/4)(2 + 1.4 + 4.2 + 5.6) = 3.3
is the estimate of µ based on x.

10 / 25
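A tiny sketch (numpy assumed) reproducing the estimate from Example 4 and then applying the same estimator to a hypothetical second sample, to stress that the estimator is a rule while the estimate is a number:

# The estimator mu_hat(X) = X_bar is a rule; the estimate mu_hat(x) is a number
# obtained by applying that rule to the observed data from Example 4.
import numpy as np

x = np.array([2.0, 1.4, 4.2, 5.6])    # observed data from the slide
print(x.mean())                       # estimate of mu based on x: 3.3

# Applying the same rule to a different sample gives a different estimate,
# which is the sense in which the estimator itself is a random variable.
rng = np.random.default_rng(4)
x_new = rng.normal(3.3, 1.0, size=4)  # hypothetical second sample
print(x_new.mean())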
Introduction

Properties of an estimator

Definition 4
Let X1 , . . . , Xn be a random sample from some population with a
parameter θ and suppose that θ̂ = θ̂(X1 , . . . , Xn ) is an estimator of θ. The
bias of θ̂ is defined to be

Bias(θ̂) = B(θ̂) = E(θ̂) − θ.

If B(θ̂) = 0, we say that θ̂ is unbiased for θ. The mean squared error of


θ̂ is defined to be
MSE(θ̂) = E[(θ̂ − θ)²] = V(θ̂) + B²(θ̂).

Remark : It is clear that, if B(θ̂) = 0 then MSE(θ̂) = V(θ̂).


11 / 25
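The decomposition in Definition 4 follows by adding and subtracting E(θ̂) inside the square; a standard derivation (not spelled out on the slide), written here in LaTeX:

\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big]
   = E\big[(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta)^2\big] \\
  &= E\big[(\hat\theta - E(\hat\theta))^2\big]
     + 2\, E\big[\hat\theta - E(\hat\theta)\big]\,(E(\hat\theta) - \theta)
     + (E(\hat\theta) - \theta)^2 \\
  &= V(\hat\theta) + 0 + B^2(\hat\theta),
\end{aligned}

since E(θ̂) − θ is a constant and E[θ̂ − E(θ̂)] = 0.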
Introduction

Properties of an estimator

We assume that we have a sequence of estimators, say θ̂1 , θ̂2 , . . ., of θ,
usually indexed by increasing sample size, so that θ̂n is based on a sample
of size n.

We say that the sequence {θ̂n } is asymptotically unbiased for θ if


lim_{n→∞} E(θ̂n ) = θ.

We say that the sequence {θ̂n } is consistent (or weakly consistent) for θ
if, for every ϵ > 0, lim_{n→∞} P(|θ̂n − θ| > ϵ) = 0.
In a probability class, you would say that θ̂n converges to θ in probability.
In statistics classes, we say that θ̂n is consistent for θ. This says that, for
large n, the distribution of θ̂n is concentrated around θ.

12 / 25
Introduction

Markov inequality

If X is a random variable and u(x) is a non-negative real-valued function
then, for any real c > 0,

P(u(X ) ≥ c) ≤ E[u(X )]/c.

In particular, taking u(x) = |x|,

P(|X | ≥ c) ≤ E[|X |]/c,

which is the Markov inequality.

13 / 25
Introduction

Chebyshev’s inequality

If we let µ = E(X ), σ² = V(X ) < ∞ and take u(x) = (x − µ)² with threshold c², we obtain

P(|X − µ| ≥ c) = P(|X − µ|² ≥ c²) ≤ E[|X − µ|²]/c² = σ²/c².

Taking c = kσ yields Chebyshev’s inequality,

P(|X − µ| ≥ kσ) = P(|X − µ|² ≥ k²σ²) ≤ E[|X − µ|²]/(k²σ²) = 1/k².

For example, the probability that X is more than k = 2 standard deviations
from µ is bounded above by 1/2² = 1/4.

14 / 25
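A quick simulation sketch of the k = 2 case (numpy assumed; the Exp(1) population is an arbitrary choice): the observed exceedance frequency sits well below the 1/4 bound.

# Check Chebyshev's bound P(|X - mu| >= 2*sigma) <= 1/4 on an Exp(mean 1) population.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=1_000_000)   # Exp(mean 1): mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

freq = np.mean(np.abs(x - mu) >= 2 * sigma)
print("empirical P(|X - mu| >= 2 sigma):", freq, " (bound: 0.25)")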
Introduction

More on inequalities

Let {θ̂n } be a sequence of estimators of θ. We have

P(|θ̂n − θ| ≥ c) ≤ E[|θ̂n − θ|²]/c² = MSE(θ̂n )/c².

Thus, if MSE(θ̂n ) → 0 as n → ∞, then θ̂n is consistent for θ. This is


commonly referred to as mean squared error consistency.
If a sequence of estimators is MSE consistent, then it is consistent.
It is not true that all consistent estimators are MSE consistent.

15 / 25
Introduction

Example

Example 5
Going back to Example 2, if V(Xi ) = σ² < ∞, we have that

E[X̄n ] = E[(X1 + . . . + Xn )/n] = µ and V(X̄n ) = σ²/n.

Thus, X̄n is unbiased and MSE(X̄n ) = V(X̄n ) = σ²/n. Hence,

P(|X̄n − µ| ≥ ϵ) ≤ σ²/(nϵ²) → 0 as n → ∞.
Since this is true for all ϵ > 0, we have that X̄n is consistent for µ.

16 / 25
Introduction

Exercises

Exercise 1
For a random sample X1 , . . . , Xn such that E(Xi⁴) < ∞, show that S² is
consistent for σ².

Exercise 2
Suppose that X1 , X2 , . . . are iid Exp(β) random variables and define
X̄n = (1/n) ∑_{i=1}^n Xi   and   Sn² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄n )².

1 Show that X̄n is unbiased and consistent for β and that Sn² is unbiased
and consistent for β².
2 Since X̄n is unbiased for β, we could consider using X̄n² as an estimator
of β² instead of Sn². Show that X̄n² is asymptotically unbiased for β².
What is the bias ?
17 / 25
Some Additional Results

Contents

1 Introduction

2 Some Additional Results

18 / 25
Some Additional Results

Convergence in Probability to a Constant

Let {Yn }_{n=1}^∞ be a sequence of random variables. We say that Yn
converges to a constant c in probability if

lim_{n→∞} P(|Yn − c| > ϵ) = 0, for all ϵ > 0.

Remark
To say that Yn converges to c in probability, we write Yn →ᴾ c.
If the Yn are all estimators of c, we will say that {Yn } is consistent for c.

Example 6
Let Yn ∼ Exp(1/n), show that Yn →ᴾ 0. We have

lim_{n→∞} P(|Yn − 0| > ϵ) = lim_{n→∞} P(Yn > ϵ) = lim_{n→∞} e^{−nϵ} = 0, for all ϵ > 0.
19 / 25
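To see Example 6 numerically (a sketch with numpy; Exp(1/n) is read as exponential with mean 1/n, matching the e^{−nϵ} tail used above), one can estimate P(Yn > ϵ) for increasing n:

# Empirical P(Y_n > eps) for Y_n ~ Exp(mean 1/n); it decays like exp(-n*eps).
import numpy as np

rng = np.random.default_rng(6)
eps, reps = 0.5, 200_000

for n in (1, 2, 5, 10):
    y = rng.exponential(1.0 / n, size=reps)      # mean 1/n, i.e. rate n
    print(n, np.mean(y > eps), "theory:", np.exp(-n * eps))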
Some Additional Results

Convergence in Probability to a Constant

Proposition
Let a, b, c, d be real constants. Let {Xn } and {Yn } be sequences of random
variables such that Xn →ᴾ c and Yn →ᴾ d. Then,
1 aXn + bYn →ᴾ ac + bd.
2 Xn Yn →ᴾ cd.
3 If g (·) is continuous at c, then g (Xn ) →ᴾ g (c).

Example 7
Let Xn →ᴾ c ∈ R and let g (·) be a real-valued function, continuous at c.
1 Provided c ≠ 0, 1/Xn →ᴾ 1/c (g (x) = 1/x).
2 Provided c > 0, √Xn →ᴾ √c (g (x) = √x).
20 / 25
Some Additional Results

Law of Large Numbers (LLN)


LLN (Khinchin’s version)
If X1 , X2 , . . . are iid random variables such that µ = E(Xi ) < ∞, then

lim_{n→∞} P(|X̄n − µ| > ϵ) = 0, for all ϵ > 0.

That is, X̄n is consistent for µ or X̄n →ᴾ µ.

Let Yi = Xiᵏ for some k ∈ N and all i ∈ N, and define
Sn = Y1 + Y2 + . . . + Yn such that E(Yi ) < ∞. Then

(1/n) ∑_{i=1}^n Xiᵏ is consistent for E(Xiᵏ ) = µ′k .

Remark : For a random variable X , if E(Xᵏ ) < ∞ then E(Xʲ ) < ∞ (and so
E[(X − µ)ʲ ] < ∞) for j = 1, . . . , k.
21 / 25
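A small sketch of the LLN and of the consistency of sample moments (numpy assumed; the Gamma population and its moments are arbitrary choices): running averages of Xi and Xi² settle near their population values as n grows.

# Running sample mean and second sample moment for iid Gamma(2, scale=3) data:
# both settle near their population counterparts as n grows (LLN).
import numpy as np

rng = np.random.default_rng(7)
shape, scale = 2.0, 3.0
mu1 = shape * scale                   # E(X)   = 6
mu2 = shape * (shape + 1) * scale**2  # E(X^2) = 54

x = rng.gamma(shape, scale, size=100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, x[:n].mean(), (x[:n] ** 2).mean())
print("targets:", mu1, mu2)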
Some Additional Results

We define the sample moments for a random sample {Xi }_{i=1}^n as
follows

Mk = (1/n) ∑_{i=1}^n Xiᵏ ,   k = 1, 2, . . .

Since n and the Xi are finite (observable), Mk will always exist.

When µ′k exists, Mk is consistent for µ′k .

Note that we have lost the sample size in this notation.

22 / 25
Some Additional Results

Convergence in Distribution

A sequence of random variables X1 , X2 , . . . with respective CDFs FXn
converges to X (with CDF FX ) in distribution, written Xn →ᵈ X , if

lim_{n→∞} FXn (t) = FX (t)

at all t for which FX is continuous.

Remark
Convergence in distribution is weaker than convergence in
probability.

23 / 25
Some Additional Results

Slutsky’s Theorem

Proposition
Let {Xn }_{n=1}^∞ and {Yn }_{n=1}^∞ be sequences of random variables such that
Xn →ᵈ X and Yn →ᴾ c, where X is a random variable and c is a constant.
Then
1 Xn + Yn →ᵈ X + c.
2 Xn Yn →ᵈ cX .
3 Provided c ≠ 0, Xn /Yn →ᵈ X /c.

24 / 25
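A simulation sketch of part 2 of the proposition (numpy assumed; the particular sequences Xn ∼ N(0, 1 + 1/n) →ᵈ N(0, 1) and Yn = c + Zn/n →ᴾ c are illustrative choices, not from the slide): the product Xn Yn behaves like c · N(0, 1) for large n.

# Slutsky illustration: X_n ->d N(0,1), Y_n ->P c  =>  X_n * Y_n ->d c * N(0,1).
import numpy as np

rng = np.random.default_rng(8)
c, n, reps = 2.0, 1_000, 200_000

x_n = rng.normal(0.0, np.sqrt(1.0 + 1.0 / n), size=reps)   # ->d N(0, 1)
y_n = c + rng.normal(0.0, 1.0, size=reps) / n              # ->P c

prod = x_n * y_n
print("sd of X_n * Y_n:", prod.std(), " (limit sd:", c, ")")
print("5% / 95% quantiles:", np.percentile(prod, [5, 95]),
      " vs c*N(0,1):", c * np.array([-1.6449, 1.6449]))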
Some Additional Results

Univariate Delta Method

Proposition
Let {Xn }_{n=1}^∞ be a sequence of random variables. Let θ, σ² ∈ R be constants
with σ² > 0, and let g (·) be a real-valued function such that g ′ (θ) exists and
g ′ (θ) ≠ 0. If

√n (Xn − θ) →ᵈ N(0, σ²),

then

√n (g (Xn ) − g (θ)) →ᵈ N(0, σ² [g ′ (θ)]²).

25 / 25
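A simulation sketch of the delta method with g (x) = x² (an illustrative choice of g and of the Exp(mean β) population, not from the slide): by the CLT, √n (X̄n − β) →ᵈ N(0, β²), so √n (X̄n² − β²) should be approximately N(0, β²[2β]²) = N(0, 4β⁴) for large n.

# Delta method check: g(x) = x^2, X_i ~ Exp(mean beta), theta = beta, sigma^2 = beta^2.
# Predicted limit variance of sqrt(n)*(g(X_bar) - g(beta)) is sigma^2 * g'(beta)^2 = 4*beta^4.
import numpy as np

rng = np.random.default_rng(9)
beta, n, reps = 2.0, 400, 20_000

x_bar = rng.exponential(beta, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (x_bar**2 - beta**2)

print("empirical variance:", z.var(), " (theory:", 4 * beta**4, ")")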
