Statistical Inference 2 Note 02

3. Mean-Squared Error (MSE)


The mean-squared error is a measure of how close an estimator 𝑡(𝑋1, 𝑋2, …, 𝑋𝑛) is to 𝜏(𝜃).
Definition: Mean-Squared Error
Let 𝑇 = 𝑡(𝑋1, 𝑋2, …, 𝑋𝑛) be an estimator of 𝜏(𝜃). Then 𝐸𝜃[(𝑇 − 𝜏(𝜃))²] is defined to be the mean-squared error of the estimator 𝑇 = 𝑡(𝑋1, 𝑋2, …, 𝑋𝑛).
Notation
Let 𝑀𝑆𝐸𝑡(𝜃) denote the mean-squared error of the estimator 𝑇 = 𝑡(𝑋1, 𝑋2, …, 𝑋𝑛).

The subscript 𝜃 on the expectation symbol 𝐸𝜃 indicates from which density in the family
under consideration the sample came. That is,
$$E_\theta\!\left[(T-\tau(\theta))^2\right] = E_\theta\!\left[\big(t(X_1,\dots,X_n)-\tau(\theta)\big)^2\right] = \int\!\cdots\!\int \left[t(x_1,\dots,x_n)-\tau(\theta)\right]^2 f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n ,$$
where 𝑓(𝑥; 𝜃) is the probability density function from which the random sample was selected.
𝐸𝜃 [[𝑇 − 𝜏(𝜃)]2 ] is a measure of the spread of 𝑇 values about 𝜏(𝜃), just as the variance of
a random variable is a measure of its spread about its mean.
If we were to compare estimators by looking at their mean-squared errors, we would naturally prefer one with small, or smallest, mean-squared error. We would like to define as best the estimator with the smallest mean-squared error, but such estimators rarely exist, since in general the mean-squared error of an estimator depends on 𝜃. For any two estimators 𝑇1 = 𝑡1(𝑋1, 𝑋2, …, 𝑋𝑛) and 𝑇2 = 𝑡2(𝑋1, 𝑋2, …, 𝑋𝑛) of 𝜏(𝜃), their respective mean-squared errors 𝑀𝑆𝐸𝑡1(𝜃) and 𝑀𝑆𝐸𝑡2(𝜃), as functions of 𝜃, are likely to cross, so that for some 𝜃 the estimator 𝑇1 has the smaller mean-squared error and for others 𝑇2 does. We would then have no basis for preferring one estimator over the other.
Remark
𝑀𝑆𝐸𝑡 (𝜃) = 𝑣𝑎𝑟[𝑇] + {𝜏(𝜃) − 𝐸𝜃 [𝑇]}2
𝑀𝑆𝐸𝑡 (𝜃) = 𝑣𝑎𝑟[𝑇] + (𝑏𝑖𝑎𝑠)2
So, if 𝑇 is an unbiased estimator of 𝜏(𝜃), then 𝑀𝑆𝐸𝑡 (𝜃) = 𝑣𝑎𝑟[𝑇]

Proof:
$$MSE_t(\theta) = E_\theta\!\left[(T-\tau(\theta))^2\right] = E_\theta\!\left[\big((T-E_\theta[T]) - (\tau(\theta)-E_\theta[T])\big)^2\right]$$
$$= E_\theta\!\left[(T-E_\theta[T])^2\right] - 2\,\big(\tau(\theta)-E_\theta[T]\big)\,E_\theta\!\left[T-E_\theta[T]\right] + \big(\tau(\theta)-E_\theta[T]\big)^2$$
The middle term vanishes because $E_\theta[T-E_\theta[T]] = 0$, so
$$MSE_t(\theta) = \mathrm{var}[T] + \left\{\tau(\theta)-E_\theta[T]\right\}^2 = \mathrm{var}[T] + (\mathrm{bias})^2$$
The term 𝜏(𝜃) − 𝐸𝜃[𝑇] is called the bias of the estimator 𝑇 and can be positive, negative, or zero. The remark shows that the mean-squared error is the sum of two non-negative quantities, and it shows how the mean-squared error, variance, and bias of an estimator are related.
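As a quick numerical illustration of this decomposition (an added sketch, not part of the original notes), the following code estimates the MSE, variance, and bias of a deliberately biased estimator by Monte Carlo and checks that MSE ≈ var + bias². The estimator 0.9·𝑋̅, the normal data, and all constants are arbitrary choices made purely for illustration.

```python
import numpy as np

# Monte Carlo check of MSE_t(theta) = var[T] + bias^2.
# T = 0.9 * Xbar is a deliberately biased estimator of mu; all constants
# here are illustrative assumptions, not values from the notes.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 20, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
T = 0.9 * samples.mean(axis=1)

mse = np.mean((T - mu) ** 2)        # E_theta[(T - tau(theta))^2]
var = T.var()                       # var[T]
bias = mu - T.mean()                # tau(theta) - E_theta[T]

print(mse, var + bias ** 2)         # the two values should nearly agree
```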

Theorem:
Let 𝑋1, 𝑋2, …, 𝑋𝑛 be a random sample from a density 𝑓(∙) with mean 𝜇, variance 𝜎², and finite fourth central moment 𝜇₄, and let
$$S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2 .$$
Then
$$E[S^2] = \sigma^2 \qquad \text{and} \qquad \mathrm{var}[S^2] = \frac{1}{n}\left[\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right].$$
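As a plausibility check of the theorem (an added illustration, not part of the notes), the sketch below simulates 𝑆² from the exponential(1) density, for which 𝜎² = 1 and 𝜇₄ = 9, and compares the empirical mean and variance of 𝑆² with the stated formulas.

```python
import numpy as np

# Check E[S^2] = sigma^2 and var[S^2] = (1/n)(mu4 - ((n-3)/(n-1)) sigma^4)
# by simulation from the exponential(1) density (sigma^2 = 1, mu4 = 9).
rng = np.random.default_rng(1)
n, reps = 30, 200_000
sigma2, mu4 = 1.0, 9.0

x = rng.exponential(scale=1.0, size=(reps, n))
s2 = x.var(axis=1, ddof=1)          # S^2 uses the divisor n - 1

print(s2.mean(), sigma2)                                    # ~ E[S^2]
print(s2.var(), (mu4 - (n - 3) / (n - 1) * sigma2**2) / n)  # ~ var[S^2]
```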

Example 1:
Let 𝑋1, 𝑋2, …, 𝑋𝑛 be a random sample from 𝑓(𝑥; 𝜃) = 𝜙𝜇,𝜎²(𝑥), the normal density. Recall that the maximum likelihood estimators of 𝜇 and 𝜎² are, respectively, 𝑋̅ and (1/𝑛) ∑(𝑋𝑖 − 𝑋̅)². Find the mean-squared errors of 𝑋̅ and (1/𝑛) ∑(𝑋𝑖 − 𝑋̅)².
Answer:
Let 𝑇1 = 𝑋̅ and 𝑇2 = (1/𝑛) ∑(𝑋𝑖 − 𝑋̅)2
𝐸 [𝑇1 ] = 𝐸 [𝑋̅] = 𝜇
∴ 𝑇1 is an unbiased estimator for 𝜇.
$$MSE = \mathrm{var}[T_1] = \mathrm{var}[\bar{X}] = \mathrm{var}\!\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum \mathrm{var}(X_i) = \frac{\sigma^2}{n}$$

$$E[T_2] = E\!\left[\frac{1}{n}\sum (X_i-\bar{X})^2\right] = \frac{1}{n}\,E\!\left[\frac{\sum (X_i-\bar{X})^2}{n-1}\right]\times (n-1) = \frac{n-1}{n}\,E[S^2] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

∴ 𝑇2 is not an unbiased estimator for 𝜎 2 .


$$MSE = \mathrm{var}[T_2] + (\mathrm{bias})^2 = \mathrm{var}\!\left[\frac{1}{n}\sum (X_i-\bar{X})^2\right] + \left[\sigma^2 - E\!\left[\frac{1}{n}\sum (X_i-\bar{X})^2\right]\right]^2$$
$$= \frac{(n-1)^2}{n^2}\,\mathrm{var}\!\left(\frac{\sum (X_i-\bar{X})^2}{n-1}\right) + \left[\sigma^2 - \frac{n-1}{n}\,\sigma^2\right]^2 = \left(\frac{n-1}{n}\right)^{2}\frac{1}{n}\left[\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right] + \frac{\sigma^4}{n^2}$$
Since the sample is from the normal density, 𝜇₄ = 3𝜎⁴, and the expression simplifies to
$$MSE = \frac{(2n-1)\,\sigma^4}{n^2} .$$
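These two mean-squared errors can be verified by simulation; in the sketch below (an added illustration, not part of the notes) the values of 𝜇, 𝜎 and 𝑛 are arbitrary.

```python
import numpy as np

# Monte Carlo check of Example 1: MSE[Xbar] = sigma^2/n and, for normal data
# (mu4 = 3 sigma^4), MSE[(1/n) sum (X_i - Xbar)^2] = (2n - 1) sigma^4 / n^2.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 3.0, 25, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
t1 = x.mean(axis=1)                 # T1 = sample mean
t2 = x.var(axis=1, ddof=0)          # T2 = (1/n) * sum (X_i - Xbar)^2

print(np.mean((t1 - mu) ** 2), sigma**2 / n)
print(np.mean((t2 - sigma**2) ** 2), (2 * n - 1) * sigma**4 / n**2)
```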

4. Consistency
In the previous section we defined the mean-squared error of an estimator and the property of unbiasedness. Both concepts were defined for a fixed sample size. In this section we define concepts that involve an increasing sample size.
In our notation for an estimator of 𝜏(𝜃), let us use 𝑇𝑛 = 𝑡𝑛(𝑋1, 𝑋2, …, 𝑋𝑛), where the subscript 𝑛 of 𝑡 indicates the sample size. We will actually be considering a sequence of estimators, say 𝑇1 = 𝑡1(𝑋1), 𝑇2 = 𝑡2(𝑋1, 𝑋2), 𝑇3 = 𝑡3(𝑋1, 𝑋2, 𝑋3), …, 𝑇𝑛 = 𝑡𝑛(𝑋1, 𝑋2, …, 𝑋𝑛), …. An obvious example is
$$T_n = t_n(X_1, X_2, \dots, X_n) = \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .$$

Ordinarily the functions 𝑡𝑛 in the sequence will be the same kind of function for each 𝑛. When considering a sequence of estimators, a good sequence should be one whose values tend to get closer to the quantity being estimated as the sample size increases. This closeness to the parameter is captured by the notion of consistency.
Definition: Mean-Squared Error Consistency
Let 𝑇1 , 𝑇2 , … , 𝑇𝑛 be a sequence of estimators of 𝜏(𝜃), where 𝑇𝑛 = 𝓉𝑛 (𝑋1 , 𝑋2 , … , 𝑋𝑛 ) is
based on a sample of size 𝑛. This sequence of estimators is defined to be a mean-squared
error consistent sequence of estimators of 𝜏(𝜃), if and only if

$$\lim_{n\to\infty} E_\theta\!\left[(T_n - \tau(\theta))^2\right] = 0 \quad \text{for all } \theta \text{ in the parameter space.}$$

Remark:
Mean-squared error consistency implies that both the bias and the variance of 𝑇𝑛 approach
0, since,
𝐸𝜃 [[𝑇𝑛 − 𝜏(𝜃)]2 ] = 𝑣𝑎𝑟[𝑇𝑛 ] + {𝜏 (𝜃) − 𝐸𝜃 [𝑇𝑛 ]}2

Example:
In sampling from any density having mean 𝜇 and variance 𝜎², let 𝑋̅𝑛 = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋𝑖 be a sequence of estimators of 𝜇 and 𝑆𝑛² = (1/(𝑛−1)) ∑(𝑋𝑖 − 𝑋̅𝑛)² be a sequence of estimators of 𝜎².
$$E\!\left[(\bar{X}_n - \mu)^2\right] = \mathrm{var}(\bar{X}_n) = \frac{\sigma^2}{n} \to 0 \ \text{ as } n \to \infty$$
Hence, the sequence {𝑋̅𝑛 } is a mean-squared error consistent sequence of estimators of 𝜇.
$$E\!\left[(S_n^2 - \sigma^2)^2\right] = \mathrm{var}(S_n^2) = \frac{1}{n}\left[\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right] \to 0 \ \text{ as } n \to \infty ,$$
provided the fourth central moment 𝜇₄ is finite.
Hence, the sequence {𝑆𝑛 2 } is a mean-squared error consistent sequence of estimators of
𝜎2.
Note that if 𝑇𝑛 = (1/𝑛) ∑(𝑋𝑖 − 𝑋̅ )2 , then the sequence {𝑇𝑛 } is also a mean-squared error
consistent sequence of estimators of 𝜎 2 .
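The shrinking mean-squared error can also be seen empirically. The sketch below (an added illustration; the exponential(1) density and the sample sizes are arbitrary choices) estimates the MSE of 𝑋̅𝑛 and 𝑆𝑛² for increasing 𝑛.

```python
import numpy as np

# Empirical MSE of Xbar_n (estimating mu) and S_n^2 (estimating sigma^2)
# for growing n; data are exponential(1), so mu = 1 and sigma^2 = 1.
rng = np.random.default_rng(3)
reps = 20_000

for n in (10, 100, 1000):
    x = rng.exponential(scale=1.0, size=(reps, n))
    mse_mean = np.mean((x.mean(axis=1) - 1.0) ** 2)
    mse_var = np.mean((x.var(axis=1, ddof=1) - 1.0) ** 2)
    print(n, mse_mean, mse_var)     # both MSE columns decrease toward 0
```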

Definition: Simple consistency


Let 𝑇1, 𝑇2, …, 𝑇𝑛 be a sequence of estimators of 𝜏(𝜃), where 𝑇𝑛 = 𝑡𝑛(𝑋1, 𝑋2, …, 𝑋𝑛). Then the sequence {𝑇𝑛} is defined to be a simple (or weakly) consistent sequence of estimators of 𝜏(𝜃) if for every 𝜀 > 0 the following is satisfied:
$$\lim_{n\to\infty} P_\theta\!\left[\tau(\theta) - \varepsilon < T_n < \tau(\theta) + \varepsilon\right] = 1 \quad \text{for all } \theta \text{ in the parameter space.}$$

Remark:
If an estimator is a mean-squared error consistent estimator, it is also a simple consistent
estimator, but not necessarily vice versa.

Proof:
$$P_\theta\!\left[\tau(\theta) - \varepsilon < T_n < \tau(\theta) + \varepsilon\right] = P_\theta\!\left[\,|T_n - \tau(\theta)| < \varepsilon\,\right] = P_\theta\!\left[(T_n - \tau(\theta))^2 < \varepsilon^2\right] \ge 1 - \frac{E_\theta\!\left[(T_n - \tau(\theta))^2\right]}{\varepsilon^2}$$
The inequality follows from Chebyshev's (Markov's) inequality applied to the non-negative random variable (𝑇𝑛 − 𝜏(𝜃))². By mean-squared error consistency, 𝐸𝜃[(𝑇𝑛 − 𝜏(𝜃))²] approaches 0 as 𝑛 → ∞. Hence,
$$\lim_{n\to\infty} P_\theta\!\left[\tau(\theta) - \varepsilon < T_n < \tau(\theta) + \varepsilon\right] = 1 .$$

Markov’s inequality
Let 𝑋 be a non-negative random variable. Then for all 𝑎 > 0,
$$P[X \ge a] \le \frac{E(X)}{a} .$$
Chebyshev’s inequality
Let 𝑋 be a random variable with mean 𝜇 and variance 𝜎². Then for all 𝑘 > 0,
$$P\!\left[\,|X - \mu| \ge k\sigma\,\right] \le \frac{1}{k^2} .$$
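A short derivation (an added step, using only facts stated above) shows how Chebyshev's inequality follows from Markov's inequality, and how the same substitution gives the bound used in the consistency proof.

```latex
% Apply Markov's inequality to the non-negative random variable (X - \mu)^2
% with a = k^2 \sigma^2:
\[
P\big[\,|X-\mu| \ge k\sigma\,\big]
  = P\big[(X-\mu)^2 \ge k^2\sigma^2\big]
  \le \frac{E\big[(X-\mu)^2\big]}{k^2\sigma^2}
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}.
\]
% The same substitution, with X replaced by T_n, \mu by \tau(\theta), and
% a = \varepsilon^2, gives the bound used in the proof above:
\[
P_\theta\big[\,|T_n-\tau(\theta)| \ge \varepsilon\,\big]
  \le \frac{E_\theta\big[(T_n-\tau(\theta))^2\big]}{\varepsilon^2}.
\]
```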

Example:
Let 𝑋1, 𝑋2, 𝑋3 be three elements of a random sample from some density that has mean 𝜇 and variance 𝜎² = 2. Let 𝑇1 and 𝑇2 be two estimators of 𝜇:
$$T_1 = \tfrac{1}{2}X_1 + \tfrac{1}{4}X_2 + \tfrac{1}{4}X_3 , \qquad T_2 = \tfrac{1}{2}X_1 + 2X_2 + \tfrac{5}{2}X_3$$
a) Find the bias, variance and mean-squared error of 𝑇1 and 𝑇2.
b) Check whether these estimators are consistent.
c) Which of these two estimators is better?
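A simulation sketch for checking the answers to parts (a) and (c) is given below. It is an added illustration: the exercise specifies only the mean 𝜇 and 𝜎² = 2, so the normal density and the value 𝜇 = 4 used here are assumptions, and the coefficients of 𝑇2 follow the reconstruction above.

```python
import numpy as np

# Empirical bias, variance and MSE of T1 and T2 from the exercise.
# The underlying density is not specified beyond mean mu and variance 2;
# normal data with mu = 4 are assumed here only for illustration.
rng = np.random.default_rng(4)
mu, sigma2, reps = 4.0, 2.0, 200_000

x1, x2, x3 = rng.normal(mu, np.sqrt(sigma2), size=(3, reps))
t1 = 0.5 * x1 + 0.25 * x2 + 0.25 * x3
t2 = 0.5 * x1 + 2.0 * x2 + 2.5 * x3

for name, t in (("T1", t1), ("T2", t2)):
    bias = mu - t.mean()                      # tau(theta) - E[T]
    print(name, bias, t.var(), np.mean((t - mu) ** 2))
```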

5. Efficiency
A desirable property of a good estimator is not only to be unbiased but also to have a small variance, which translates into a small mean-squared error, regardless of whether the estimator is biased or unbiased.

If we have two unbiased estimators of a parameter, 𝜃̂1 and 𝜃̂2, we say that 𝜃̂1 is relatively more efficient than 𝜃̂2 if 𝑉𝑎𝑟(𝜃̂2) > 𝑉𝑎𝑟(𝜃̂1).

Definition: Efficiency
One way to compare the mean-squared errors of two estimators is by using relative efficiency. Given two estimators 𝑇1 and 𝑇2, the efficiency of 𝑇1 relative to 𝑇2, denoted 𝑒𝑓𝑓(𝑇1, 𝑇2), is given by
$$eff(T_1, T_2) = \frac{MSE(T_2)}{MSE(T_1)}$$
$$eff(T_1, T_2) > 1 \implies \text{choose } T_1 , \qquad eff(T_1, T_2) < 1 \implies \text{choose } T_2 .$$

Given two unbiased estimators 𝑇1 and 𝑇2 of 𝜃, the efficiency of 𝑇1 relative to 𝑇2, denoted 𝑒𝑓𝑓(𝑇1, 𝑇2), is given by
$$eff(T_1, T_2) = \frac{Var(T_2)}{Var(T_1)}$$
$$eff(T_1, T_2) > 1 \implies \text{choose } T_1 , \qquad eff(T_1, T_2) < 1 \implies \text{choose } T_2 .$$

Example:
Let 𝑌1, 𝑌2, …, 𝑌𝑛 denote a random sample from the uniform distribution over the interval 0 to 𝜃. Consider two estimators,
$$\hat{\theta}_1 = 2\bar{Y} \qquad \text{and} \qquad \hat{\theta}_2 = \frac{n+1}{n}\,Y_n , \quad \text{where } Y_n = \max(Y_1, Y_2, \dots, Y_n)$$
and the density function of 𝑌𝑛 is given by
$$f(y;\theta) = \frac{n\,y^{n-1}}{\theta^n}, \qquad 0 \le y \le \theta .$$
Find the efficiency of 𝜃̂2 relative to 𝜃̂1.


Answer:
Because each 𝑌𝑖 follows a uniform distribution on the interval (0, 𝜃), 𝐸(𝑌𝑖) = 𝜃/2 and 𝑉𝑎𝑟(𝑌𝑖) = 𝜃²/12. Therefore,

𝐸(𝜃̂1 ) = 𝐸 (2𝑌̅) = 𝜃

So, 𝜃̂1 is unbiased. Furthermore,


$$Var(\hat{\theta}_1) = Var(2\bar{Y}) = \frac{\theta^2}{3n}$$
To find the mean and variance of 𝜃̂2 , recall that the density function of 𝑌𝑛 is given by,

$$f(y;\theta) = \frac{n\,y^{n-1}}{\theta^n}, \qquad 0 \le y \le \theta .$$
Thus,
$$E(Y_n) = \int_0^{\theta} y\,\frac{n\,y^{n-1}}{\theta^n}\,dy = \frac{n}{n+1}\,\theta \qquad \text{and} \qquad E(Y_n^2) = \int_0^{\theta} y^2\,\frac{n\,y^{n-1}}{\theta^n}\,dy = \frac{n}{n+2}\,\theta^2$$

Therefore, the variance of 𝑌𝑛 is


$$Var(Y_n) = E(Y_n^2) - \left[E(Y_n)\right]^2 = \frac{n}{n+2}\,\theta^2 - \left(\frac{n}{n+1}\,\theta\right)^{2}$$
Thus,
$$E(\hat{\theta}_2) = E\!\left(\frac{n+1}{n}\,Y_n\right) = \frac{n+1}{n}\cdot\frac{n}{n+1}\,\theta = \theta$$
So, 𝜃̂2 is an unbiased estimator for 𝜃. Also,
$$Var(\hat{\theta}_2) = Var\!\left(\frac{n+1}{n}\,Y_n\right) = \left(\frac{n+1}{n}\right)^{2} Var(Y_n) = \frac{\theta^2}{n(n+2)}$$
Finally, the efficiency of 𝜃̂2 relative to 𝜃̂1 is given by
$$eff(\hat{\theta}_2, \hat{\theta}_1) = \frac{Var(\hat{\theta}_1)}{Var(\hat{\theta}_2)} = \frac{\theta^2/(3n)}{\theta^2/\big(n(n+2)\big)} = \frac{n+2}{3}$$
When 𝑛 > 1, 𝑉𝑎𝑟(𝜃̂1) > 𝑉𝑎𝑟(𝜃̂2), so 𝑒𝑓𝑓(𝜃̂2, 𝜃̂1) > 1.

∴ 𝜃̂2 is more efficient than 𝜃̂1 .
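The variances and the relative efficiency derived above can be checked numerically; the sketch below is an added illustration with arbitrary values 𝜃 = 1 and 𝑛 = 10.

```python
import numpy as np

# Monte Carlo check for Uniform(0, theta): Var(theta1_hat) = theta^2/(3n),
# Var(theta2_hat) = theta^2/(n(n+2)), and eff(theta2_hat, theta1_hat) = (n+2)/3.
rng = np.random.default_rng(5)
theta, n, reps = 1.0, 10, 200_000

y = rng.uniform(0.0, theta, size=(reps, n))
theta1 = 2.0 * y.mean(axis=1)                # 2 * Ybar
theta2 = (n + 1) / n * y.max(axis=1)         # ((n+1)/n) * max(Y_1,...,Y_n)

v1, v2 = theta1.var(), theta2.var()
print(v1, theta**2 / (3 * n))
print(v2, theta**2 / (n * (n + 2)))
print(v1 / v2, (n + 2) / 3)                  # relative efficiency
```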
