Best Linear Estimators (ARE 210)


BLE, BLUE and BLMSE


1. How do we estimate the unknown parameters of a probability distribution?
2. What kind of inferences can we make based on those parameter estimates?
3. Under what conditions is our rule for estimating these unknown parameters optimal in
some reasonable sense?
4. When can we do better, and when not?

In these notes, I develop three kinds of estimators that are linear in the observations (data), all of which can be thought of in terms of optimization theory.
The BLE (the Best Linear Estimator) chooses a linear combination of the observations from a random sample to minimize the variance without constraint. The result is weight zero on each and every data point. While this estimator does in fact attain the global unrestricted minimum of the variance for an estimator (its variance is always zero!), it may be biased. Indeed, it is biased for every sample size and for every probability distribution whose mean is not zero.
The BLUE (Best Linear Unbiased Estimator) principle minimizes the variance of the
chosen linear combination of the data subject to the constraint that the estimator must be
unbiased.
The BLMSE (Best Linear Mean Squared Error Estimator) principle weights the square of the bias equally with the variance in the objective function and attains the unrestricted global minimum of the sum (bias$^2$ + variance).
We begin by supposing that we have a random sample of i.i.d. random variables, $y_1, y_2, \ldots, y_n$. Let the population mean for the underlying probability distribution for the $y$s be $\mu$ and let the population variance be $\sigma^2$. Both of these are unknown.
For now, we will not make any further assumptions about the distribution. We do not assume that we know the functional form of the pdf (such as normal). We will, however, restrict our attention to linear combinations of the data, say $\hat{\mu} = \sum_{i=1}^{n} w_i y_i$, where the weights $w_i$ are choice variables, to make calculating expectations simpler and to pose the estimation problem better.
Writing the mean of $\hat{\mu}$ as
$$E(\hat{\mu}) = E\left(\sum_{i=1}^{n} w_i y_i\right) = \sum_{i=1}^{n} w_i E(y_i) = \mu \sum_{i=1}^{n} w_i , \qquad (1)$$
the variance of $\hat{\mu}$ is equal to
$$\sigma_{\hat{\mu}}^2 = E\left\{\left[\hat{\mu} - E(\hat{\mu})\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n} w_i y_i - \mu \sum_{i=1}^{n} w_i\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n} w_i (y_i - \mu)\right]^2\right\}. \qquad (2)$$
We seek to choose the weights $w_i$, for $i = 1, \ldots, n$, to minimize this function. Using the composite function theorem, the necessary first-order conditions are
$$\frac{\partial \sigma_{\hat{\mu}}^2}{\partial w_i} = 2E\left[(y_i - \mu) \sum_{j=1}^{n} w_j (y_j - \mu)\right] = 0, \qquad i = 1, \ldots, n. \qquad (3)$$
Re-arranging terms and using the fact that independence implies zero covariance,
$$\frac{\partial \sigma_{\hat{\mu}}^2}{\partial w_i} = 2\sum_{j=1}^{n} w_j E[(y_i - \mu)(y_j - \mu)] = 2 w_i \sigma^2 = 0, \qquad i = 1, \ldots, n, \qquad (4)$$
if and only if $w_i = 0$ for all $i = 1, \ldots, n$. Thus, the choice $\hat{\mu} = 0$ achieves the global unrestricted minimum variance of zero. But although this is a very precise estimator, it is likely to be inaccurate.
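A small simulation makes this precision/accuracy trade-off concrete. The normal population, parameter values, sample size, and seed below are illustrative assumptions only; the zero-weight estimator has zero variance by construction, but its bias equals $-\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 20_000   # illustrative values, not from the notes

y = rng.normal(mu, sigma, size=(reps, n))   # 'reps' independent samples of size n

ble = np.zeros(reps)       # BLE: weight zero on every observation, so it is identically 0
ybar = y.mean(axis=1)      # equal weights 1/n (the sample mean), shown for contrast

print("BLE : variance %.4f, bias %.4f" % (ble.var(), ble.mean() - mu))
print("ybar: variance %.4f, bias %.4f" % (ybar.var(), ybar.mean() - mu))
# The BLE's variance is exactly zero, but its bias is -mu = -5; the sample mean is
# unbiased with variance roughly sigma^2/n = 0.16.
```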
It is prudent, therefore, to take the bias of an estimator into account. We will now develop and discuss the statistical properties of two estimators that do so: the BLUE and the BLMSE. To obtain the BLUE, note that for $\hat{\mu}$ to be unbiased, it must satisfy $E(\hat{\mu}) = \mu$. Applying this condition to (1), we have
$$E(\hat{\mu}) = \mu \sum_{i=1}^{n} w_i = \mu \quad \text{if and only if} \quad \sum_{i=1}^{n} w_i = 1. \qquad (5)$$
Thus, we now seek to find appropriate weights $w_i$ to minimize the variance (2) subject to the adding up condition in (5) implied by unbiasedness. To accomplish this, we form the Lagrangean function,
$$\mathcal{L} = E\left\{\left[\sum_{i=1}^{n} w_i (y_i - \mu)\right]^2\right\} + \lambda\left(1 - \sum_{i=1}^{n} w_i\right), \qquad (6)$$
and find a saddle point of $\mathcal{L}$ (a relative minimum with respect to the $w_i$'s and a relative maximum with respect to $\lambda$). Since we do not have any inequality or sign restrictions on the choice variables, and since the Lagrangean is convex in $w$ (this is easy to prove, and you should do it as an exercise), the first-order necessary and sufficient conditions are:
$$\frac{\partial \mathcal{L}}{\partial w_j} = 2E\left\{(y_j - \mu) \sum_{i=1}^{n} w_i (y_i - \mu)\right\} - \lambda = 0, \qquad j = 1, \ldots, n, \qquad (7)$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = 1 - \sum_{i=1}^{n} w_i = 0. \qquad (8)$$
Rearranging the terms inside the braces and then passing the expectation operator through by the distributive law, we can rewrite (7) as
$$\frac{\partial \mathcal{L}}{\partial w_j} = 2\sum_{i=1}^{n} w_i E[(y_i - \mu)(y_j - \mu)] - \lambda, \qquad j = 1, \ldots, n. \qquad (9)$$
Now, again using the fact that independence implies zero covariance, we obtain
$$\frac{\partial \mathcal{L}}{\partial w_j} = 2 w_j \sigma^2 - \lambda = 0, \qquad j = 1, \ldots, n. \qquad (10)$$
Solving for the $w_j$ terms, we have $w_j = \lambda / 2\sigma^2$ for all $j$. Then substituting this into (8) and solving for $\lambda$ gives $\sum_{j=1}^{n} w_j = n\lambda / 2\sigma^2 = 1 \Leftrightarrow \lambda = 2\sigma^2 / n$. Thus, we obtain the optimal weights for the BLUE for $\mu$ as $w_j = 1/n$, $j = 1, \ldots, n$. Finally, this implies that the best linear unbiased estimator for $\mu$, regardless of the underlying distribution for $y_1, y_2, \ldots, y_n$, is the sample mean, $\hat{\mu} = \sum_{i=1}^{n} y_i / n = \bar{y}$.
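As a numerical check on this derivation, the first-order system (7)-(8) becomes linear in $(w_1, \ldots, w_n, \lambda)$ once independence of the $y_i$ is used, so it can be solved directly. The following is only a sketch; the sample size and $\sigma^2$ are arbitrary illustrative choices.

```python
import numpy as np

# With independent observations, conditions (7)-(8) reduce to
#   2*sigma^2*w_j - lam = 0  for j = 1,...,n,   and   sum(w) = 1.
# Stack them as a linear system A @ x = b in the unknowns x = (w_1,...,w_n, lam).
n, sigma2 = 5, 3.0                       # illustrative values

A = np.zeros((n + 1, n + 1))
A[:n, :n] = 2.0 * sigma2 * np.eye(n)     # derivative of the variance term: 2*sigma^2*w_j
A[:n, n] = -1.0                          # derivative of lam*(1 - sum(w)) with respect to w_j
A[n, :n] = 1.0                           # the unbiasedness (adding-up) constraint
b = np.zeros(n + 1)
b[n] = 1.0

x = np.linalg.solve(A, b)
print("weights:", x[:n])                 # each equals 1/n = 0.2
print("lambda :", x[n])                  # equals 2*sigma^2/n = 1.2
```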
Before proceeding to the BLMSE, we will briefly develop the statistical properties of the sample mean. First, by construction, $\hat{\mu} = \bar{y}$ is unbiased. We can easily verify this by using the linearity of the expectation operator to show that
$$E(\bar{y}) = E\left(\sum_{i=1}^{n} y_i / n\right) = \sum_{i=1}^{n} E(y_i)/n = \sum_{i=1}^{n} \mu/n = n\mu/n = \mu. \qquad (11)$$
Second, also by construction, $\hat{\mu} = \bar{y}$ has the smallest variance among all possible unbiased estimators for $\mu$ that are formed as linear combinations of the $y_i$. We can easily calculate its variance by using the fact that the $y_i$ are statistically independent, and therefore uncorrelated, so that
$$E[(\bar{y} - \mu)^2] = E\left\{\left[\sum_{i=1}^{n} y_i / n - \mu\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n} (y_i - \mu)/n\right]^2\right\}$$
$$= E\left\{(1/n^2)\left[\sum_{i=1}^{n} (y_i - \mu)^2 + 2\sum_{i=2}^{n}\sum_{j=1}^{i-1} (y_i - \mu)(y_j - \mu)\right]\right\}$$
$$= (1/n^2)\left[\sum_{i=1}^{n} E[(y_i - \mu)^2] + 2\sum_{i=2}^{n}\sum_{j=1}^{i-1} E[(y_i - \mu)(y_j - \mu)]\right]$$
$$= (1/n^2)\left[n\sigma^2 + 2\sum_{i=2}^{n}\sum_{j=1}^{i-1} 0\right]$$
$$= \sigma^2 / n. \qquad (12)$$
Finally, the sample size adjusted and mean deviated random variable, $\sqrt{n}(\bar{y} - \mu)$, has mean zero and variance $\sigma^2$ for all values of $n \geq 2$. As long as this variance is finite, it can be shown that as $n \to \infty$, this random variable converges to one which has a normal distribution (also with zero mean and variance $\sigma^2$). This is one of the main justifications for using normal distribution theory and standard normal probability tables to construct confidence intervals and perform hypothesis tests for all sorts of probability distributions, as long as the sample size is reasonably large.
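A quick Monte Carlo illustrates this normal approximation for a decidedly non-normal population. The exponential distribution, sample size, and number of replications below are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 50_000                       # illustrative sample size and replications

# Exponential(1) population: skewed, with mu = 1 and sigma^2 = 1.
mu, sigma = 1.0, 1.0
y = rng.exponential(scale=1.0, size=(reps, n))

z = np.sqrt(n) * (y.mean(axis=1) - mu)      # the centered and scaled sample mean

print("mean     %+.4f (should be near 0)" % z.mean())
print("variance  %.4f (should be near sigma^2 = 1)" % z.var())

# Crude normality check: empirical quantiles versus standard normal quantiles.
for q, zq in [(0.025, -1.96), (0.50, 0.00), (0.975, 1.96)]:
    print("quantile %.3f: empirical %+.3f, normal %+.3f" % (q, np.quantile(z, q), sigma * zq))
```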
The BLUE applies a lexicographic preference ordering to the bias and variance of an estimator. That is, any degree of bias is strictly less preferred to no bias, regardless of what the variance of the estimator is. This is a somewhat restrictive subjective criterion function (i.e., no utility function exists for such a preference ordering). An alternative to the best linear unbiased estimator (BLUE) is the best linear mean square error estimator (BLMSE). This does not require the estimator to be unbiased. Instead, this principle weights the squared bias of an estimator equally with the variance as the criterion for selecting the estimator. The criterion now is
$$MSE = E[(\hat{\mu} - \mu)^2] = E\left\{\left[\sum_{i=1}^{n} w_i y_i - \mu\right]^2\right\}. \qquad (13)$$
By adding and subtracting $E(\hat{\mu}) = \sum_{i=1}^{n} w_i E(y_i) = \mu \sum_{i=1}^{n} w_i$ inside of the square term, we find that the mean square error is the sum of the variance and the squared bias,
$$MSE = E\left\{\left[\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu\right]^2\right\}$$
$$= E\left\{\left[\hat{\mu} - E(\hat{\mu})\right]^2 + 2\left[\hat{\mu} - E(\hat{\mu})\right]\left[E(\hat{\mu}) - \mu\right] + \left[E(\hat{\mu}) - \mu\right]^2\right\}$$
$$= E\left\{\left[\hat{\mu} - E(\hat{\mu})\right]^2\right\} + 2\left[E(\hat{\mu}) - \mu\right]E\left[\hat{\mu} - E(\hat{\mu})\right] + \left[E(\hat{\mu}) - \mu\right]^2$$
$$= \sigma_{\hat{\mu}}^2 + B(\hat{\mu}, \mu)^2, \qquad (14)$$
where $\sigma_{\hat{\mu}}^2 = E\{[\hat{\mu} - E(\hat{\mu})]^2\}$ is the variance of the mean square error minimizing estimator and $B(\hat{\mu}, \mu) = E(\hat{\mu}) - \mu$ is the bias; the cross-product term vanishes because $E[\hat{\mu} - E(\hat{\mu})] = 0$. Returning to the right-hand side of (13), we find the first-order necessary (and sufficient) conditions for minimizing $MSE$ with respect to our choice variables $w_i$, $i = 1, \ldots, n$, are
$$\frac{\partial MSE}{\partial w_i} = 2E\left[y_i\left(\sum_{j=1}^{n} w_j y_j - \mu\right)\right] = 0, \qquad i = 1, \ldots, n. \qquad (15)$$
Distributing the $y_i$ inside the parentheses and then distributing the expectation operator across the terms gives
$$0 = \sum_{j=1}^{n} w_j E(y_i y_j) - \mu E(y_i) = \mu^2 \sum_{j=1}^{n} w_j + \sigma^2 w_i - \mu^2, \qquad i = 1, \ldots, n. \qquad (16)$$
Now, summing across all $i$ gives
$$0 = n\mu^2 \sum_{j=1}^{n} w_j + \sigma^2 \sum_{i=1}^{n} w_i - n\mu^2, \qquad (17)$$
which can be solved directly for the total sum of the linear weights, $\sum_{j=1}^{n} w_j = \sum_{i=1}^{n} w_i$, which gives
$$\sum_{j=1}^{n} w_j = \frac{n\mu^2}{n\mu^2 + \sigma^2} = \frac{1}{1 + \sigma^2/(n\mu^2)} < 1. \qquad (18)$$
Plugging this back into (16) and solving for each (equal) $w_i$ then gives
$$w_i = \frac{\mu^2\left(1 - \sum_{j=1}^{n} w_j\right)}{\sigma^2} = \frac{\mu^2}{\sigma^2}\cdot\frac{\sigma^2}{n\mu^2 + \sigma^2} = \frac{\mu^2}{n\mu^2 + \sigma^2} = \frac{1}{n + \sigma^2/\mu^2} < \frac{1}{n}. \qquad (19)$$
Thus, we see that the BLMSE is biased toward zero (i.e., $E(\hat{\mu}) < \mu$), trading off a smaller variance (due to weights that are less than $1/n$) for some bias. In particular, by taking the expected value of the linear estimator $\hat{\mu}$, we obtain the bias as
$$B(\hat{\mu}, \mu) = E\left(\sum_{i=1}^{n} w_i y_i\right) - \mu = \frac{\mu}{1 + \sigma^2/(n\mu^2)} - \mu = \frac{-\sigma^2 \mu}{n\mu^2 + \sigma^2}. \qquad (20)$$
Again using the fact that the $y_i$ are statistically independent, we also can calculate the variance of the linear estimator $\hat{\mu}$ as the sum of the squared weights times the variance of the individual $y_i$'s, which gives
$$\sigma_{\hat{\mu}}^2 = \sigma^2 \sum_{i=1}^{n} w_i^2 = \frac{n\sigma^2\mu^4}{(n\mu^2 + \sigma^2)^2} = \frac{\sigma^2}{n}\left(\frac{1}{1 + \sigma^2/(n\mu^2)}\right)^2 < \frac{\sigma^2}{n}. \qquad (21)$$
Finally, we can see the gain in mean square error from trading off some bias for an associated reduction in variance by squaring the bias in (20) and adding it to the variance in (21), which gives
$$MSE(\hat{\mu}, \mu) = \sigma_{\hat{\mu}}^2 + B(\hat{\mu}, \mu)^2$$
$$= \frac{n\sigma^2\mu^4}{(n\mu^2 + \sigma^2)^2} + \frac{\sigma^4\mu^2}{(n\mu^2 + \sigma^2)^2}$$
$$= \frac{\sigma^2\mu^2}{n\mu^2 + \sigma^2} = \frac{\sigma^2}{n}\left(\frac{1}{1 + \sigma^2/(n\mu^2)}\right) < \frac{\sigma^2}{n} = Var(\bar{y}). \qquad (22)$$
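A Monte Carlo comparison of the two estimators confirms (22). Here the BLMSE weight is computed using the true $\mu$ and $\sigma^2$; the normal population, parameter values, and seed are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 2.0, 4.0, 10, 200_000   # illustrative values, not from the notes

y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

blue = y.mean(axis=1)                          # BLUE weights: 1/n
w = 1.0 / (n + sigma2 / mu**2)                 # BLMSE common weight from equation (19), < 1/n
blmse = w * y.sum(axis=1)

mse_blue = np.mean((blue - mu) ** 2)
mse_blmse = np.mean((blmse - mu) ** 2)
mse_theory = sigma2 * mu**2 / (n * mu**2 + sigma2)   # equation (22)

print("MSE(BLUE)  simulated %.4f, theory sigma^2/n = %.4f" % (mse_blue, sigma2 / n))
print("MSE(BLMSE) simulated %.4f, theory           = %.4f" % (mse_blmse, mse_theory))
```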
Comments:
1. The optimal weights for the BLMSE for $\mu$ are a function of the unknown parameters $\mu$ and $\sigma^2$. This makes it more difficult to calculate than the BLUE. However, it is possible to show that we can use $\bar{y}$ and $s^2$ in the formulas for the weights of the BLMSE to dominate the BLUE in terms of finite sample mean square error (a plug-in version is sketched after these comments).
2. The differences in the mean and variance properties of the BLMSE and BLUE converge to zero as the sample size gets large (i.e., as $n \to \infty$). This large sample (asymptotic) property is important and is a general characteristic of the relationship between mean square error minimizing estimators and best unbiased estimators.
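The sketch below illustrates the plug-in idea in comment 1, substituting the sample mean and sample variance for $\mu$ and $\sigma^2$ in the BLMSE weight formula. Whether this feasible version dominates the BLUE in finite samples is the claim made in the notes; the simulation settings here are merely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, n, reps = 2.0, 4.0, 10, 200_000   # illustrative values, not from the notes

y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

ybar = y.mean(axis=1)                          # the BLUE
s2 = y.var(axis=1, ddof=1)                     # unbiased sample variance

# Feasible BLMSE: replace mu and sigma^2 in w = 1/(n + sigma^2/mu^2) with ybar and s2.
w_hat = 1.0 / (n + s2 / ybar**2)
feasible_blmse = w_hat * y.sum(axis=1)

print("MSE(BLUE):           %.4f" % np.mean((ybar - mu) ** 2))
print("MSE(feasible BLMSE): %.4f" % np.mean((feasible_blmse - mu) ** 2))
```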
