
Lecture 5: Prediction and the delta method

One application of conditional expectations is in prediction:


First we want to predict a random variable Y with a constant c. How should we choose c? The quality of
approximation is often measured with the mean squared error:

\[
\mathrm{MSE} = E(Y - c)^2
\]

The best predictor c is the value of c that minimizes the MSE. Write Y − c = (Y − EY) + (EY − c) and expand
the square. This gives the famous bias-variance decomposition of the MSE:

\[
\mathrm{MSE} = \mathrm{Var}(Y) + (EY - c)^2
\]

where EY − c is the bias. The sum is minimized by c = EY, so the expected value of Y is the best constant
predictor.
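
To see where the decomposition comes from, one can write out the square; since EY − c is a constant and E(Y − EY) = 0, the cross term drops out:

\[
E(Y - c)^2 = E(Y - EY)^2 + 2\,(EY - c)\,E(Y - EY) + (EY - c)^2 = \mathrm{Var}(Y) + (EY - c)^2 .
\]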
Now we suspect that Y depends on some other random variable X, which is observed. For example, the
height of a person would depend on the height of her father, so we can try to predict the height of the child
using the height of the father. So now we are looking for a function of X, let’s call it h(X), which minimizes
\[
\mathrm{MSE} = E\big(Y - h(X)\big)^2 .
\]

(Note that the expectation here is with respect to the joint distribution of X and Y.)
Using the law of iterated expectations:

\[
\mathrm{MSE} = E\Big[\, E\big( (Y - h(X))^2 \mid X \big) \Big]
\]

But given X = x, we already saw that E[(Y − h(x))² | X = x] is minimized over the constant h(x) by
h(x) = E(Y | X = x). Therefore our solution is h(X) = E(Y | X).
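
A small simulation sketch (my own illustration, not part of the notes) makes this concrete. It assumes a toy model in which Y = X + noise, so that E(Y | X) = X, and compares the MSE of this predictor with that of the best constant predictor EY:

```python
import numpy as np

# Assumed toy model: X ~ N(0, 1) and Y = X + N(0, 0.5^2), so E(Y | X) = X.
rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(0.0, 1.0, size=n)
Y = X + rng.normal(0.0, 0.5, size=n)

mse_constant    = np.mean((Y - Y.mean()) ** 2)  # c = EY: roughly Var(Y) = 1 + 0.25
mse_conditional = np.mean((Y - X) ** 2)         # h(X) = E(Y | X) = X: roughly 0.25

print(mse_constant, mse_conditional)
```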
But this predictor is difficult to use in practice since it depends on the joint distribution of X and Y , which is
often difficult to approximate. A less ambitious proposal would be to find the best linear predictor of the form
h(X) = α + βX. Calculations show (see the book) that the best linear predictor is given by

\[
h(X) = EY + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}\,(X - EX)
\]

This predictor depends on the joint distribution only via the means, variances, and the covariance. These are
quite easy to estimate.
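
Since only means, variances, and the covariance enter, the linear predictor can be estimated by plugging in sample moments. Here is a small sketch (again my own illustration; the father/child model below is assumed, not from the notes):

```python
import numpy as np

# Assumed toy data: father's height X (cm) and child's height Y (cm).
rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(178.0, 7.0, size=n)
Y = 0.5 * X + rng.normal(89.0, 5.0, size=n)

# Best linear predictor h(X) = EY + Cov(X, Y) / Var(X) * (X - EX),
# with the moments replaced by their sample versions.
beta = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
alpha = Y.mean() - beta * X.mean()

print(alpha, beta)            # beta should be close to 0.5
print(alpha + beta * 180.0)   # predicted child height for a 180 cm father
```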

The delta method (propagation of error)


We would like to know the mean and variance of Y = g(X), when we only know the mean and variance of X.
We have seen that this is easy if g is a linear function, since E(a + bX) = a + b(EX) and Var(a + bX) =
b²Var(X). But if g is not linear, then usually Eg(X) ≠ g(EX).
The idea of the delta method is that g may be approximately linear in a vicinity of EX, and since we know
from Chebyshev's inequality that most of the probability lies within a few SDs of EX, there is hope that a linear
approximation of g at µX = EX will give a good approximation to EY and to Var(Y).
Such a linear approximation is given by a Taylor series expansion of g about µX:

\[
Y = g(X) \approx g(\mu_X) + g'(\mu_X)(X - \mu_X)
\]

The resulting approximations are:

\[
EY \approx g(\mu_X) \quad \text{since } E(X - \mu_X) = 0
\]

\[
\mathrm{Var}(Y) \approx \big(g'(\mu_X)\big)^2\,\mathrm{Var}(X)
\]

This is called the delta method.
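
As a quick check of these approximations (my own illustration, not from the notes), the following sketch compares them with Monte Carlo estimates for g(X) = log X, where X is normal with mean 10 and SD 1, so X stays well away from 0:

```python
import numpy as np

# Illustration (assumed model): X ~ N(10, 1) and Y = g(X) = log(X).
rng = np.random.default_rng(0)
mu_X, sd_X = 10.0, 1.0
X = rng.normal(mu_X, sd_X, size=1_000_000)
Y = np.log(X)

# First-order delta method: g(x) = log x, g'(x) = 1/x.
mean_approx = np.log(mu_X)                   # approx 2.3026
var_approx = (1.0 / mu_X) ** 2 * sd_X ** 2   # approx 0.0100

print(mean_approx, Y.mean())   # simulated mean is slightly smaller (see the correction below)
print(var_approx, Y.var())     # the two variances should agree closely
```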

One can improve the approximation with a second-order Taylor series expansion:

\[
Y = g(X) \approx g(\mu_X) + g'(\mu_X)(X - \mu_X) + \tfrac{1}{2}\, g''(\mu_X)(X - \mu_X)^2
\]

yielding EY ≈ g(µX) + ½ g″(µX) Var(X).
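
Continuing the small log example above (still just an illustration), g(x) = log x has g″(x) = −1/x², so with µX = 10 and Var(X) = 1 the correction is

\[
EY \approx \log 10 + \tfrac{1}{2}\left(-\frac{1}{10^2}\right)\cdot 1 = 2.3026 - 0.0050 \approx 2.2976 ,
\]

which matches the simulated mean more closely than the first-order value log 10 ≈ 2.3026.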

We can apply the same idea if we are interested in a function of two variables, Z = g(X, Y):

\[
Z = g(X, Y) \approx g(\mu_X, \mu_Y) + (X - \mu_X)\,\frac{\partial g}{\partial x}(\mu_X, \mu_Y) + (Y - \mu_Y)\,\frac{\partial g}{\partial y}(\mu_X, \mu_Y)
\]

Therefore

\[
EZ \approx g(\mu_X, \mu_Y)
\]
\[
\mathrm{Var}(Z) \approx \left(\frac{\partial g}{\partial x}(\mu_X, \mu_Y)\right)^{\!2} \mathrm{Var}(X)
+ \left(\frac{\partial g}{\partial y}(\mu_X, \mu_Y)\right)^{\!2} \mathrm{Var}(Y)
+ 2\,\frac{\partial g}{\partial x}(\mu_X, \mu_Y)\,\frac{\partial g}{\partial y}(\mu_X, \mu_Y)\,\mathrm{Cov}(X, Y)
\]

Again, one can use a second-order Taylor expansion of g(X, Y ) to get a better approximation of EZ.
As an example, the book works out the delta method for the ratio Z = Y/X.
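
Without reproducing the book's calculation, plugging g(x, y) = y/x, with ∂g/∂x = −y/x² and ∂g/∂y = 1/x, into the formulas above gives (assuming µX is not close to 0):

\[
EZ \approx \frac{\mu_Y}{\mu_X}, \qquad
\mathrm{Var}(Z) \approx \frac{\mu_Y^2}{\mu_X^4}\,\mathrm{Var}(X) + \frac{1}{\mu_X^2}\,\mathrm{Var}(Y) - \frac{2\,\mu_Y}{\mu_X^3}\,\mathrm{Cov}(X, Y) .
\]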

Chapters 4.4–4.6
