Lecture 5
MSE = E(Y - c)^2
The best constant predictor is the value of c that minimizes the MSE. Write Y - c = (Y - EY) + (EY - c) and expand the square; the cross term vanishes because E(Y - EY) = 0. This gives the famous bias-variance decomposition of the MSE:

MSE = E(Y - EY)^2 + (EY - c)^2 = Var(Y) + (EY - c)^2,

where EY - c is the bias. The sum is minimized by c = EY, so the expected value of Y is the best constant predictor.
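As a quick numerical check, here is a short Monte Carlo sketch (the normal distribution and its parameters are arbitrary choices for illustration) showing that the sample MSE is smallest at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=100_000)  # hypothetical Y with EY = 5

def mse(c):
    # Monte Carlo estimate of E(Y - c)^2
    return np.mean((y - c) ** 2)

# The sample analogue of c = EY: the MSE is minimized at the sample mean,
# and grows by (c_best - c)^2 for any other constant c.
c_best = y.mean()
```

Evaluating `mse` at `c_best` and at nearby constants reproduces the decomposition: the excess MSE equals the squared bias.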
Now we suspect that Y depends on some other random variable X, which is observed. For example, the
height of a person would depend on the height of her father, so we can try to predict the height of the child
using the height of the father. So now we are looking for a function of X, let’s call it h(X), which minimizes
MSE = E(Y - h(X))^2. (Note that the expectation here is with respect to the joint distribution of X and Y.)
Using the law of iterated expectations:
MSE = E[ E((Y - h(X))^2 | X) ]
But given X = x, we already saw that E((Y - h(x))^2 | X = x) is minimized over the constant h(x) by h(x) = E(Y | X = x). Therefore our solution is h(X) = E(Y | X).
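To see the optimality of the conditional mean numerically, consider a sketch where E(Y | X) is known by construction (the quadratic relationship and noise level are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = x**2 + rng.normal(scale=0.5, size=x.size)  # so E(Y | X) = X^2 by construction

# MSE of the conditional-mean predictor vs. the best constant predictor EY
mse_cond = np.mean((y - x**2) ** 2)
mse_mean = np.mean((y - y.mean()) ** 2)
```

Here `mse_cond` estimates the noise variance alone, while `mse_mean` also pays for the variance of X^2, so the conditional mean wins by a wide margin.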
But this predictor is difficult to use in practice since it depends on the joint distribution of X and Y , which is
often difficult to approximate. A less ambitious proposal would be to find the best linear predictor of the form
h(X) = α + βX. Calculations show (see the book) that the best linear predictor is given by

h(X) = EY + (Cov(X, Y) / Var(X)) (X - EX).
This predictor depends on the joint distribution only via the means, variances, and the covariance. These are
quite easy to estimate.
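Plugging sample moments into the formula gives a working estimator. A sketch in the spirit of the heights example above (the data-generating slope, intercept, and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=170, scale=7, size=50_000)        # e.g. father's height (cm)
y = 0.5 * x + 85 + rng.normal(scale=5, size=x.size)  # child's height (cm)

# Best linear predictor from sample moments:
#   beta  = Cov(X, Y) / Var(X),  alpha = EY - beta * EX
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()
```

With this much data the moment estimates recover the slope 0.5 and intercept 85 used to generate the sample.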
Suppose now that Y = g(X) for a smooth function g. A first-order Taylor expansion around µX = EX gives Y ≈ g(µX) + g'(µX)(X - µX). The resulting approximations are:

EY ≈ g(µX),    Var(Y) ≈ g'(µX)^2 Var(X).
One can improve the approximation of the mean with a second-order Taylor series expansion:

Y = g(X) ≈ g(µX) + g'(µX)(X - µX) + (1/2) g''(µX)(X - µX)^2,

yielding EY ≈ g(µX) + (1/2) g''(µX) Var(X).
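A numerical sketch of the one-variable case, taking g(x) = e^x with a small spread so the Taylor approximation is accurate (the choice of g and the parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 0.1
x = rng.normal(mu, sigma, size=1_000_000)
y = np.exp(x)  # g(x) = e^x, so g'(x) = g''(x) = e^x

# First-order approximation of EY, then the improved second-order one,
# and the delta-method variance g'(mu)^2 * Var(X)
ey_first = np.exp(mu)
ey_second = np.exp(mu) + 0.5 * np.exp(mu) * sigma**2
var_delta = np.exp(mu) ** 2 * sigma**2
```

The second-order correction shrinks the error in the mean by roughly two orders of magnitude here, and `var_delta` matches the Monte Carlo variance of Y closely.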
We can apply the same idea if we are interested in a function of two variables: Z = g(X, Y ):
Z = g(X, Y) ≈ g(µX, µY) + (X - µX) ∂g/∂x(µX, µY) + (Y - µY) ∂g/∂y(µX, µY)
Therefore
EZ ≈ g(µX, µY)

Var(Z) ≈ (∂g/∂x(µX, µY))^2 Var(X) + (∂g/∂y(µX, µY))^2 Var(Y) + 2 Cov(X, Y) ∂g/∂x(µX, µY) ∂g/∂y(µX, µY)
Again, one can use a second-order Taylor expansion of g(X, Y ) to get a better approximation of EZ.
As an example, the book works out the delta method for the ratio Z = Y/X.
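For the ratio, the partial derivatives are ∂g/∂x = -y/x^2 and ∂g/∂y = 1/x. A sketch checking the resulting approximations against simulation (the means, spreads, and independence of X and Y are illustrative assumptions; the means are kept well away from zero so the ratio is well behaved):

```python
import numpy as np

rng = np.random.default_rng(4)
mx, my, sx, sy = 10.0, 5.0, 0.5, 0.3
x = rng.normal(mx, sx, size=1_000_000)
y = rng.normal(my, sy, size=x.size)  # independent of X here, so Cov(X, Y) = 0
z = y / x

# Delta method for g(x, y) = y/x, evaluated at the means:
#   EZ ≈ my/mx,  Var(Z) ≈ (my/mx^2)^2 Var(X) + (1/mx)^2 Var(Y)
ez_delta = my / mx
var_delta = (my / mx**2) ** 2 * sx**2 + (1 / mx) ** 2 * sy**2
```

Both approximations land within a few percent of the Monte Carlo mean and variance of Z in this regime.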
Chapters 4.4–4.6