Goodness of Fit: Explained Sum of Squares (ESS) and Total Sum of Squares (TSS)

Goodness of fit

Any two variables (even with no connection in real life) will provide you with some regression
equation, and you can manipulate the coefficients by choice of units of measurement of X and Y.
How can you tell good from bad regressions?
R² (the coefficient of determination), the square of the correlation coefficient, is one way. It
measures the ‘goodness of fit’ of the equation. The closer the observations are to the line, on
average, the higher will be R².
0 ≤ R² ≤ 1
To calculate R² in simple regression, square the correlation coefficient: r² = 0.824² ≈ 0.678.
This means that about 68% of the variation in Y is explained by variation in X. This is a
respectable result.
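
As a rough illustration of both points, the sketch below uses a small hypothetical data set (not
the sample behind r = 0.824) and a helper, slope_and_r, that is an assumption of this example.
Measuring X in units 100 times smaller changes the slope by a factor of 100 but leaves r², and
hence R², unchanged.

```python
import math


def slope_and_r(xs, ys):
    """Return the least-squares slope b and the correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [38.0, 36.5, 31.0, 30.5, 27.0, 25.5]

b, r = slope_and_r(xs, ys)

# The same X values expressed in a unit 100 times smaller.
b_scaled, r_scaled = slope_and_r([100 * x for x in xs], ys)

print("original units:", b, r ** 2)
print("rescaled units:", b_scaled, r_scaled ** 2)
```

The slope depends on the units chosen, so it cannot tell you how good the regression is; r² does
not depend on them.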
Alternatively, R² can be calculated as the ratio of two sums of squares, the explained sum of
squares (ESS) to the total sum of squares (TSS):

R² = ESS / TSS
Each observation's deviation from the overall mean (of Y) contributes to the TSS. These
deviations can be divided into a part due to the regression ($\hat{Y} - \bar{Y}$) and a random,
or residual, part ($Y - \hat{Y}$). Hence we have:

$$Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$$
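If we square both sides and sum over all observations, the cross-product term sums to zero for a
least-squares line with an intercept, and we obtain

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$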

More simply, we can write this as:

TSS = ESS + RSS
i.e. the total sum of squares = explained sum of squares + residual sum of squares.
ESS measures the extent to which the deviations from the mean are ‘explained’ by the variation
of the explanatory variable. What is left unexplained is called the residual sum of squares
(RSS).
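
The decomposition can be checked numerically. The sketch below is a minimal illustration on
hypothetical data (not the sample used in this handout); the helper ols_fit and all variable names
are assumptions of the example. It fits the least-squares line, computes the three sums of squares
directly from their definitions, and also computes the cross-product term that drops out.

```python
def ols_fit(xs, ys):
    """Return intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [38.0, 36.5, 31.0, 30.5, 27.0, 25.5]

a, b = ols_fit(xs, ys)
fitted = [a + b * x for x in xs]
mean_y = sum(ys) / len(ys)

tss = sum((y - mean_y) ** 2 for y in ys)             # total
ess = sum((f - mean_y) ** 2 for f in fitted)         # explained
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual

# Cross-product term; it is zero (up to rounding) for a least-squares fit.
cross = sum((f - mean_y) * (y - f) for y, f in zip(ys, fitted))

print("TSS:", tss)
print("ESS + RSS:", ess + rss)
print("cross term:", cross)
```

If the line were not the least-squares line, the cross term would generally be non-zero and TSS
would no longer equal ESS + RSS.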
TSS measures the sum of squared deviations of the observations around the mean value (of Y)
and is calculated as:

$$TSS = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2 = 12564 - 12 \times 31.667^2 = 530.667$$

The result uses the unrounded mean $\bar{Y} = 380/12$.

RSS can be calculated as

$$RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum Y_i^2 - a \sum Y_i - b \sum X_i Y_i$$

$$RSS = 12564 - 40.711 \times 380 - (-2.7) \times 1139.70 = 170.754$$
and ESS follows:

ESS = TSS - RSS = 530.667 - 170.754 = 359.913

Hence we obtain R² = ESS/TSS = 359.913/530.667 = 0.678, the same as obtained by squaring r.
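
The arithmetic can be retraced from the summary statistics quoted in the formulas: n = 12,
ΣY = 380, ΣY² = 12564, ΣXY = 1139.70, a = 40.711 and b = -2.7. The sketch below is only a check
on those figures; the variable names are assumptions of the example. Because a and b are entered
as rounded in the text, the computed RSS comes out near 171.0 rather than exactly 170.754, but R²
still rounds to 0.678.

```python
# Summary statistics as quoted in the text.
n = 12
sum_y = 380.0
sum_y2 = 12564.0
sum_xy = 1139.70

# Regression coefficients as quoted in the text (rounded values).
a = 40.711
b = -2.7

mean_y = sum_y / n

# TSS = sum(Y^2) - n * mean(Y)^2
tss = sum_y2 - n * mean_y ** 2

# RSS = sum(Y^2) - a * sum(Y) - b * sum(XY)
rss = sum_y2 - a * sum_y - b * sum_xy

# ESS is what remains of TSS after removing the residual part.
ess = tss - rss

r_squared = ess / tss

print("TSS:", round(tss, 3))
print("RSS:", round(rss, 3))
print("ESS:", round(ess, 3))
print("R squared:", round(r_squared, 3))
```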

Prediction
The calculated regression line can be used for prediction (assuming a zero error term) for
values of X not in the sample. For X = 3, the predicted value of Y is 40.711 - 2.7 × 3 = 32.6.
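
The same calculation as a minimal sketch, using the coefficients quoted above; the function name
predict is an assumption of this example. The error term is set to zero, so the function returns
the point on the fitted line.

```python
# Coefficients of the fitted line as quoted in the text.
A = 40.711
B = -2.7


def predict(x, a=A, b=B):
    """Return the predicted Y for a given X, assuming a zero error term."""
    return a + b * x


y_hat = predict(3)
print(round(y_hat, 1))  # 32.6
```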
