Sage Dictionary Ols Regression
$$
F_{(df_p - df_{p+q}),\, df_{p+q}} = \frac{(RSS_p - RSS_{p+q}) / (df_p - df_{p+q})}{RSS_{p+q} / df_{p+q}}
$$
where p represents the null model, Y = α, p+q represents the model Y = α + βx, and df are the degrees of
freedom associated with the designated model. It can be seen from this equation that the F-statistic is simply
based on the difference in the deviances between the two models as a fraction of the deviance of the full
model, whilst taking account of the number of parameters.
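As a minimal sketch, the F-statistic can be computed directly from the deviances (RSS) and degrees of freedom of two nested models: the change in deviance per added parameter, divided by the deviance per degree of freedom of the full model. The function name `f_statistic` and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def f_statistic(rss_null, df_null, rss_full, df_full):
    """F-statistic for two nested OLS models.

    rss_* are the deviances (residual sums of squares) and df_* the residual
    degrees of freedom of the smaller (null) and the larger (full) model.
    """
    extra_params = df_null - df_full  # parameters added by the full model
    if extra_params <= 0:
        raise ValueError("the full model must use more parameters than the null model")
    if rss_full <= 0 or df_full <= 0:
        raise ValueError("the full model needs positive deviance and degrees of freedom")
    change_per_param = (rss_null - rss_full) / extra_params
    residual_variance = rss_full / df_full
    return change_per_param / residual_variance


# Hypothetical values (not taken from any data set).
f_value = f_statistic(rss_null=10.0, df_null=19, rss_full=6.0, df_full=18)
print(round(f_value, 2))
```

The resulting value would then be referred to an F distribution with (df_null - df_full, df_full) degrees of freedom to obtain a P-value.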
In addition to the model-fit statistics, the R-square statistic is also commonly quoted and provides a
measure that indicates the percentage of variation in the response variable that is 'explained' by the model.
R-square, which is also known as the coefficient of multiple determination, is defined as
$$
R^2 = \frac{\text{total RSS} - \text{RSS after regression}}{\text{total RSS}}
$$
and basically gives the percentage of the deviance in the response variable that can be accounted for by
adding the explanatory variable into the model. Although R-square is widely used, it never decreases as
variables are added to the model (the deviance cannot go up when additional variables are added to a
model). One solution to this problem is to calculate an adjusted R-square statistic ($R_a^2$) which takes into
account the number of terms entered into the model and does not necessarily increase as more terms are
added. Adjusted R-square can be derived using the following equation
$$
R_a^2 = R^2 - \frac{k\,(1 - R^2)}{n - k - 1}
$$
where n is the number of cases used to construct the model and k is the number of terms in the model (not
including the constant).
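The following sketch illustrates both statistics, taking R-square as the proportional reduction in deviance and adjusted R-square as R-square minus k(1 - R-square)/(n - k - 1). The function names and all input values are hypothetical; the second model is an invented case in which an extra term barely reduces the deviance.

```python
def r_squared(rss_total, rss_after):
    """Proportion of the total deviance removed by the regression."""
    return (rss_total - rss_after) / rss_total


def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for n cases and k terms (constant not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than parameters")
    return r2 - k * (1 - r2) / (n - k - 1)


# Hypothetical model with one term fitted to 22 cases.
r2_one = r_squared(rss_total=8.0, rss_after=4.0)
adj_one = adjusted_r_squared(r2_one, n=22, k=1)

# Hypothetical second term that lowers the deviance only slightly.
r2_two = r_squared(rss_total=8.0, rss_after=3.96)
adj_two = adjusted_r_squared(r2_two, n=22, k=2)

print(r2_one, r2_two)    # R-square goes up with the extra term
print(adj_one, adj_two)  # adjusted R-square goes down in this case
```

In this invented case R-square rises when the second term is added while adjusted R-square falls, which is the behaviour the adjustment is meant to capture.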
An example of simple OLS regression
A simple OLS regression model with a single explanatory variable can be illustrated using the example of
predicting ice cream sales given outdoor temperature (Koteswara, 1970). The model for this relationship
(calculated using software) is
Ice cream consumption = 0.207 + 0.003 temperature.
The parameter for α (0.207) indicates the predicted consumption when temperature is equal to zero. It
should be noted that although the parameter α is required to make predictions of ice cream consumption at
any given temperature, the prediction of consumption at a temperature of zero might be of limited
usefulness, particularly when the observed data does not include a temperature of zero in its range
(predictions should only be made within the limits of the sampled values). The parameter β indicates that for
each unit increase in temperature, ice cream consumption increases by 0.003 units. The significance of the
relationship between temperature and ice cream consumption can be estimated by comparing the deviance
statistics for the two nested models in the table below: one that includes temperature and one that does not.
This difference in deviance can be assessed for significance using the F-statistic.
Model                              deviance (RSS)   df   change in deviance   F-statistic   P-value
consumption = α                    0.1255           29
                                                         0.0755               42.28         <.0001
consumption = α + β temperature    0.0500           28
On the basis of this analysis, outdoor temperature would appear to be significantly related to ice cream
consumption with each unit increase in temperature being associated with an increase of 0.003 units in ice
cream consumption. Using these statistics it is a simple matter to also compute the R-square statistic for this
model, which is 0.0755/0.1255, or 0.60. Temperature 'explains' 60% of the deviance in ice cream
consumption (i.e., when temperature is added to the model, the deviance in the Y variable is reduced by
60%).
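The figures quoted for the ice cream example can be checked with a few lines of arithmetic. This sketch uses only the deviances and degrees of freedom reported for the two nested models (0.1255 on 29 df and 0.0500 on 28 df); the variable names are illustrative.

```python
# Deviance (RSS) and residual degrees of freedom reported for each model.
rss_null, df_null = 0.1255, 29   # consumption = alpha
rss_full, df_full = 0.0500, 28   # consumption = alpha + beta * temperature

# Change in deviance when temperature is added to the model.
change = rss_null - rss_full

# F-statistic: change per added parameter over residual variance of the full model.
f_value = (change / (df_null - df_full)) / (rss_full / df_full)

# R-square: proportion of the null-model deviance removed by temperature.
r2 = change / rss_null

print(f"change in deviance = {change:.4f}")
print(f"F({df_null - df_full},{df_full}) = {f_value:.2f}")
print(f"R-square = {r2:.2f}")
```

The computed values agree with the reported change in deviance (0.0755), F-statistic (42.28) and R-square (0.60).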
OLS regression with multiple explanatory variables
The OLS regression model can be extended to include multiple explanatory variables by simply adding
additional variables to the equation. The form of the model is the same as above with a single response
variable (Y), but this time Y is predicted by multiple explanatory variables ($X_1$ to $X_3$).
$$
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3
$$
The interpretation of the parameters (α and β) from the above model is basically the same as for the simple
regression model above, but the relationship cannot now be graphed on a single scatter plot. α indicates the
value of Y when all values of the explanatory variables are zero. Each β parameter indicates the average
change in Y that is associated with a unit change in X, whilst controlling for the other explanatory variables
in the model. Model-fit can be assessed through comparing deviance measures of nested models. For
example, the effect of variable $X_3$ on Y in the model above can be calculated by comparing the nested models
$$
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3
$$
$$
Y = \alpha + \beta_1 X_1 + \beta_2 X_2
$$
The change in deviance between these models indicates the effect that $X_3$ has on the prediction of Y when the
effects of $X_1$ and $X_2$ have been accounted for (it is, therefore, the unique effect that $X_3$ has on Y after taking
into account $X_1$ and $X_2$). The overall effect of all three explanatory variables on Y can be assessed by
comparing the models
$$
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3
$$
$$
Y = \alpha
$$
The significance of the change in the deviance scores can be assessed through the calculation of the
F-statistic using the equation provided above (these are, however, provided as a matter of course by most
software packages). As with the simple OLS regression, it is a simple matter to compute the R-square
statistics.
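As a worked illustration, the two comparisons described above can be written out explicitly. The subscripts on RSS are notation assumed here for the sketch (they name the explanatory variables in each model, with 0 for the model containing only the constant), and n is the number of cases, so the model with three explanatory variables has n - 4 residual degrees of freedom.

```latex
% Unique effect of X_3: full model against the model without X_3
F_{1,\,n-4} = \frac{(RSS_{12} - RSS_{123}) / 1}{RSS_{123} / (n - 4)}

% Overall effect of X_1, X_2 and X_3: full model against Y = \alpha
F_{3,\,n-4} = \frac{(RSS_{0} - RSS_{123}) / 3}{RSS_{123} / (n - 4)}
```

In both cases the numerator is the change in deviance per parameter removed and the denominator is the deviance of the full model per residual degree of freedom.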
An example of multiple OLS regression
A multiple OLS regression model with three explanatory variables can be illustrated using the example from
the simple regression model given above. In this example, the price of the ice cream and the average income
of the neighbourhood are also entered into the model. This model is calculated as
Ice cream consumption = 0.197 − 1.044 price + 0.033 income + 0.003 temperature.
The parameter for α (0.197) indicates the predicted consumption when all explanatory variables are equal to
zero. The β parameters indicate the average change in consumption that is associated with each unit increase
in the explanatory variable. For example, for each unit increase in price, consumption goes down by 1.044
units. The significance of the relationship between each explanatory variable and ice cream consumption can
be estimated by comparing the deviance statistics for nested models. The table below shows the significance
of each of the explanatory variables (shown by the change in deviance when that variable is removed from
the model) in a form typically used by software (when only one parameter is assessed, the F-statistic is
equivalent to the t-statistic (F = t²), which is often quoted in statistical output).
coefficient    deviance change   df   F-value             P-value
price          0.002             1    F1,26 = 1.567       0.222
income         0.011             1    F1,26 = 7.973       0.009
temperature    0.082             1    F1,26 = 60.252      <0.0001
residuals      0.035             26
Within the range of the data collected in this study, temperature and income appear to be significantly related
to ice cream consumption.
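As a rough check on the table, each F-value can be approximated from the rounded deviance changes: the change in deviance for a term (on 1 df) divided by the residual deviance per degree of freedom (0.035 on 26 df). Because the table reports deviances to only three decimal places, this sketch is expected to reproduce the reported F-values only approximately. The variable names are illustrative.

```python
import math

# Residual deviance and degrees of freedom of the full model, as reported.
residual_deviance, residual_df = 0.035, 26

# Change in deviance when each term is removed (1 df each), as reported.
deviance_change = {"price": 0.002, "income": 0.011, "temperature": 0.082}

# F-values reported in the table, for comparison.
reported_f = {"price": 1.567, "income": 7.973, "temperature": 60.252}

residual_variance = residual_deviance / residual_df
approx_f = {term: change / residual_variance for term, change in deviance_change.items()}

for term, f_value in approx_f.items():
    # With a single parameter, the magnitude of the t-statistic is sqrt(F).
    t_magnitude = math.sqrt(f_value)
    print(f"{term:12s} F ~ {f_value:7.3f} (reported {reported_f[term]:7.3f}), |t| ~ {t_magnitude:.3f}")
```

The approximations come out at roughly 1.49, 8.17 and 60.91, within a few percent of the reported values; the remaining gap is consistent with rounding in the published deviances.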
Conclusion
OLS regression is one of the major techniques used to analyse data and forms the basis of many other
techniques (for example ANOVA and the Generalised linear models, see Rutherford, 2001). The usefulness
of the technique can be greatly extended with the use of dummy variable coding to include grouped
explanatory variables (see Hutcheson and Moutinho, 2008, for a discussion of the analysis of experimental
designs using regression) and data transformation methods (see, for example, Fox, 2002). OLS regression is
particularly powerful as it is relatively easy to also check the model assumptions such as linearity, constant
variance and the effect of outliers using simple graphical methods (see Hutcheson and Sofroniou, 1999).
Further Reading
Agresti, A. (1996). An Introduction to Categorical Data Analysis. John Wiley and Sons, Inc.
Fox, J. (2002). An R and S-Plus Companion to Applied Regression. London: Sage Publications.
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modeling for Management. Sage Publications.
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist. London: Sage Publications.
Koteswara, R. K. (1970). Testing for the Independence of Regression Disturbances. Econometrica, 38: 97-117.
Rutherford, A. (2001). Introducing ANOVA and ANCOVA: a GLM approach. London: Sage Publications.
Ryan, T. P. (1997). Modern Regression Methods. Chichester: John Wiley and Sons.
Graeme Hutcheson
Manchester University