
Coefficient of Determination ($R^2$)

The coefficient of determination (usually denoted $R^2$) is a concept in analysis of variance and regression analysis. It is a measure of the proportion of explained variance present in the data. Hence, the higher the value of $R^2$, the better the model describes the data. When data consists of values $y_1, \ldots, y_n$ of a response variable and a model with predictor variables is applied to the data, then $R^2$ is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1}$$

where $\bar{y}$ denotes the average of the observations and $\hat{y}_i$ the prediction of $y_i$ using the fitted model. In case there is no relation between the predictor variable(s) and the response variable, then $\bar{y}$ is the best "model" to explain the data. Hence, the terms $y_i - \bar{y}$ account for deviations in the worst case that there is no relation between the predictor variable(s) and the response variable. The range of values of $R^2$ depends on the type of model to be fitted; in standard cases like linear least-squares regression models they lie between 0 and 1.

In case of linear regression with only one predictor variable, $R^2$ equals the square of the correlation between the values of the predictor variable and the response variable. Several different formulas of $R^2$ are known, e.g., the formula

$$R_0^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2} \tag{2}$$

is used for linear regression models without intercept term. Using a form of $R^2$ that does not match the type of model may lead to wrong conclusions, since in general the different forms of $R^2$ are not equivalent (see [1] for an excellent overview).

Very low values of $R^2$ (<0.5) indicate a weak relation between the predictor variable(s) and the response variable, while moderately low values of $R^2$ (<0.8 and >0.5) indicate that the model is not adequate, e.g., terms are missing, or that there is substantial error variation (possibly caused by a large measurement spread). However, high values of $R^2$ do not necessarily imply that the model is adequate (see [1]). The major reason is that $R^2$ increases when models are enlarged, even when this is done with irrelevant terms. This effect is only partially taken away by using the adjusted form $R^2_{\mathrm{adj}} = 1 - \frac{n-1}{n-k-1}\,(1 - R^2)$, when there are $k$ predictor variables in a linear regression model (when there is no intercept, then the denominator should be $n - k$). A proper assessment of goodness of fit (see Lack of Fit) of a regression model should consist of a balanced judgment taking into account model assumptions, the explained variation by the model as well as the number of parameters in the model. This may be accomplished by analysis of the residuals, checking the number of significant effects, and performing a model significance test like the standard $F$-test in analysis of variance (which rigorously accounts for the degrees of freedom and hence, also incorporates the number of parameters).

Reference

[1] Kvålseth, T.O. (1985). Cautionary note about $R^2$, The American Statistician 39(4), 279–285.

Related Articles

Analysis of Variance; Correlation; Dependence; Degrees of Freedom; Lack of Fit.

ALESSANDRO DI BUCCHIANICO
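
As a numerical illustration of definitions (1) and (2) and of the adjusted form, the following is a minimal sketch in Python with NumPy. It is not part of the original article: the function names, the simulated data, and the choice of three irrelevant predictors are assumptions made purely for illustration. It fits an ordinary least-squares line, then refits after appending pure-noise columns, so that the effect of enlarging a model with irrelevant terms can be inspected directly.

```python
import numpy as np


def r_squared(y, y_hat):
    """Form (1): 1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r_squared_no_intercept(y, y_hat):
    """Form (2): the total sum of squares is taken about zero, not the mean."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum(y ** 2)


def adjusted_r_squared(r2, n, k, intercept=True):
    """Adjusted form; the denominator is n - k when there is no intercept."""
    denom = n - k - 1 if intercept else n - k
    return 1.0 - (n - 1) / denom * (1.0 - r2)


def fit_predict(X, y):
    """Ordinary least-squares fit; returns the fitted values X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

# Model with intercept and one predictor.
X_small = np.column_stack([np.ones(n), x])
# Same model enlarged with three irrelevant (pure-noise) predictors.
X_large = np.column_stack([X_small, rng.normal(size=(n, 3))])

r2_small = r_squared(y, fit_predict(X_small, y))
r2_large = r_squared(y, fit_predict(X_large, y))
adj_small = adjusted_r_squared(r2_small, n, k=1)
adj_large = adjusted_r_squared(r2_large, n, k=4)

# Regression through the origin, evaluated with form (2).
r02 = r_squared_no_intercept(y, fit_predict(x.reshape(-1, 1), y))

print(f"R^2 (1 predictor):      {r2_small:.4f}  adjusted: {adj_small:.4f}")
print(f"R^2 (4 predictors):     {r2_large:.4f}  adjusted: {adj_large:.4f}")
print(f"R0^2 (no intercept):    {r02:.4f}")
```

For nested least-squares models the unadjusted value for the enlarged model cannot be smaller than that of the smaller model, whereas the adjusted value may move in either direction. The value from form (2) is computed on a different scale and should not be compared directly with the values from form (1).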
