
Econometrics test, for 11/Feb/2009

1. About the error term, and what it means.

When a mathematical model is specified, it assumes an exact (deterministic) relationship
between variables. However, the relationship between economic variables is generally inexact. To
allow for this inexact relationship, one must modify the deterministic model by adding the error
term (u). The error term is a stochastic (random) variable that represents all the factors not
accounted for in the econometric model. It stands as a surrogate for all omitted or neglected
variables that have not been included in the model.
For each individual in the sample, it represents his or her personal deviation from the expected
value.
Some of the reasons why such variables may not have been entered in the model include:
1. Vagueness of theory: the theory determining the behaviour might be incomplete
2. Unavailability of data: some important variable might have been excluded for lack of
reliable data
3. Core variables vs. peripheral variables: some variables may account for so little of the
variance of Y, or their influence may be so stochastic, that it does not pay to include them
in the model
4. Intrinsic randomness of human behaviour
5. Poor proxy variables: these may result in errors of measurement
6. Principle of parsimony: in attempting to keep the model moderately simple, some
variables considered of less importance might be left out
7. Wrong functional form
The classical normal linear model assumes that each ui follows a normal distribution, with
mean 0, variance σ2 and zero covariance across observations, meaning the error terms are independently distributed.
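As a sketch of this idea, the following toy simulation (all parameter values are assumed, purely for illustration) generates data from a deterministic consumption function and then adds a normally distributed error term, so that each observation deviates from its expected value:

```python
import random

random.seed(0)

# Hypothetical deterministic model: consumption = 200 + 0.8 * income.
# The error term u stands in for everything the model omits.
beta1, beta2, sigma = 200.0, 0.8, 50.0

incomes = [1000, 2000, 3000, 4000, 5000]
consumption = [beta1 + beta2 * x + random.gauss(0, sigma) for x in incomes]

# Without u every point would lie exactly on the line; with u, each
# observation deviates from its expected value beta1 + beta2 * x.
deviations = [y - (beta1 + beta2 * x) for x, y in zip(incomes, consumption)]
print(deviations)
```

Each entry of `deviations` is one realisation of the error term: on average zero, but never exactly zero for any given individual.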
2. PRF in relation to SRF. Discuss the difference between them.

A Population Regression Function is an expression that states that the expected value of the
distribution of Y, given Xi, is functionally related to Xi. It tells how the mean value of Y varies with
X, for the entire population.
The PRF is what we're trying to estimate in a linear regression. By defining a functional
form and regressing, we try to estimate the unknown coefficients that help define the relationship
between X and Y.
However, it is often impossible to directly determine the PRF. It is frequently difficult to
obtain data for the entire population, which forces investigators to resort to sampling procedures:
obtain data from a sample and then estimate for the population. The result is called the Sample
Regression Function (SRF). Due to sampling fluctuations, there is often a degree of difference
between the PRF and the SRF.
When regressing the sample data to produce the Sample Regression Function, we obtain
estimates of the true parameters (β). One of the methods that assures these estimates are as close as
possible to the real (population) parameters is OLS (Ordinary Least Squares). It has been shown (and
formalised in the Gauss–Markov Theorem) that the parameters obtained through OLS are the Best
Linear Unbiased Estimators (BLUE). If we further assume these estimators follow a normal distribution,
we can then go on to test, by a t test, whether the true parameters are statistically different from
some hypothesised value (zero, for example).
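The PRF/SRF distinction can be illustrated with a small simulation: we fix the (normally unknown) population parameters, draw one sample, and compute the OLS estimates that define the SRF. The parameter values and sample design here are assumed purely for illustration:

```python
import random

random.seed(1)

# True (population) parameters of the PRF: E[Y|X] = 3 + 2X  (assumed values)
beta1_true, beta2_true = 3.0, 2.0

# Draw one sample; the line fitted to it is the SRF, not the PRF.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [beta1_true + beta2_true * x + random.gauss(0, 1) for x in xs]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# OLS formulas for the two-variable model
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b1 = y_bar - b2 * x_bar

print(b1, b2)  # close to, but not exactly, the PRF parameters
```

A different random sample would give slightly different estimates: that gap between b1, b2 and the true β's is precisely the sampling fluctuation mentioned above.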
3. Functional forms: what do they mean, why are they important.
When doing a linear regression, the resulting equation is linear in the
parameters but not necessarily in the variables. Depending on the form of equation we choose, the fit
between the data and the predictions of the regression will differ.
We consider four kinds of linear equations – again, all are linear in the parameters, but we apply
different transformations to the variables.
The first kind of equation is the log-linear model. In these forms, both Y and X enter in
natural logarithms.

lnYi = lnβ1 + β2 lnXi + ui

In these models, the slope coefficient β2 measures the elasticity of Y with respect to X: the
percentage change in Y for a given (small) percentage change in X. Since β2 does not depend on X, this
is a constant elasticity model (the term lnβ1 is simply a constant, and can be relabelled α).
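A quick numerical check of the constant-elasticity property, with assumed coefficient values:

```python
import math

# Constant-elasticity relationship Y = exp(beta1) * X**beta2 (assumed numbers)
beta1, beta2 = 1.0, 0.75

def y(x):
    return math.exp(beta1) * x ** beta2

# A 1% change in X produces roughly a beta2 % change in Y, at any X.
pcts = [(y(x * 1.01) - y(x)) / y(x) * 100 for x in (10.0, 100.0)]
print(pcts)  # the same value (about 0.75%) at both levels of X
```

The percentage response is identical at X = 10 and X = 100, which is exactly what "constant elasticity" means.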

The second kind of models are called semilog models. They can either be log-lin or lin-log
models.
The log-lin model has the general functional form:

ln Yt = β1 + β2 t + ut

where t stands for time. The usage of time as the independent variable is due to the fact that this
functional form is used to find out the rate of growth of an economic variable, which is given by the
slope coefficient, β2. β2, therefore, measures the relative change in Y for a given absolute change in
the value of t.
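As an illustration with assumed numbers, the relative change in Y per period implied by a log-lin model is constant:

```python
import math

# If ln(Y_t) = beta1 + beta2 * t, Y grows at a constant relative rate.
# Assumed values: Y_0 = 100, instantaneous growth rate beta2 = 0.05.
beta1, beta2 = math.log(100), 0.05

ys = [math.exp(beta1 + beta2 * t) for t in range(3)]

# Period-to-period relative change is constant: exp(beta2) - 1
growth = [(ys[t + 1] - ys[t]) / ys[t] for t in range(2)]
print(growth)  # both entries equal exp(0.05) - 1, about 5.13%
```

Note the distinction between the instantaneous rate β2 (5%) and the compound per-period rate exp(β2) − 1 (about 5.13%).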
As for the lin-log model, it has the functional form

Yi = β1 + β2 lnXi + ui

Models with this functional form give us the absolute change in Y for a given relative change
in X – this ratio equals β2 (so a 1% change in X changes Y by about β2/100).
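Again with assumed coefficients, the lin-log property can be verified numerically: a 1% change in X changes Y by roughly β2/100, regardless of the level of X:

```python
import math

# Lin-log model Y = beta1 + beta2 * ln(X) (assumed coefficients)
beta1, beta2 = 10.0, 4.0

def y(x):
    return beta1 + beta2 * math.log(x)

# A 1% increase in X raises Y by roughly beta2 / 100, at any level of X
deltas = [y(x * 1.01) - y(x) for x in (5.0, 500.0)]
print(deltas)  # the same value (about 0.04 = beta2/100) at both levels
```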

We then have the reciprocal models. These are useful because they consign the values of Y
to an asymptote: they have a limit built in.
Yi = β1 + β2 (1/Xi) + ui

As X increases, 1/X approaches 0, meaning that Y approaches the limiting value β1 (from
above if β2 > 0, from below if β2 < 0).
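A short numerical illustration of the asymptote, with assumed coefficients:

```python
# Reciprocal model Y = beta1 + beta2 * (1/X): Y approaches beta1 as X grows
beta1, beta2 = 8.0, 20.0

ys = [beta1 + beta2 / x for x in (1, 10, 100, 10000)]
print(ys)  # 28.0, 10.0, 8.2, 8.002 — converging on the asymptote beta1 = 8
```

With β2 > 0 the curve approaches β1 from above, which is why these forms are natural for relationships with a built-in floor or ceiling.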

Finally, we consider the logarithmic reciprocal model,

lnYi = β1 - β2 (1/Xi) + ui

In these functional forms, Y initially increases at an increasing rate, and then increases at a
decreasing rate – as, for instance, in short-run production functions.
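With assumed coefficients, the increments of Y in a log-reciprocal model first grow and then shrink, tracing the elongated S shape described above:

```python
import math

# Log-reciprocal model ln(Y) = beta1 - beta2 * (1/X) (assumed coefficients)
beta1, beta2 = 2.0, 5.0

ys = [math.exp(beta1 - beta2 / x) for x in range(1, 8)]
diffs = [b - a for a, b in zip(ys, ys[1:])]
print(diffs)  # increments first rise, then fall steadily
```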
When doing a linear regression, we are not sure at the start of what the functional form of
the regression might be. However, the process of deciding should follow some guidelines:

1. The underlying theory might suggest a specific functional form
2. The coefficients of the model should satisfy a priori expectations – for instance, be positive
or negative where expected
3. The statistical significance of the coefficients
4. The overall fit of the model; this should not, however, be over-emphasised
5. Find out the rate of change of the regressand with respect to the regressor, as well as the
elasticity of the regressand with respect to the regressor
4. Essay on R2. Discuss the differences between the two different kinds of R2.

In a Classical Linear Regression Model, for the two-variable case, r2 is a measure of the goodness of
fit of the regression to a set of data; for the multiple-variable regression model, the notation is R2,
and the use is the same: to describe how well a regression fits the data. R2 (and r2) is called the
coefficient of determination. Overall, R2 is simply an extension of r2.
To calculate r2, one needs the total variation of each observation of Y about the sample mean.
The sum of these squared deviations gives us the Total Sum of Squares (TSS). The total sum of
squares is the sum of the Explained Sum of Squares (ESS), the part explained by the regression, and
the Residual Sum of Squares (RSS), the sum of the squared residuals (the actual variation of Y about
the regression line):

TSS = ESS + RSS

Since r2 measures the goodness of fit, it is defined as

r2 = ESS/TSS

Consequently, what r2 measures is the proportion of the variation in the dependent variable
that is explained by the explanatory variable in the regression model. Since it is a proportion, it
varies between 0 and 1, and the regression is considered “better” – with a better fit – the higher the
value of r2.
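The decomposition TSS = ESS + RSS and the resulting r2 can be computed by hand for a tiny illustrative data set (the numbers below are made up):

```python
# r^2 for a small two-variable regression (illustrative numbers)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# OLS fit
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)               # total variation of Y
ess = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the regression
rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual variation

r2 = ess / tss
print(tss, ess, rss, r2)  # TSS = ESS + RSS holds, and r2 = ESS/TSS = 0.6
```

Here the regression explains 60% of the variation of Y about its mean; the remaining 40% is residual.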
It is also possible to regress a dependent variable on more than one explanatory variable. For
these cases, we can determine R2, the multiple coefficient of determination. This gives us the
proportion of the variation in the dependent variable that is explained jointly by the explanatory
variables, no matter how many of them. As in the two-variable case, R2 varies between 0 and 1, and
the regression is said to have a better fit the higher the value of R2.
However, care must be taken when comparing values of R2. One of the characteristics of
this index is that it never decreases when variables are added to the equation, so it is quite possible
for meaningless variables to be included in a regression, and the model therefore considered better
than one with fewer explanatory variables, without that being the actual case. So, to compare two
models for the same dependent variable, we must use the adjusted R2, a version of the coefficient of
determination that accounts for the number of explanatory variables in the regression.
Besides using adjusted R2, we must also make sure that the dependent variable and the
sample size are the same, or else it is not possible to compare the goodness of fit of the two
regressions.
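A sketch of the usual adjustment formula, 1 − (1 − R2)(n − 1)/(n − k), showing how an extra regressor with only a tiny R2 gain can lower the adjusted value (the R2 values and sample size below are assumed):

```python
# Adjusted R^2 penalises extra regressors: 1 - (1 - R2)(n - 1)/(n - k)
def adjusted_r2(r2, n, k):
    """n = sample size, k = number of parameters including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Adding a near-useless regressor nudges R2 up but lowers adjusted R2
a = adjusted_r2(0.70, 30, 3)  # model with two regressors
b = adjusted_r2(0.71, 30, 4)  # one more regressor, tiny R2 gain
print(a, b)  # b < a: the extra variable did not pay for itself
```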
Finally, it should be noted that the objective of regression analysis is not to obtain the
highest value of R2 at the expense of an actual, relevant description of reality. When deciding
between different solutions to a regression problem, the underlying theory (functional forms, signs
of coefficients) should take precedence over the coefficient of determination.
