Econometrics Chapter 1 7 2d AgEc 1
The most important characteristic of economic relationships is that they contain a random
element, which is ignored by economic theory and mathematical economics, both of which
postulate exact relationships between the various economic magnitudes. Econometrics has
developed methods for dealing with the random component of economic relationships.
Economic theory makes statements that are mostly qualitative in nature, while
econometrics gives empirical content to most economic theory.
The main concern of mathematical economics is to express economic theory in
mathematical form without empirical verification of the theory, while econometrics is
mainly interested in the latter.
Economic statistics is mainly concerned with collecting, processing and presenting
economic data in the form of charts and tables. It is not concerned with using the
collected data to test economic theories, whereas econometrics is.
Mathematical statistics provides many of the tools for economic studies, but econometrics
supplements the latter with many special methods of quantitative analysis based on
economic data.
Goals of econometrics
There are three main goals of econometrics:
(1) Analysis, i.e. testing of economic theory
(2) Policy making, i.e. supplying numerical estimates of the coefficients of economic
relationships, which may be then used for decision making
(3) Forecasting, i.e. using the numerical estimates of the coefficients in order to forecast the
future values of the economic magnitudes. Of course, these goals are not mutually
exclusive. Successful econometric applications should include some combinations of all
three aims.
In any econometric research one may distinguish four stages.
The specification of the econometric model will be based on economic theory and on any
available information relating to the phenomenon being studied. Thus the specification of the
model presupposes knowledge of economic theory as well as familiarity with the particular
phenomenon being studied.
The number of variables to be included in the model depends on the nature of the phenomenon
being studied and the purpose of the research. Usually we introduce explicitly in the function
only the most important (four or five) explanatory variables. The influence of less important
factors is taken into account by the introduction in the model of a random variable, usually
denoted by u. The values of this random variable cannot be actually observed like the values of
the other explanatory variables. We thus have to guess at the pattern of the values of u by making
some plausible assumptions about their distribution. The statement of the assumptions about the
random variable is part of the specification of the model.
Thus the number of variables to be initially included in the model depends on the nature of the
economic phenomenon being studied, while the number of variables which will finally be
retained in the model depends on whether the parameter estimates related to the variables pass the
economic, statistical and econometric criteria, which we will discuss later.
In most cases economic theory does not explicitly state the mathematical form of economic
relationships. It is often helpful to plot the actual data on two-dimensional diagrams, taking two
variables at a time (the dependent and each one of the explanatory variables in turn). In most
cases the examination of such scatter diagrams throws some light on the form of the function
and helps in deciding upon the choice of the mathematical form of the relationship connecting
the economic variables.
After the model has been specified (formulated), the econometrician proceeds with its estimation;
in other words, he must obtain numerical estimates of the coefficients of the model. The
estimation of the model is a purely technical stage which requires knowledge of the various
econometric methods, their assumptions and the economic implications of the parameters.
The estimation stage involves the following prerequisites:
(1) Gathering of the data (statistical observations) on the variables included in the model
(2) Examination of the identification conditions of the function in which we are interested
Identification is the procedure by which we attempt to establish that the coefficients which we
shall estimate by the application of some appropriate econometric technique are actually the true
coefficients of the functions in which we are interested.
(3) Examination of the aggregation problems involved in the variables of the function
Aggregation problems arise from the fact that we use aggregative variables in our functions. Such
aggregative variables may involve: aggregation over individuals, aggregation over commodities,
aggregation over time periods, and spatial aggregation. The above sources of aggregation create
various complications which may impart some ‘aggregation bias’ in the estimates of the
coefficients.
(4) Examination of the degree of correlation between the explanatory variables, that is,
examination of the degree of multicollinearity
Most economic variables are correlated, in the sense that they tend to change simultaneously
during the various phases of economic activity. If, however, the degree of collinearity is high, the
results (measurements) obtained from econometric applications may be seriously impaired and
their use may be greatly misleading, because in these conditions it may not be computationally
possible to separate the influence of each explanatory variable.
If the estimates of the parameters turn up with signs or sizes not conforming to economic theory,
they should be rejected, unless there is good reason to believe that in the particular instance the
principle of economic theory does not hold.
It should be noted that the statistical criteria are secondary only to the a priori theoretical criteria.
The estimates of the parameters should be rejected in general if they happen to have the wrong
sign (or size) even though the correlation coefficient is high, or the standard errors suggest that
the estimates are statistically significant. In such cases the parameters, though statistically
satisfactory, are theoretically implausible, that is to say they make no sense on the basis of the a
priori theoretical-economic criteria.
The evaluation of the results obtained from the estimation of the model is a very complex
procedure. The econometrician must use all the above criteria, economic, statistical and
econometric, before he can accept or reject the estimates.
(1) Theoretical plausibility. The model should be compatible with the postulates of economic
theory. It must describe adequately the economic phenomena to which it relates.
(2) Explanatory ability. The model should be able to explain the observations of the actual
world. It must be consistent with the observed behavior of the economic variables whose
relationships it determines.
(3) Accuracy of the estimates of the parameters. The estimates of the coefficients should
be accurate in the sense that they should approximate as best as possible the true
parameters of the structural model.
(4) Forecasting ability. The model should produce satisfactory predictions of future
values of the dependent variable.
(5) Simplicity. The model should represent the economic relationships with maximum
simplicity. The fewer the equations and the simpler their mathematical form, the
better the model is considered, provided that the other desirable properties are not
affected by the simplification of the model.
The more of the above properties a model possesses, the better it is considered for any
practical purpose.
To illustrate the preceding steps, let us consider the well-known Keynesian theory of
consumption:

Y = β1 + β2X, 0 < β2 < 1 ------------------------------------------------------- (1.3.1)

where Y = consumption expenditure and X = income, and where β1 and β2, known as the
parameters of the model, are, respectively, the intercept and slope coefficients.
The slope coefficient β2 measures the marginal propensity to consume (MPC). This equation, which states that consumption is
linearly related to income, is an example of a mathematical model of the relationship between
consumption and income that is called the consumption function in economics. A model is
simply a set of mathematical equations. If the model has only one equation, as in the preceding
example, it is called a single-equation model, whereas if it has more than one equation, it is
known as a multiple-equation model.
In Eq. (1.3.1) the variable appearing on the left side of the equality sign is called the dependent
variable and the variable(s) on the right side are called the independent, or explanatory,
variable(s). Thus, in the Keynesian consumption function, Eq. (1.3.1), consumption (expenditure)
is the dependent variable and income is the explanatory variable.
To allow for the inexact relationships between economic variables, the econometrician would
modify the deterministic consumption function (1.3.1) as follows:

Y = β1 + β2X + u ------------------------------------------------------- (1.3.2)

where u, known as the disturbance, or error, term, is a random (stochastic) variable. The
disturbance term u may well represent all those factors that affect consumption but are not taken
into account explicitly.
The hat on the Y indicates that it is an estimate. The estimated MPC was about 0.72, meaning
that, for the sample period, an increase in real income of 1 USD led, on average, to an increase
in real consumption expenditure of about 72 cents.
Note: A hat symbol (^) above a variable signifies an estimator of the relevant population
value.
(6) Hypothesis Testing
Do the estimates accord with the expectations of the theory that is being tested? Is the MPC
statistically less than 1? If so, this may support Keynes’ theory. Confirmation or refutation of
economic theories on the basis of sample evidence is the object of statistical inference
(hypothesis testing).
Given MPC = 0.72, an income of $5882 billion will produce an expenditure of $4000 billion. By
fiscal and monetary policy, the government can manipulate the control variable X to obtain the
desired level of the target variable Y.
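The policy calculation above can be sketched in a few lines of code. The MPC of 0.72 is taken from the text; the intercept used below is a hypothetical value implied by the text's round figures (income of 5882 producing expenditure of 4000), not an estimate reported in the document:

```python
# Sketch of the "policy making" use of an estimated consumption function.
# MPC = 0.72 is from the text; the intercept is a hypothetical value implied
# by the text's round figures (income 5882 -> expenditure 4000).
MPC = 0.72
intercept = 4000 - MPC * 5882            # ~ -235.04, implied intercept

def predicted_expenditure(income):
    """Y-hat = b1-hat + b2-hat * X."""
    return intercept + MPC * income

def income_for_target(target):
    """Invert the fitted line: the income X needed to hit a target Y."""
    return (target - intercept) / MPC

print(predicted_expenditure(5882))   # ~4000.0
print(income_for_target(4200))       # income needed for expenditure of 4200
```

Inverting the fitted line in this way is exactly how a target value of Y is mapped back to the required setting of the control variable X.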
In applied econometrics we use the tools of theoretical econometrics to study some special
field(s) of economics and business, such as the production function, investment function, demand
and supply functions, etc.
Chapter 2: THE NATURE OF REGRESSION ANALYSIS AND
TWO-VARIABLE REGRESSION ANALYSIS
Example
Dependent Variable Y; Explanatory Variable(s) X
1. Y = Personal consumption expenditure; X = Personal disposable income
2. Y = Demand; X = Price
3. Y = % change in demand; X = % change in the advertising budget
4. Y = Crop yield; Xs = temperature, rainfall, sunshine, fertilizer
2.1.2 TERMINOLOGY AND NOTATION
In the literature the terms dependent variable and explanatory variable are described variously. A
representative list is:
Dependent Variable        Explanatory Variable(s)
Explained variable        Independent variable(s)
Predictand                Predictor(s)
Regressand                Regressor(s)
Response                  Stimulus or control variable(s)
Endogenous                Exogenous
If we are studying the dependence of a variable on only a single explanatory variable, such as that
of consumption expenditure on real income, such a study is known as simple, or two-variable,
regression analysis. However, if we are studying the dependence of one variable on more than
one explanatory variable, as in the crop-yield, rainfall, temperature, sunshine, and fertilizer
examples, it is known as multiple regression analysis.
Cross-Section Data Cross-section data are data on one or more variables collected at the same
point in time, such as the census of population conducted by the Census Bureau every 10 years.
Pooled Data Pooled, or combined, data contain elements of both time series and cross-section data.
Ratio Scale For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 −
X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful. Most
economic variables belong to this category. E.g., it is meaningful to ask how big this year’s GDP
is as compared with the previous year’s GDP.
Interval Scale An interval scale variable satisfies the last two properties of the ratio scale
variable but not the first. Thus, the distance between two time periods, say (2000–1995) is
meaningful, but not the ratio of two time periods (2000/1995).
Ordinal Scale A variable belongs to this category only if it satisfies the third property of the ratio
scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified.
Nominal Scale Variables in this category have none of the features of the ratio scale variables.
Variables such as gender (male, female) and marital status (married, unmarried, divorced,
separated) simply denote categories.
The data in the table refer to a total population of 60 families in a hypothetical community and
their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60
families are divided into 10 income groups (from $80 to $260) and the weekly expenditures of
each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of
X and the corresponding Y values against each of the X values.
There is considerable variation in weekly consumption expenditure in each income group, which
can be seen clearly from Figure 2.1. But the general picture that one gets is that, despite the
variability of weekly consumption expenditure within each income bracket, on the average,
weekly consumption expenditure increases as income increases.
Fig 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1)
To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption
expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly
income level of $80, the mean consumption expenditure is $65, while corresponding to the
income level of $200, it is $137. In all we have 10 mean values for the 10 subpopulations of Y.
We call these mean values conditional expected values, as they depend on the given values of
the (conditioning) variable X. Symbolically, we denote them as E(Y|X), which is read as the
expected value of Y given the value of X.
It is important to distinguish these conditional expected values from the unconditional expected
value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures
for all the 60 families in the population and divide this number by 60, we get the number $121.20
($7272/60), which is the unconditional mean, or expected, value of weekly consumption
expenditure, E(Y); it is unconditional in the sense that in arriving at this number we have
disregarded the income levels of the various families. Obviously, the various conditional expected
values of Y given in Table 2.1 are different from the unconditional expected value of Y of
$121.20. When we ask the question, “What is the expected value of weekly consumption
expenditure of a family,” we get the answer $121.20 (the unconditional mean). But if we ask the
question, “What is the expected value of weekly consumption expenditure of a family whose
weekly income is, say, $140,” we get the answer $101 (the conditional mean).
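The distinction between the two kinds of mean can be made concrete with a small computation. Since the full 60-family Table 2.1 is not reproduced here, the sketch below uses a hypothetical mini-population of (weekly income X, weekly expenditure Y) pairs standing in for it:

```python
# Conditional vs. unconditional expectation, on a hypothetical mini-population
# of (weekly income X, weekly expenditure Y) pairs standing in for Table 2.1.
data = [
    (80, 55), (80, 65), (80, 75),      # families with weekly income 80
    (100, 70), (100, 77), (100, 84),   # families with weekly income 100
    (120, 85), (120, 95), (120, 105),  # families with weekly income 120
]

# Unconditional mean E(Y): average over all families, ignoring income.
e_y = sum(y for _, y in data) / len(data)

# Conditional means E(Y | X): one average per income level.
levels = sorted({x for x, _ in data})
e_y_given_x = {
    x: sum(y for xi, y in data if xi == x) / sum(1 for xi, _ in data if xi == x)
    for x in levels
}

print(e_y)            # 79.0 -- the unconditional mean
print(e_y_given_x)    # {80: 65.0, 100: 77.0, 120: 95.0}
```

The conditional means (one per income level) differ from each other and from the single unconditional mean, which pools all families regardless of income.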
Geometrically, a population regression curve (line) is simply the locus of the conditional means
of the dependent variable for the fixed values of the explanatory variable(s).
2.2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
From the preceding discussion, it is clear that each conditional mean E(Y | Xi) is a function of Xi,
where Xi is a given value of X. Symbolically,

E(Y | Xi) = f (Xi) ------------------------------------------------------- (2.2.1)

where f (Xi) denotes some function of the explanatory variable X. In the above example, E(Y | Xi)
is a linear function of Xi. Equation (2.2.1) is known as the conditional expectation function
(CEF) or population regression function (PRF) or population regression (PR) for short. It
states merely that the expected value of the distribution of Y given Xi is functionally related to Xi.
In simple terms, it tells how the mean or average response of Y varies with X.
As a first approximation or a working hypothesis, we may assume that the PRF E(Y | Xi) is a
linear function of Xi, say, of the type

E(Y | Xi) = β1 + β2Xi ------------------------------------------------------- (2.2.2)

where β1 and β2 are unknown but fixed parameters known as the regression coefficients.
It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on
the average increases, too. But what about the consumption expenditure of an individual family in
relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 that an
individual family’s consumption expenditure does not necessarily increase as the income level
increases. For example, from Table 2.1 we observe that corresponding to the income level of
$100 there is one family whose consumption expenditure of $65 is less than the consumption
expenditures of two families whose weekly income is only $80. But notice that the average
consumption expenditure of families with a weekly income of $100 is greater than the average
consumption expenditure of families with a weekly income of $80 ($77 versus $65).
We see from Figure 2.1 that, given the income level of Xi, an individual family’s consumption
expenditure is clustered around the average consumption of all families at that Xi, that is, around
its conditional expectation. Therefore, we can express the deviation of an individual Yi around its
expected value as follows:
ui = Yi − E(Y | Xi)
or
Yi = E(Y | Xi) + ui ------------------------------------------------------- (2.2.3)
where the deviation ui is an unobservable random variable taking positive or negative values.
Technically, ui is known as the stochastic disturbance or stochastic error term.
How do we interpret (2.2.3)? We can say that the expenditure of an individual family, given its
income level, can be expressed as the sum of two components: (1) E(Y | Xi), which is simply the
mean consumption expenditure of all the families with the same level of income. This component
is known as the systematic, or deterministic, component, and (2) ui, which is the random, or
nonsystematic, component. Stochastic disturbance term (ui) is a surrogate or proxy for all the
omitted or neglected variables that may affect Y but are not (or cannot be) included in the
regression model.
If E(Y | Xi) is assumed to be linear in Xi, as in Eq. (2.2.2), Eq. (2.2.3) may be written as
Yi = E(Y | Xi) + ui
= β1 + β2 Xi + ui ----------------------------------------------- (2.2.4)
We call this equation the stochastic specification of the PRF (the true PRF).
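The stochastic PRF can be illustrated by simulation. The parameter values below (β1 = 17, β2 = 0.6, σ = 10) are hypothetical, chosen only to show that individual Y's scatter around the conditional mean E(Y | Xi) while their average recovers it:

```python
import random

# Simulate the stochastic PRF Y_i = b1 + b2*X_i + u_i with hypothetical
# parameters; u_i is a zero-mean disturbance.
rng = random.Random(42)
b1, b2, sigma = 17.0, 0.6, 10.0

x = 100.0                                  # one fixed value of X
conditional_mean = b1 + b2 * x             # E(Y | X=100) = 77.0

# Draw many "families" at this income level: systematic part + disturbance u_i.
draws = [conditional_mean + rng.gauss(0.0, sigma) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

print(conditional_mean, round(sample_mean, 1))
```

Each individual draw deviates from 77.0 by its own ui, but because E(ui) = 0 the average of many draws lies very close to the systematic component β1 + β2Xi.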
2.2.3 THE SAMPLE REGRESSION FUNCTION (SRF)
It is about time to face up to the sampling problems, for in most practical situations what we have
is but a sample of Y values corresponding to some fixed X’s. Therefore, the task now is to
estimate the PRF on the basis of the sample information.
As an illustration, pretend that the population of Table 2.1 was not known to us and the only
information we had was a randomly selected sample of Y values for the fixed X’s as given in
Table 2.2. Unlike Table 2.1, we now have only one Y value corresponding to the given X’s; each
Y (given Xi) in Table 2.2 is chosen randomly from similar Y’s corresponding to the same Xi from
the population of Table 2.1.
The question is: From the sample of Table 2.2 can we predict the average weekly consumption
expenditure Y in the population as a whole corresponding to the chosen X’s? In other words, can
we estimate the PRF from the sample data? As one surely suspects, we may not be able to
estimate the PRF “accurately” because of sampling fluctuations. To see this, suppose we draw
another random sample from the population of Table 2.1, as presented in Table 2.3.
Table 2-2: A random sample from the      Table 2-3: Another random sample
population of Table 2.1                  from the population of Table 2.1

  Y     X                                  Y     X
 70    80                                 55    80
 65   100                                 88   100
 90   120                                 90   120
 95   140                                 80   140
110   160                                118   160
115   180                                120   180
120   200                                145   200
140   220                                135   220
155   240                                145   240
150   260                                175   260
Plotting the data of Tables 2-2 and 2-3, we obtain the scattergram given in Figure 2.2. In the
scattergram two sample regression lines are drawn so as to “fit” the scatters reasonably well:
SRF1 is based on the first sample, and SRF 2 is based on the second sample. Which of the two
regression lines represents the “true” population regression line? There is no way we can be
absolutely sure that either of the regression lines shown in Figure 2.2 represents the true
population regression line (or curve).
The regression lines in Figure 2.2 are known as the sample regression lines. They represent the
population regression line, but because of sampling fluctuations they are at best an approximation
of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs
are not likely to be the same.
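This sampling fluctuation can be seen directly by fitting a least-squares line to each of the two samples given in Tables 2-2 and 2-3. A minimal sketch in plain Python:

```python
# Fit an OLS line to each of the two samples drawn from the population of
# Table 2.1, to see that different samples give different SRFs.
def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
            sum((x - xbar) ** 2 for x in xs)
    return ybar - slope * xbar, slope

X  = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]    # Table 2-2
Y2 = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]    # Table 2-3

srf1 = ols(X, Y1)   # intercept ~24.45, slope ~0.509
srf2 = ols(X, Y2)   # intercept ~17.17, slope ~0.576
print(srf1, srf2)
```

The two fitted lines have noticeably different intercepts and slopes even though both samples come from the same population, which is exactly the point of Figure 2.2.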
Now, analogously to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample counterpart of Eq. (2.2.2) may be written as

Ŷi = β̂1 + β̂2Xi ------------------------------------------------------- (2.2.5)

where Ŷi is read as “Y-hat” or “Y-cap”
Ŷi = estimator of E(Y | Xi)
β̂1 = estimator of β1
β̂2 = estimator of β2
Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or method
that tells how to estimate the population parameter from the information provided by the sample
at hand. A particular numerical value obtained by the estimator in an application is known as an
estimate.
Now just as we expressed the PRF in two equivalent forms, (2.2.2) and (2.2.4), we can express
the SRF (2.2.5) in its stochastic form as follows:

Yi = β̂1 + β̂2Xi + ûi ------------------------------------------------------- (2.2.6)

where, in addition to the symbols already defined, ûi denotes the (sample) residual term.
Conceptually ûi is analogous to ui and can be regarded as an estimate of ui.
To sum up, our primary objective in regression analysis is to estimate the PRF
Yi = β1 + β2Xi + ui (2.2.4) on the basis of the SRF Yi = β̂1 + β̂2Xi + ûi (2.2.6),
because more often than not our analysis is based upon a single sample from some population.
The deviations of the observations from the line may be attributed to several factors.
(1) Omission of variables from the function
In economic reality each variable is influenced by a very large number of factors.
However, not all the factors influencing a certain variable can be included in the function
for various reasons.
(2) Random behavior of the human beings
The scatter of points around the line may be attributed to an erratic element which is
inherent in human behavior. Human reactions are to a certain extent unpredictable and
may cause deviations from the normal behavioral pattern depicted by the line.
(3) Imperfect specification of the mathematical form of the model
We may have linearised a possibly nonlinear relationship. Or we may have left out of the
model some equations.
(4) Errors of aggregation
We often use aggregate data (aggregate consumption, aggregate income), in which we
add magnitudes referring to individuals whose behavior is dissimilar. In this case we say
that variables expressing individual peculiarities are missing.
(5) Errors of measurement
Deviations may also arise from errors in measuring the variables, which are inevitable in
the collection and processing of statistical data.
The first four sources of error render the form of the equation wrong, and they are usually
referred to as error in the equation or error of omission. The fifth source of error is called error of
measurement or error of observation.
In order to take into account the above sources of error, we introduce into econometric functions a
random variable u called random disturbance term of the function, so called because u is
supposed to disturb the exact linear relationship which is assumed to exist between X and Y.
Of the two interpretations of linearity, linearity in the parameters is relevant for the development
of the regression theory to be presented shortly. Therefore, from now on the term “linear”
regression will always mean a regression that is linear in the parameters; the β’s (that is, the
parameters are raised to the first power only). It may or may not be linear in the explanatory
variables, the X’s.
Chapter 3: SIMPLE LINEAR REGRESSION MODELS
There are two major ways of estimating regression functions: the ordinary least
squares (OLS) method and the maximum likelihood (MLH) method. Both are
broadly similar to estimation methods you may have encountered in statistics
courses. The ordinary least squares method is the easiest and most commonly used,
as opposed to the maximum likelihood (MLH) method, which is limited by its
assumptions: the MLH method is valid only for large samples, whereas the OLS
method can be applied to smaller samples. The ordinary least squares (OLS)
method of estimating the parameters of the simple linear regression function given
below consists in finding the values of the parameters for which the errors, or
residuals, are minimized.
1. The mean value of ui is zero, i.e., E(ui|Xi) = 0.
Further, E(Yi) = β1 + β2Xi gives the relationship between X and Y on
the average, i.e., when X takes on the value Xi, then Y will on the average
take on E(Yi) (or E(Yi|Xi)).
2. The variance of ui is constant for all i, i.e., var(ui|Xi) = E(ui²|Xi) = σ².
This is called the assumption of common variance or homoscedasticity.
The implication is that for all values of X, the values of u show the
same dispersion around their mean.
The consequence of this assumption is that var(Yi|Xi) = σ².
If, on the other hand, the variance of the Y population varies as X changes,
a situation of non-constancy of the variance of Y, called
heteroscedasticity, arises.
3. ui has a normal distribution, i.e., ui ∼ N(0, σ²), which also implies
Yi ∼ N(β1 + β2Xi, σ²).
4. The random terms of different observations are independent, i.e., cov(ui, uj) = E(uiuj)
= 0 for i ≠ j, where i and j run from 1 to n. This is called the assumption of no
(serial) autocorrelation among the error terms.
The consequence of this assumption is that cov(Yi, Yj) = 0 for i ≠ j,
i.e., no autocorrelation among the Y’s.
5. The Xi’s are a set of fixed values in the process of repeated sampling which
underlies the linear regression model, i.e., they are non-stochastic.
6. u is independent of the explanatory variables, i.e., cov(ui, Xi) = E(uiXi) = 0.
7. Variability in X values. The X values in a given sample must not all be the
same. Technically, var(X) must be a finite positive number.
8. The regression model is correctly specified.
The linear relationship Yi = β1 + β2Xi + ui holds for the population of the values of X and
Y, so that we could obtain the numerical values of β1 and β2 only if we could have all the
possible values of X, Y and u which form the population of these variables. Since this is
impossible in practice, we get a sample of observed values of Y and X, specify the
distribution of the u’s and try to get satisfactory estimates of the true parameters of the
relationship. This is done by fitting a regression line through the observations of the
sample, which we consider as an approximation to the true line.
The method of ordinary least squares is one of the econometric methods which enable us
to find estimates of the true parameters; it is attributed to Carl Friedrich Gauss, a
German mathematician. To understand this method, we first explain the least squares
principle.
However, as noted in Chapter 2, the PRF is not directly observable. We estimate it from
the SRF:
Yi = β̂1 + β̂2Xi + ûi ---------------------------------------------------------------- (2.2.6)
   = Ŷi + ûi --------------------------------------------------------------------------- (*)

where Ŷi is the estimated (conditional mean) value of Yi.
But how is the SRF itself determined? To see this, let us proceed as follows. First, express
(*) as

ûi = Yi − Ŷi
Now given n pairs of observations on Y and X, we would determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we adopt the least-
squares criterion, which states that the SRF can be fixed in such a way that

∑ûi² = ∑(Yi − Ŷi)² = ∑(Yi − β̂1 − β̂2Xi)² ---------------------------------------------------- (3.2.2)

is as small as possible, where ûi² are the squared residuals.
It is obvious from (3.2.2) that ∑ûi² = f(β̂1, β̂2); that is, the sum of the squared
residuals is some function of the estimators β̂1 and β̂2. For any given set of data,
choosing different values for β̂1 and β̂2 will give different û’s and hence
different values of ∑ûi².
The principle or the method of least squares chooses β̂1 and β̂2 in such a manner
that, for a given sample or set of data, ∑ûi² is as small as possible. In other words, for
a given sample, the method of least squares provides us with unique estimates of β1 and
β2 that give the smallest possible value of ∑ûi².
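Before the formal derivation, this minimizing property can be checked numerically. The sketch below uses hypothetical toy data: it computes the closed-form least-squares estimates and confirms that perturbing them in any direction can only increase the sum of squared residuals:

```python
# Numerical check that the least-squares estimates give the smallest sum of
# squared residuals among nearby choices of (b1, b2). Hypothetical toy data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.9]

def ssr(b1, b2):
    """Sum of squared residuals for a candidate line Y = b1 + b2*X."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar

best = ssr(b1, b2)
# SSR is a convex function of (b1, b2), so any perturbation increases it.
assert all(ssr(b1 + d1, b2 + d2) >= best
           for d1 in (-0.5, 0.0, 0.5) for d2 in (-0.1, 0.0, 0.1))
print(round(b1, 2), round(b2, 2), round(best, 3))
```

Because the sum of squared residuals is a convex quadratic in the two coefficients, the point where both partial derivatives vanish (derived next) is its unique minimum.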
The process of differentiation yields the following equations for estimating β1 and β2.
Differentiating Eq. (3.2.2) partially with respect to β̂1 and β̂2, we obtain

∂(∑ûi²)/∂β̂1 = −2∑(Yi − β̂1 − β̂2Xi) = 0
∂(∑ûi²)/∂β̂2 = −2∑(Yi − β̂1 − β̂2Xi)Xi = 0
∑Yi = nβ̂1 + β̂2∑Xi ---------------------------------------------------- (3.2.3)
∑XiYi = β̂1∑Xi + β̂2∑Xi² ---------------------------------------------------- (3.2.4)

where n is the sample size. These simultaneous equations are known as the normal
equations.
Solving the normal equations simultaneously, we obtain
β̂2 = [n∑XiYi − ∑Xi∑Yi] / [n∑Xi² − (∑Xi)²]
   = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
   = ∑xiyi / ∑xi² ---------------------------------------------------- (3.2.5)

β̂1 = [∑Xi²∑Yi − ∑Xi∑XiYi] / [n∑Xi² − (∑Xi)²]
   = Ȳ − β̂2X̄ ---------------------------------------------------- (3.2.6)

where X̄ and Ȳ are the sample means of X and Y, and where we define xi = (Xi − X̄)
and yi = (Yi − Ȳ). The lowercase letters in the formulas denote deviations
from mean values. Equation (3.2.6) can be obtained directly from (3.2.3) by simply
dividing both sides of the equation by n.
Note that, by making use of simple algebraic identities, formula (3.2.5) for estimating β2
can be alternatively expressed as

β̂2 = ∑xiyi / ∑xi² = ∑xiYi / (∑Xi² − nX̄²) = ∑Xiyi / (∑Xi² − nX̄²) ----------------------------------------- (3.2.7)

The estimators obtained previously are known as the least-squares estimators, for they
are derived from the least-squares principle. Finally, the regression line equation is written as
Ŷi = β̂1 + β̂2Xi.
Interpretation of estimates
Estimated intercept, β̂1: the estimated average value of the dependent
variable when the independent variable takes on the value zero.
Estimated slope, β̂2: the estimated change in the average value of the
dependent variable when the independent variable increases by one unit.
Ŷi gives the average relationship between Y and X, i.e., Ŷi is the average
value of Y given Xi.
Example 1
A random sample of ten families had the following income and food expenditure (in $ per
week):

Family:             A   B   C   D   E   F   G   H   I   J
Family income:     20  30  33  40  15  13  26  38  35  43
Food expenditure:   7   9   8  11   5   4   8  10   9  10

Estimate the regression line of food expenditure on income and interpret your results.
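One way the exercise could be worked is to apply formulas (3.2.5) and (3.2.6) directly to the ten (income, expenditure) pairs:

```python
# OLS formulas (3.2.5) and (3.2.6) applied to the Example 1 data:
# food expenditure (Y) regressed on family income (X), $ per week.
X = [20, 30, 33, 40, 15, 13, 26, 38, 35, 43]   # family income
Y = [7, 9, 8, 11, 5, 4, 8, 10, 9, 10]          # food expenditure

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n            # sample means

sum_xy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))  # sum x_i*y_i
sum_xx = sum((x - xbar) ** 2 for x in X)                     # sum x_i^2

b2 = sum_xy / sum_xx          # slope (3.2.5): ~0.202
b1 = ybar - b2 * xbar         # intercept (3.2.6): ~2.17

print(f"Y-hat = {b1:.3f} + {b2:.3f} X")
```

The slope of about 0.20 says that each extra dollar of weekly income is associated, on average, with about 20 cents more food expenditure; the intercept of about 2.17 is the estimated average expenditure when income is zero.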
Note the following numerical properties of estimators obtained by the method of OLS.
1. The OLS estimators are expressed solely in terms of the observable (i.e., sample)
quantities (i.e., X and Y). Therefore, they can be easily computed.
2. They are point estimators.
3. Once the OLS estimates are obtained from the sample data, the sample regression line
can be easily obtained. The regression line thus obtained has the following properties:
3.1. It passes through the sample means of Y and X.
3.2. The mean value of the estimated Y (= Ŷi) is equal to the mean value of the actual Y,
i.e., the mean of Ŷi equals Ȳ.
3.3. The mean value of the residuals ûi is zero.
3.4. As a result of the preceding property, the sample regression Yi = β̂1 + β̂2Xi
+ ûi can be expressed in an alternative form where both Y and X are expressed
as deviations from their mean values, i.e., yi = β̂2xi + ûi; the SRF can also
be written as ŷi = β̂2xi. These equations are called the deviation form.
3.5. The residuals ûi are uncorrelated with the predicted Yi.
3.6. The residuals ûi are uncorrelated with Xi.
3.3 Precision or Standard Errors of Least-Squares Estimates
It is evident that least-squares estimates are a function of the sample data. But since the
data are likely to change from sample to sample, the estimates will change ipso facto.
Therefore, what is needed is some measure of “reliability” or precision of the estimators
^β 1 and ^β 2. In statistics the precision of an estimate is measured by its standard
error (se). The standard errors of the OLS estimates can be obtained as follows:
var(β̂2) = σ² / Σxi² --------------------------------------------------------- (3.3.1)
se(β̂2) = σ / √(Σxi²) ---------------------------------------------- (3.3.2)
var(β̂1) = σ² ΣXi² / (n Σxi²) --------------------------------------------------- (3.3.3)
se(β̂1) = √( σ² ΣXi² / (n Σxi²) ) ----------------------------------------------(3.3.4)
Where var = variance and se = standard error and where σ2 is constant or homoscedastic
variance of ui of Assumption 2.
All the quantities entering into the preceding equations except σ2 can be estimated from
the data. σ2 itself is estimated by the following formula:
σ̂² = Σûi² / (n − 2) -------------------------------------------------- (3.3.5)
Where σ̂² is the OLS estimator of the true but unknown σ² and where the expression
n −2 is known as the number of degrees of freedom (df), ∑ u^ 2i being the sum of the
residuals squared or the residual sum of squares (RSS).1
In practice, Eq. (3.3.5) is easy to apply using the identity Σûi² = Σyi² − β̂2²Σxi²,
which does not require computing ûi for each observation.
σ̂ = √( Σûi² / (n − 2) ) -------------------------------------------------------------------- (3.3.7)
is known as the standard error of estimate or the standard error of the regression
(se). It is simply the standard deviation of the Y values about the estimated regression line
and is often used as a summary measure of the “goodness of fit” of the estimated
regression line.
Note the following features of the variances (and therefore the standard errors) of β̂1
and β̂2:
1. The variance of β̂2 is directly proportional to σ² but inversely proportional to Σxi².
2. The variance of β̂1 is directly proportional to σ² and ΣXi² but inversely
proportional to Σxi² and the sample size n.
A Numerical Example
We illustrate the econometric theory developed so far by considering the Keynesian
consumption function discussed in the Introduction. As a test of the Keynesian
consumption function, we use the sample data of Table 2.2a, which for convenience is
reproduced as Table 3.2.
Table 3.2: hypothetical data on weekly family consumption expenditure Y and weekly
family income X
Y($) X($)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
From the computation table, the column sums and means are:
Sum: ΣYi = 1110, ΣXi = 1700, ΣXiYi = 205,500, ΣXi² = 322,000, Σxiyi = 16,800, Σxi² = 33,000, ΣŶi = 1109.9995 ≈ 1110.0
Mean: Ȳ = 111, X̄ = 170
β̂2 = Σxiyi / Σxi² = 16,800/33,000 = 0.5091
β̂1 = Ȳ − β̂2X̄ = 111 − 0.5091(170) = 24.4545
Note: ≈ symbolizes "approximately equal to"; nc means "not computed."
The raw data required to obtain the estimates of the regression coefficients, their standard
errors, etc., are given in Table 3.3. From these raw data, the following calculations are
obtained.
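A hedged Python sketch of these computations (variable names are my own) that reproduces the estimates for the Table 3.2 data and also the standard errors of eqs. (3.3.1)–(3.3.5):

```python
# OLS estimates and standard errors for the consumption-income data (Table 3.2).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)            # 33,000
sum_X2 = sum(xi ** 2 for xi in X)                      # 322,000
beta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
beta1 = y_bar - beta2 * x_bar
rss = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(X, Y))
sigma2_hat = rss / (n - 2)                             # eq. (3.3.5)
se_beta2 = (sigma2_hat / sum_x2) ** 0.5                # eq. (3.3.2)
se_beta1 = (sum_X2 * sigma2_hat / (n * sum_x2)) ** 0.5 # eq. (3.3.4)
print(round(sigma2_hat, 4), round(se_beta1, 4), round(se_beta2, 4))
```

The estimated variance of the disturbance is about 42.16, giving standard errors of roughly 6.41 for the intercept and 0.036 for the slope.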
3.4 Properties of Least-Squares Estimators:
Given the assumptions of the classical linear regression model, the least-squares
estimates possess some ideal or optimum properties. These properties are contained in the
well-known Gauss–Markov theorem. An estimator, say the OLS estimator β̂2, is said
to be a best linear unbiased estimator (BLUE) of β2 if the following hold:
1 It is linear, that is, a linear function of a random variable, such as the dependent
variable Y in the regression model.
2 It is unbiased, that is, its average or expected value, E( ^β 2), is equal to the true
value, β2.
3 It has minimum variance in the class of all such linear unbiased estimators; an
unbiased estimator with the least variance is known as an efficient estimator.
In the regression context it can be proved that the OLS estimators ( ^β 1, ^β 2) are
BLUE.
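The unbiasedness part of this claim can be illustrated numerically. The following is a small Monte Carlo sketch (an illustration, not a proof), with a hypothetical data-generating process Y = 2 + 0.5X + u chosen by me:

```python
# Monte Carlo illustration of unbiasedness: average many OLS slope
# estimates drawn from Y = 2 + 0.5*X + u, with E(u) = 0.
import random

random.seed(42)
X = list(range(1, 21))
x_bar = sum(X) / len(X)
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
true_beta2 = 0.5

estimates = []
for _ in range(5000):
    Y = [2 + true_beta2 * xi + random.gauss(0, 1) for xi in X]
    y_bar = sum(Y) / len(Y)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
    estimates.append(b2)

mean_b2 = sum(estimates) / len(estimates)
print(round(mean_b2, 3))  # close to the true value 0.5
```

Any single estimate differs from 0.5, but the average over many samples sits very close to it, which is what E(β̂2) = β2 asserts.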
Consider the model Yi = β2Xi + ui ------------------------------------------ (3.5.1)
In this model the intercept term is absent or zero, hence the name regression through the
origin. How do we estimate models like (3.5.1)? To answer this question, let us first
write the SRF of (3.5.1), namely, Yi = β̂2Xi + ûi ------------------------------------------
(3.5.2)
Now applying the OLS method to (3.5.2), we obtain the following formula for β̂2: β̂2 = ΣXiYi / ΣXi².
The differences between the two sets of formulas should be obvious: In the model with
the intercept term absent, we use raw sums of squares and cross products but in the
intercept-present model, we use adjusted (from mean) sums of squares and cross
products. Second, the df for computing σ̂² is (n−1) in the model without intercept and
(n−2) in the model with intercept.
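These two sets of formulas can be contrasted in a short Python sketch (the data below are hypothetical, chosen by me so that the two fits nearly coincide):

```python
# Regression through the origin (raw sums) versus the intercept model
# (mean-adjusted sums), as contrasted in the text above.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data, roughly Y = 2X

# No-intercept model: raw sums of squares and cross products, df = n - 1
b2_origin = sum(xi * yi for xi, yi in zip(X, Y)) / sum(xi ** 2 for xi in X)

# Intercept model: mean-adjusted (deviation) sums, df = n - 2
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / \
     sum((xi - x_bar) ** 2 for xi in X)
b1 = y_bar - b2 * x_bar
print(round(b2_origin, 3), round(b2, 3), round(b1, 3))
```

Because the true intercept here is close to zero, the two slope estimates are close; with a sizable intercept they would diverge markedly.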
After estimation of the parameters, the next stage is to establish the criteria for judging
the goodness of the parameter estimates. As indicated in chapter one, we divide the
available criteria into three groups: theoretical a priori criteria, statistical criteria and
econometric criteria. The theoretical criteria (sign and size of the coefficients) are set by
economic theory and defined in the stage of the specification of the model. In this
chapter, we develop the statistical criteria for the evaluation of the parameter estimates.
The two most commonly used tests in econometrics are the following:
1. The square of the correlation coefficient, r2, which is used to judge the
explanatory power of the linear regression of Y on X.
2. The standard error of the parameter estimates, which is applied for judging the
statistical reliability of the estimates of the regression coefficients β̂1 and β̂2.
To compute this r², we proceed as follows. Recall that
Yi = Ŷi + ûi ---------------------------------------------------------------------------------------- (3.6.1)
or, in deviation form, yi = ŷi + ûi. Squaring the deviation form on both sides and summing over the sample, we obtain
Σyi² = Σŷi² + Σûi² + 2Σŷiûi
Σyi² = Σŷi² + Σûi² -------------------------------------------------- (3.6.2)
Σyi² = β̂2²Σxi² + Σûi² -------------------------------------------------- (3.6.3)
since Σŷiûi = 0 (why?) and ŷi = β̂2xi. Equation (3.6.2) is fundamental
and shows that the total variation in the observed Y values about their mean value can be
partitioned into two parts, one attributable to the regression line and the other to random
forces because not all actual Y observations lie on the fitted line. Geometrically,
Figure: decomposition of the deviation Yi − Ȳ (total) around the SRF into
ûi = Yi − Ŷi (due to residual) and Ŷi − Ȳ (due to regression).
We now define r² as
r² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = ESS/TSS -------------------------------------------------- (3.6.5)
or, alternatively, as
r² = 1 − Σûi² / Σ(Yi − Ȳ)² = 1 − RSS/TSS --------------------------------------------------------------------------- (3.6.6)
r² can be computed more quickly from the following formula:
r² = ESS/TSS = Σŷi²/Σyi² = β̂2²Σxi²/Σyi² = β̂2²( Σxi²/Σyi² ) ------------------------------------------- (3.6.7)
If we divide the numerator and the denominator of (3.6.7) by the sample size n (or n−1 if the
sample size is small), we obtain
r² = β̂2²( Sx²/Sy² ) ------------------------------------------------------------------------- (3.6.8)
where Sx² and Sy² are the sample variances of X and Y. The correlation coefficient r itself can be computed from
r = [ nΣYiXi − (ΣXi)(ΣYi) ] / √( [ nΣXi² − (ΣXi)² ][ nΣYi² − (ΣYi)² ] )
Some of the properties of r are as follows:
1. It can be positive or negative, the sign depending on the sign of the term in the
numerator
2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ 1.
3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y
(rXY) is the same as that between Y and X (rYX).
4. It is a measure of linear association or linear dependence only; it has no meaning
for describing nonlinear relations like Y = X².
Example 1: Find the value of r² for the numerical example and interpret it.
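As a hedged sketch of the computation (variable names are mine), eq. (3.6.7) applied to the Table 3.2 consumption-income data gives:

```python
# r^2 for the Table 3.2 data via eq. (3.6.7): r^2 = beta2^2 * sum(x^2) / sum(y^2).
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
sum_y2 = sum((yi - y_bar) ** 2 for yi in Y)
beta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
r2 = beta2 ** 2 * sum_x2 / sum_y2
print(round(r2, 4))
```

The result, about 0.96, says that roughly 96% of the variation in weekly consumption expenditure is explained by weekly income.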
The null hypothesis states that the sample comes from a population whose parameter is not
significantly different from zero, while the alternative hypothesis states that the sample comes
from a population whose parameter is significantly different from zero. The two
hypotheses are given as follows:
H0: βi=0
H1: βi≠0
The standard error test is outlined as follows:
1. Compute the standard deviations of the parameter estimates using the above formula
for variances of parameter estimates. This is because standard deviation is the positive
square root of the variance.
se(β̂1) = √( ΣXi² σ̂² / (n Σxi²) )
se(β̂2) = √( σ̂² / Σxi² )
2. Compare the standard errors of the estimates with the numerical values of the estimates
and make decision.
A) If the standard error of the estimate is less than half of the numerical value of the
estimate, we can conclude that the estimate is statistically significant. That is, if
se(β̂i) < (1/2)|β̂i|, reject the null hypothesis and conclude that the estimate is
statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the
estimate, we conclude that the estimate is not statistically significant, and we do not
reject the null hypothesis.
B) The Standard Normal Test (Z-test)
The Z-test is applicable if either:
The standard deviation of the population is known irrespective of the sample size, or
The standard deviation of the population is unknown provided that the sample
size is sufficiently large (n > 30).
The standard normal test or Z-test is outlined as follows:
The test statistic is Z = (β̂ − β)/se(β̂), which follows the standard normal distribution.
If |Zcal| < Ztab, accept the null hypothesis; if |Zcal| > Ztab, reject the null
hypothesis. It is true that most of the time the null and alternative hypotheses are
mutually exclusive. Accepting the null hypothesis means rejecting the
alternative hypothesis, and rejecting the null hypothesis means accepting the
alternative hypothesis.
Example: If the regression gives an estimate β̂ = 29.48 with standard error se(β̂) =
36, test the hypothesis that the true value of the parameter is 25, at the 5% level of
significance, using the standard normal test.
Solution: We have to follow the procedures of the test. The first step is to set up the
hypotheses: H0: β = 25 against H1: β ≠ 25.
After setting up the hypotheses to be tested, the next step is to determine the level of
significance in which the test is carried out. In the above example the significance level is
given as 5%.
The third step is to find the theoretical value of Z at the specified level of significance. From
the standard normal table, the critical value for a two-tailed test at the 5% level is Z = 1.96.
The fourth step in hypothesis testing is computing the observed or calculated value of the
standard normal distribution using the following formula:
Zcal = (β̂ − β)/se(β̂) = (29.48 − 25)/36 ≈ 0.12
Since the calculated value of the test statistic is
less than the tabulated value, the decision is to accept the null hypothesis and conclude
that the value of the parameter is 25.
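The worked example above can be sketched in a few lines of Python (an illustration of the procedure, with my own variable names):

```python
# Z-test of H0: beta = 25 against H1: beta != 25,
# given beta_hat = 29.48 and se(beta_hat) = 36, at the 5% level.
beta_hat, se, beta_null = 29.48, 36.0, 25.0
z_calc = (beta_hat - beta_null) / se
z_crit = 1.96                      # two-tailed 5% critical value, standard normal table
reject = abs(z_calc) > z_crit
print(round(z_calc, 3), reject)    # 0.124 False -> do not reject H0
```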
C) The Student t-Test
In conditions where Z-test is not applied (in small samples), t-test can be used to test the
statistical reliability of the parameter estimates. The test depends on the degrees of
freedom that the sample has. The test procedures of the t-test are similar to those of the Z-
test. The procedures are outlined as follows:
1. Set up the hypothesis. The hypotheses for testing a given regression coefficient are
given by:
H0: βi = 0
H1: βi ≠ 0
2. Determine the level of significance for carrying out the test. We usually use a 5% level
of significance in applied econometric research.
3. Determine the tabulated value of t from the table with n-k degrees of freedom, where k
is the number of parameters estimated.
4. Determine the calculated value of t. The test statistic (using the t-test) is given by:
t = β̂i / se(β̂i)
The test rule or decision is given as follows:
Reject H0 if |tcal| > t(α/2, n−k); otherwise, do not reject H0.
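The procedure can be sketched in Python; for illustration I use the slope estimate 3.25 and standard error 0.898 from the supply example worked later in this chapter, with the tabulated value 2.228 (two-tailed 5%, 10 df) taken from a t-table:

```python
# t-test of H0: beta = 0 for a single slope coefficient:
# t = beta_hat / se(beta_hat), compared with the tabulated t at n-k df.
beta_hat = 3.25    # estimated slope
se_beta = 0.898    # its standard error
t_calc = beta_hat / se_beta
t_crit = 2.228     # two-tailed 5% critical value, df = 12 - 2 = 10 (from t-table)
reject_H0 = abs(t_calc) > t_crit
print(round(t_calc, 2), reject_H0)
```

Here the calculated t (about 3.62) exceeds the tabulated value, so H0 would be rejected.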
We have discussed the important tests that can be conducted to check the validity of the
model and its parameters. But one thing that must be clear is that rejecting the null hypothesis
does not mean that the parameter estimates are correct estimates of the true population
parameters. It means that the estimate comes from a sample drawn from a population
whose parameter is significantly different from zero. In order to define the
range within which the true parameter lies, we must construct a confidence interval for
the parameter. Just as we constructed confidence interval estimates for a given population
mean using the sample mean (in Introduction to Statistics), we can construct 100(1−α)%
confidence intervals for the sample regression coefficients. To do so we need to have
the standard errors of the sample regression coefficients. The standard error of a given
coefficient is the positive square root of the variance of the coefficient. The formulae
for the variances of the regression coefficients are given as:
Variance of the intercept: var(β̂0) = σ̂² ΣXi² / (n Σxi²) ------------------------- (3.7.12)
Variance of the slope: var(β̂1) = σ̂² / Σxi² ------------------------- (3.7.13)
Where σ̂² = Σûi² / (n−k) is the estimate of the variance of the random term and k
is the number of parameters to be estimated in the model. The standard errors are the
positive square roots of the variances, and the 100(1−α)% confidence interval for the
slope is given by:
β̂1 ± t(α/2, n−k) se(β̂1) ------------------------- (3.7.14)
And for the intercept:
β̂0 ± t(α/2, n−k) se(β̂0)
Example 2.6: The following table gives the quantity supplied (Y in tons) and its price (X
pound per ton) for a commodity over a period of twelve years.
Table 3: Data on supply and price for given commodity
Yi 69 76 52 56 57 77 58 55 67 53 72 64
Xi 9 12 6 10 9 10 7 8 12 6 11 8
Computations for the Table 3 data:
Time  Yi   Xi   XiYi  Xi²   Yi²    xi  yi   xiyi  xi²  yi²  Ŷi      ûi      ûi²
1     69   9    621   81    4761   0   6    0     0    36   63.00   6.00    36.00
2     76   12   912   144   5776   3   13   39    9    169  72.75   3.25    10.56
3     52   6    312   36    2704   -3  -11  33    9    121  53.25   -1.25   1.56
4     56   10   560   100   3136   1   -7   -7    1    49   66.25   -10.25  105.06
5     57   9    513   81    3249   0   -6   0     0    36   63.00   -6.00   36.00
6     77   10   770   100   5929   1   14   14    1    196  66.25   10.75   115.56
7     58   7    406   49    3364   -2  -5   10    4    25   56.50   1.50    2.25
8     55   8    440   64    3025   -1  -8   8     1    64   59.75   -4.75   22.56
9     67   12   804   144   4489   3   4    12    9    16   72.75   -5.75   33.06
10    53   6    318   36    2809   -3  -10  30    9    100  53.25   -0.25   0.06
11    72   11   792   121   5184   2   9    18    4    81   69.50   2.50    6.25
12    64   8    512   64    4096   -1  1    -1    1    1    59.75   4.25    18.06
Sum   756  108  6960  1020  48522  0   0    156   48   894  756.00  0.00    387.00
Mean  63   9                                                63
r² = 1 − Σûi²/Σyi² = 1 − 387/894 = 1 − 0.43 = 0.57
This result shows that 57% of the variation in the quantity supplied of the commodity
under consideration is explained by the variation in its price; the
remaining 43% is left unexplained by the price of the commodity. In other words, there may be
other important explanatory variables left out that could contribute to the variation in the
quantity supplied of the commodity under consideration.
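The full set of figures in this example can be reproduced with a short Python sketch (variable names are mine):

```python
# Reproducing the supply-price example: OLS fit of quantity supplied (Y)
# on price (X), the residual sum of squares, and r^2.
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)                          # 48
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))    # 156
b2 = sum_xy / sum_x2                                                 # 3.25
b1 = y_bar - b2 * x_bar                                              # 33.75
rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))          # 387
sum_y2 = sum((yi - y_bar) ** 2 for yi in Y)                          # 894
r2 = 1 - rss / sum_y2                                                # about 0.57
print(b2, b1, round(rss, 2), round(r2, 2))
```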
2. Run significance test of regression coefficients using the following test methods
A. Standard Error test
In testing the statistical significance of the estimates using the standard error test, the
following information is needed for the decision.
Since there are two parameter estimates in the model, we have to test them separately.
Testing for β̂1 (the slope)
We have the following information about β̂1: β̂1 = 3.25 and se(β̂1) = 0.898.
The following are the null and alternative hypotheses to be tested:
H0: β1 = 0
H1: β1 ≠ 0
Since the standard error of β̂1 is less than half of the value of β̂1, we have to reject the
null hypothesis and conclude that the parameter estimate β̂1 is statistically significant.
Testing for β̂0 (the intercept)
Again we have the following information about β̂0: β̂0 = 33.75 and se(β̂0) = 8.3.
H0: β0 = 0
H1: β0 ≠ 0
Since the standard error of β̂0 is less than half of the numerical value of β̂0, we have to
reject the null hypothesis and conclude that the estimate is statistically significant.
B. The Student t-test
For the slope, the calculated value of t is tcal = β̂1/se(β̂1) = 3.25/0.898 ≈ 3.62.
Further, the tabulated value for t is 2.228. When we compare these two values, the calculated
t is greater than the tabulated value. Hence, we reject the null hypothesis. Rejecting the
null hypothesis means concluding that the price of the commodity is significant in
determining the quantity supplied of the commodity.
In this part we have seen how to conduct the statistical reliability test using t-statistic.
Now let us see additional information about this test. When the degree of freedom is
large, we can conduct t-test without consulting the t-table in finding the theoretical value
of t. This rule is known as “2t-rule”. The rule is stated as follows;
The t-table shows that the values of t change very slowly once the degrees of freedom (n−k)
become large; when (n−k) is large (say, greater than 20), the theoretical value of
t at the 5% level is approximately 2.0. Thus, a two-tail test of a null hypothesis at the 5% level of
significance can be reduced to the following rules:
1. If the calculated t is greater than 2 or less than −2, we reject the null hypothesis.
2. If the calculated t lies between −2 and 2, we accept the null hypothesis.
3. Fit the linear regression equation and determine the 95% confidence interval for the
slope.
The fitted regression model is Ŷi = 33.75 + 3.25Xi, where the numbers in
parentheses beneath the coefficients, (8.3) and (0.898), are the standard errors of the respective
coefficients. To estimate the confidence interval we need the standard error, which is determined as follows:
σ̂² = Σûi² / (n−k) = 387/10 = 38.7
var(β̂1) = σ̂² / Σxi² = 38.7/48 = 0.80625
The standard error of the slope is se(β̂1) = √0.80625 ≈ 0.898.
The tabulated value of t for degrees of freedom 12−2 = 10 and α/2 = 0.025 is 2.228.
Hence the 95% confidence interval for the slope is given by 3.25 ± 2.228(0.898), i.e.
(1.25, 5.25). The result tells us that at the error probability 0.05, the true value of the slope
coefficient lies between 1.25 and 5.25.
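The interval computation can be sketched directly (a hedged illustration with my own variable names):

```python
# 95% confidence interval for the slope in the supply example:
# beta1_hat = 3.25, var(beta1_hat) = 38.7/48, t_{0.025, 10 df} = 2.228.
b = 3.25
var_b = 38.7 / 48            # sigma2_hat / sum(x^2)
se_b = var_b ** 0.5          # about 0.898
t_crit = 2.228               # from the t-table, 10 df
lower = b - t_crit * se_b
upper = b + t_crit * se_b
print(round(lower, 2), round(upper, 2))  # about 1.25 and 5.25
```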
We now extend our simple two-variable regression model to cover models involving more than
two variables. Adding more variables leads us to the discussion of multiple regression
models, that is, models in which the dependent variable, or regressand, Y depends on two
or more explanatory variables, or regressors. The simplest possible multiple regression
model is three-variable regression, with one dependent variable and two explanatory
variables.
In this chapter we shall extend the simple linear regression model to relationships with
two explanatory variables and consequently to relationships with any number of
explanatory variables.
Yi = β1 + β2X2i + β3X3i + ui, where β1 + β2X2i + β3X3i is the systematic
component and ui is the random component.
β1 is the intercept term, which gives the average value of Y when X2 and
X3 are zero.
β2 and β3 are called the partial slope coefficients, or partial regression
coefficients.
β2 measures the change in the mean value of Y resulting from a unit change in
the value of X2, holding X3 constant (net of any effect that X3 may have on the mean of Y). The
interpretation of β3 is similar.
To complete the specification of our simple model we need some assumptions about the
random variable u. These assumptions are the same as in the single explanatory variable
model developed in chapter 3. That is:
Zero mean value of ui , or E( ui | X 2 i , X 3 i ) = 0 for each i
No serial correlation, or cov( ui , u j ) = 0 where i ≠ j
Homoscedasticity, or var( ui ) = σ 2
Normality of ui i.e ui ∼ N(0, σ 2 )
Zero covariance between ui and each X variable, or cov( ui , X 2 i ) = cov(
ui , X 3 i ) = 0
No specification bias, or the model is correctly specified
No exact collinearity between the X variables, or no exact linear relationship
between X 2 and X3
For notational symmetry, Eq. (5.1.1) can also be written as
Y i=β 1 X 1i + β 2 X 2 i+ β3 X 3 i +ui with the provision that X 1 i = 1 for all i.
The assumption of no collinearity is a new one and means the absence of the possibility of
one of the explanatory variables being expressed as a linear combination of the other.
Existence of exact linear dependence between X2i and X3i would mean that we
have only one independent variable in our model rather than two. If such a regression is
estimated, there is no way to estimate the separate influences of X2 (β2) and X3 (β3)
on Y, since such a regression gives us only the combined influence of X2 and X3 on
Y.
To see this suppose X 3=2 X 2 then
Y i=β 1 +β 2 X 2 i + β 3 X 3 i+u i
Y i=β 1 + β 2 X 2 i + β 3 ( 2 X 2i ) +u i
Y i=β 1 + ( β 2+ 2 β 3 ) X 2 i +ui
Y i=β 1 +α X 2i +ui , where α = ( β 2+ 2 β 3 )
Estimating the above regression yields only α, the combined effect of X2 and X3 on Y; the separate effects β2 and β3 cannot be recovered.
This assumption does not guarantee there will be no correlations among the explanatory
variables; it only means that the correlations are not exact or perfect, as it is almost
impossible to find two or more (economic) variables that are not correlated to some
extent. Likewise, the assumption does not rule out non-linear relationships
among the X's either.
Having specified our model we next use sample observations on Y, X2 and X3 and
obtain estimates of the true parameters β 1 , β 2 and β 3 :
Y^ i= β^ 1 + ^β 2 X 2 i + ^β 3 X 3 i
where ^β 1 , ^β 2 , ^β 3 are estimates of the true parameters β1 , β 2 and β 3 of
the relationship.
As before, the estimates will be obtained by minimizing the sum of squared residuals
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − (β̂1 + β̂2X2i + β̂3X3i))²
A necessary condition for this expression to assume a minimum value is that its partial
derivatives with respect to β̂1, β̂2, and β̂3 be equal to zero:
∂Σ(Yi − (β̂1 + β̂2X2i + β̂3X3i))² / ∂β̂1 = 0
∂Σ(Yi − (β̂1 + β̂2X2i + β̂3X3i))² / ∂β̂2 = 0
∂Σ(Yi − (β̂1 + β̂2X2i + β̂3X3i))² / ∂β̂3 = 0
Performing the partial differentiations we get the following system of three normal
equations in three unknown parameters ^β 1 , ^β 2 , and ^β 3
β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3
β̂2 = [ (Σx2iyi)(Σx3i²) − (Σx3iyi)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ] ------------------------------------- (4.1.4)
β̂3 = [ (Σx3iyi)(Σx2i²) − (Σx2iyi)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
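These formulas can be checked in a short Python sketch. The data below are hypothetical, generated without noise from Y = 1 + 2X2 − 0.5X3, which the estimator should recover exactly:

```python
# Deviation-form OLS for the three-variable model (eq. 4.1.4),
# verified on noiseless data from Y = 1 + 2*X2 - 0.5*X3.
X2 = [1, 2, 3, 4, 5, 6]
X3 = [2, 1, 4, 3, 6, 5]
Y = [1 + 2 * a - 0.5 * b for a, b in zip(X2, X3)]
n = len(Y)
m2, m3, my = sum(X2) / n, sum(X3) / n, sum(Y) / n
x2 = [a - m2 for a in X2]       # deviations from means
x3 = [b - m3 for b in X3]
y = [c - my for c in Y]
s22 = sum(a * a for a in x2)
s33 = sum(b * b for b in x3)
s23 = sum(a * b for a, b in zip(x2, x3))
s2y = sum(a * c for a, c in zip(x2, y))
s3y = sum(b * c for b, c in zip(x3, y))
det = s22 * s33 - s23 ** 2      # the determinant in both denominators
b2 = (s2y * s33 - s3y * s23) / det
b3 = (s3y * s22 - s2y * s23) / det
b1 = my - b2 * m2 - b3 * m3
print(round(b1, 6), round(b2, 6), round(b3, 6))  # 1.0 2.0 -0.5
```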
R² = Σŷ²/Σy² = Σ(Ŷ − Ȳ)²/Σ(Y − Ȳ)² = 1 − Σûi²/Σy²
which can also be written as
R² = ( β̂2Σyix2i + β̂3Σyix3i ) / Σyi² ------------------------------ (4.1.5)
The value of R² lies between 0 and 1. The higher the R², the greater the percentage of the
variation of Y explained by the regression plane, that is, the better the 'goodness of fit' of
the regression plane to the sample observations. The closer R² is to zero, the worse the fit.
4.1.3 The mean and variance of the parameter estimates ^β 1 , ^β 2 , and ^β 3
The mean of the estimates of the parameters in the three-variable model is derived in the
same way as in the two-variable model. The estimates ^β 1 , ^β 2 , and ^β 3 are
unbiased estimates of the true parameters of the relationship between Y, X2 and X3: their
expected value is the true parameter itself.
E(β̂1) = β1,  E(β̂2) = β2,  E(β̂3) = β3
The variances of the parameter estimates are obtained from the following formulae:
var(β̂1) = σ̂² [ 1/n + ( X̄2²Σx3i² + X̄3²Σx2i² − 2X̄2X̄3Σx2ix3i ) / ( (Σx2i²)(Σx3i²) − (Σx2ix3i)² ) ]
var(β̂2) = σ̂² Σx3i² / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
There are K parameters to be estimated (K = k + 1, where k is the number of explanatory
variables). Clearly the system of normal equations will consist of K equations, in which the
unknowns are the parameters β̂1, β̂2, β̂3, …, β̂K, and the known terms will be the sums of
squares and sums of products of the sample observations.
In order to derive the K normal equations without the formal differentiation procedure,
we start from the equation of the estimated relationship
Y i= β^ 1 + ^β 2 X 2 i +…+ β^ K X Ki + u^ i
and we make use of the assumptions
Σûi = 0 and Σûi Xji = 0 where (j = 1, 2, 3, …, K)
The normal equations for a model with any number of explanatory variables may be
derived in a mechanical way, without recourse to differentiation. We will introduce a
practical rule of thumb, derived by inspection of the normal equations of the two-variable
and the three-variable models. We begin by rewriting these normal equations.
1. Model with one explanatory variable
Structural form Y i=β 1 + β 2 X 2 i +ui
Estimated form Y i= β^ 1 + ^β 2 X 2 i + u^ i
∑ Y i=n ^β1 + ^β 2 ∑ X 2 i
Normal equations ∑ X 2i Y i= β^ 1 ∑ X 2 i+ ^β2 ∑ X 22 i
Y i= β^ 1 + ^β 2 X 2 i + ^β 3 X 3 i+ …+ ^β K X Ki + u^ i
The generalization of the linear regression model with the variables expressed in
deviations from their means is the same. Thus the estimated form of the K-variable model
in deviation form is
y i= ^β 2 x 2 i + ^β 3 x 3 i+ …+ ^β K x Ki + u^ i
The Kth equation is derived by multiplying through the estimated form by x Ki and
summing over all the sample observations
∑ y i x Ki = ^β 2 ∑ x 2 i x Ki + ^β3 ∑ x 3 i x Ki +…+ ^β K ∑ x 2Ki
4.2.2 Generalization formula for R2
The generalization formula of the coefficient of multiple determination may be derived
by inspection of the formulae of R2 for the two-variable and three-variable models.
1. Model with one explanatory variable
R²(Y·X2) = β̂2Σyix2i / Σyi²
2. Model with two explanatory variables
R²(Y·X2X3) = ( β̂2Σyix2i + β̂3Σyix3i ) / Σyi²
By inspection we see that for each additional explanatory variable the formula of the
squared multiple correlation coefficient includes an additional term in the numerator,
formed by the estimate of the parameter corresponding to the new variable multiplied by
the sum of products of the deviations of the new variable and the dependent one. For
example, the formula of the coefficient of multiple determination for the K-variable
model is
R²(Y·X2…XK) = ( β̂2Σyix2i + β̂3Σyix3i + … + β̂KΣyixKi ) / Σyi²
4.2.3 The adjusted coefficient of determination: R̄²
The inclusion of additional explanatory variables in the function can never reduce the
coefficient of multiple determination and will usually raise it. By introducing a new
regressor we increase the value of the numerator of the expression for R², while the
denominator remains the same ( Σyi², the total variation of Yi, is given in any
particular sample).
To correct for this defect we adjust R² by taking into account the degrees of freedom,
which clearly decrease as new regressors are introduced in the function. The expression
for the adjusted coefficient of multiple determination is
R̄² = 1 − (1 − R²)(n − 1)/(n − K)
Or
R̄² = 1 − [ Σui²/(n−K) ] / [ Σyi²/(n−1) ]
Where R2 is the unadjusted multiple correlation coefficient, n is the number of sample
observations and K is the number of parameters estimated from the sample.
If n is large, R̄² and R² will not differ much. But with small samples, if the number of
regressors (X's) is large in relation to the sample observations, R̄² will be much
smaller than R² and can even assume negative values, in which case R̄² should be
interpreted as being equal to zero.
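The adjustment is a one-line computation. As an illustration (using the production-function figures quoted later in this chapter, R² = 0.86, n = 23, K = 3):

```python
# Adjusted R^2: R_bar^2 = 1 - (1 - R^2) * (n - 1) / (n - K).
def adjusted_r2(r2, n, K):
    return 1 - (1 - r2) * (n - 1) / (n - K)

r2_bar = adjusted_r2(0.86, 23, 3)
print(round(r2_bar, 3))  # 0.846
```

Note that R̄² is always below R² whenever K > 1, and the gap widens as more regressors are added relative to the sample size.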
2. Model with two explanatory variables
var(β̂2) = σ̂² Σx3i² / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
var(β̂3) = σ̂² Σx2i² / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
The above expressions may be written in the form of determinants as follows. The normal
equations of the model with two explanatory variables, written in deviation form, are
Σx2iyi = β̂2(Σx2i²) + β̂3(Σx2ix3i)
Σx3iyi = β̂2(Σx2ix3i) + β̂3(Σx3i²)
The terms in parentheses are the 'knowns', which are computed from the sample
observations, while β̂2 and β̂3 are the unknowns. The known terms appearing on
the right-hand side may be written in the form of a determinant
| Σx2i²     Σx2ix3i |
| Σx2ix3i   Σx3i²   | = A
The variance of each parameter is the product of σ² and the ratio of the minor
determinant associated with that parameter to the (complete) determinant A.
Thus
var(β̂2) = σ² · ( minor of Σx2i² ) / |A| = σ² Σx3i² / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
var(β̂3) = σ² · ( minor of Σx3i² ) / |A| = σ² Σx2i² / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
where the minor of Σx2i² in A is Σx3i², the minor of Σx3i² is Σx2i², and
|A| = (Σx2i²)(Σx3i²) − (Σx2ix3i)².
Examining the above expressions for the variances of the coefficient estimates we may
generalize as follows. The variances of the estimates of the model including k
explanatory variables can be computed as the ratio of two determinants: the determinant
appearing in the numerator is the minor formed after striking out the row and column of
the terms corresponding to the coefficient whose variance is being computed; the
determinant appearing in the denominator is the complete determinant of the known terms
appearing on the right-hand side of the normal equations. For example, the variance of
β̂K is given by the following expression.
var(β̂K) = σ² · M(KK) / |A|
where |A| is the complete determinant of the known terms,
      | Σx2i²     Σx2ix3i   …  Σx2ixKi |
|A| = | Σx2ix3i   Σx3i²     …  Σx3ixKi |
      | ⋮         ⋮             ⋮       |
      | Σx2ixKi   Σx3ixKi   …  ΣxKi²   |
and M(KK) is the minor obtained from |A| by striking out the row and column containing ΣxKi².
Example 1
The table below contains observations on the quantity demanded (Y) of a certain
commodity, its price (X2) and consumers’ income (X3). Fit a linear regression to these
observations and test the overall goodness of fit (with R2) as well as the statistical
reliability of the estimates ^β 1 , ^β 2 , ^β 3 .
Quantity demanded: 100   75    80    70   50   65   90    100   110   60
Price:             5     7     6     6    8    7    5     4     3     9
Income:            1,000 600   1,200 500  300  400  1,300 1,100 1,300 300
given in section 4.2.4. Furthermore, (n−3) σ^ 2 / σ 2 follows the χ2 distribution with
n−3df.
Upon replacing σ² by its unbiased estimator σ̂² in the computation of the standard
errors, each of the following variables follows the t distribution with n−3 df:
t = (β̂1 − β1) / se(β̂1)
t = (β̂2 − β2) / se(β̂2)
t = (β̂3 − β3) / se(β̂3)
To test H0: βi = 0 against H1: βi ≠ 0, i = 1, 2, …, K, we use
t = β̂i / se(β̂i) ∼ t(n−K), where K = k + 1 is the number of parameters. Similarly, the
100(1−α)% confidence interval for βi is β̂i ± t(α/2, n−K) se(β̂i).
Example 2
A production function is estimated as
Ŷ = 4.0 + 0.7X2 + 0.2X3,  R² = 0.86
    (0.78) (0.102) (0.102),  n = 23
Where X 2 = labor, X 3 = capital, and Y = output
Test the hypotheses β2 = 0 and β3 = 0 at α = 5% using the test-of-significance and
confidence interval approaches.
Testing the overall significance of the sample regression
Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, that is, under the separate
hypothesis that each true population partial regression coefficient was zero. But now
consider the following hypothesis:
H 0 : β2= β3 =0
This null hypothesis is a joint hypothesis that β2 and β3 are jointly or
simultaneously equal to zero. A test of such a hypothesis is called a test of the overall
significance of the observed or estimated regression line, that is, whether Y is linearly
related to both X2 and X3. Can the joint hypothesis given above be tested by
testing the significance of β̂2 and β̂3 individually? No. Even if individual interval statements such as
P[ β̂3 − t(α/2)se(β̂3) ≤ β3 ≤ β̂3 + t(α/2)se(β̂3) ] = 1 − α
are individually true, it is not true that the probability that the intervals
[ β̂2 ± t(α/2)se(β̂2), β̂3 ± t(α/2)se(β̂3) ]
simultaneously include β2 and β3 is (1−α)², because the intervals may not be
independent when the same data are used to derive them. To state the matter differently,
testing a series of single [individual] hypotheses is not equivalent to testing those same
hypotheses jointly. The intuitive reason for this is that in a joint test of several hypotheses
any single hypothesis is "affected" by the information in the other hypotheses.
The upshot of the preceding argument is that for a given example (sample) only one
confidence interval or only one test of significance can be obtained. How, then, does one
test the simultaneous null hypothesis that β2 = β 3 = 0? The answer follows.
Further the two chi-square distributions are independent and thus under the null
hypothesis H 0 : βi =0
F = [ χ²(K−1)/(K−1) ] / [ χ²(n−K)/(n−K) ] = [ ESS/(K−1) ] / [ RSS/(n−K) ] ∼ F(K−1, n−K)
What use can be made of the preceding F ratio? Let us take the two variable case
F = β̂2²Σxi² / [ Σûi²/(n−2) ]
which can also be written as
F = β̂2²Σxi² / σ̂²
It can be shown that
E( β̂2²Σxi² ) = σ² + β2²Σxi²
and
E( Σûi²/(n−2) ) = E(σ̂²) = σ²
Associated with any sum of squares is its df, the number of independent observations on
which it is based. TSS has n−1 df because we lose 1 df in computing the sample mean
Ȳ. RSS has n−K df. (Why?) ESS has K−1 df. The mean sum of squares is obtained by
dividing a SS by its df.
We can generalize the F-testing procedure as follows.
Given the K-variable regression model:
Yi = β1 + β2X2i + β3X3i + … + βKXKi + ui
To test the hypothesis
H 0 : β2= β3 =…= β K =0
(i.e., all slope coefficients are simultaneously zero) versus
H1: Not all slope coefficients are simultaneously zero
Compute

F = (ESS/df) / (RSS/df) = [ESS/(K−1)] / [RSS/(n−K)]
If F > F α (K −1 , n−K ) , reject H0; otherwise you do not reject it, where
F α (K −1 , n−K ) is the critical F value at the α level of significance and (K−1)
numerator df and (n−K) denominator df.
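As a quick numerical sketch of the procedure (pure Python; the ESS, RSS, n, and K values below are made up for illustration, not taken from the text's examples):

```python
# Overall-significance F statistic: F = [ESS/(K-1)] / [RSS/(n-K)].
# The numbers below are illustrative, not taken from the text.

def overall_f(ess, rss, n, k):
    """F for H0: all slope coefficients are zero (k counts the intercept)."""
    return (ess / (k - 1)) / (rss / (n - k))

# ESS = 80, RSS = 20, n = 23, K = 3: F = (80/2) / (20/20) = 40.0
print(overall_f(80.0, 20.0, 23, 3))  # 40.0
```

If this computed F exceeds the critical Fα(K−1, n−K) at the chosen level of significance, the joint null hypothesis is rejected.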
A summary of the F-statistic

Null hypothesis H0    Alternative hypothesis H1    Reject H0 if
σ1² = σ2²             σ1² > σ2²                    S1²/S2² > F(α, ndf, ddf)
σ1² = σ2²             σ1² ≠ σ2²                    S1²/S2² > F(α/2, ndf, ddf)
                                                   or S1²/S2² < F(1−α/2, ndf, ddf)
Notes:
1. σ1² and σ2² are the two population variances.
2. S1² and S2² are the two sample variances.
3. ndf and ddf denote, respectively, the numerator and denominator df.
4. In computing the F ratio, put the larger S² value in the numerator.
5. The critical F values are given in the last column. The first subscript of F is the level of significance, and the remaining subscripts are the numerator and denominator df.
Example 3
With reference to the production function regression in the previous example, suppose you are given the following intermediary results:
where use is made of the definition R² = ESS/TSS. The equation below shows how F and R² are related: the two vary directly. When R² = 0, F is also zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of significance of R². In other words, testing the null hypothesis H0: β2 = β3 = … = βK = 0
is equivalent to testing the null hypothesis that (the population) R² is zero.
One advantage of the F test expressed in terms of R² is its ease of computation: all one needs to know is the R² value. Therefore, the overall F test of significance can be recast in terms of R² as shown below:
F = [R²·TSS / (K−1)] / [(1−R²)·TSS / (n−K)] = [R² / (K−1)] / [(1−R²) / (n−K)]
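The R² form is easy to sketch in the same way (hypothetical numbers; because TSS cancels, it reproduces the ESS/RSS result):

```python
# F in terms of R^2 alone: F = [R^2/(K-1)] / [(1-R^2)/(n-K)].

def f_from_r2(r2, n, k):
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# With R^2 = 0.8 (ESS/TSS = 80/100), n = 23, K = 3 this matches the
# ESS/RSS form: (0.8/2) / (0.2/20) = 40.0
print(round(f_from_r2(0.8, 23, 3), 6))  # 40.0
```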
UNIT 5
Dummy Variable Regression Models
A dummy variable (an indicator variable) is a numeric variable that represents categorical
data, such as gender, race, political affiliation, etc.
Technically, dummy variables are dichotomous, quantitative variables. Their range of
values is small; they can take on only two quantitative values. As a practical matter,
regression results are easiest to interpret when dummy variables are limited to two
specific values, 1 or 0. Typically, 1 represents the presence of a qualitative attribute, and 0
represents the absence.
The number of dummy variables required to represent a particular categorical variable
depends on the number of values that the categorical variable can assume. To represent a
categorical variable that can assume k different values, a researcher would need to define
k - 1 dummy variables.
For example, suppose we are interested in political affiliation, a categorical variable that
might assume three values- Republican, Democrat, or Independent. We could represent
political affiliation with two dummy variables:
X1 = 1, if Republican; X1 = 0, otherwise.
X2 = 1, if Democrat; X2 = 0, otherwise.
In this example, notice that we don't have to create a dummy variable to represent the
"Independent" category of political affiliation. If X1 equals zero and X2 equals zero, we
know the voter is neither Republican nor Democrat. Therefore, the voter must be Independent.
How to Interpret Dummy Variables
Once a categorical variable has been recoded as a dummy variable, the dummy variable
can be used in regression analysis just like any other quantitative variable.
For example, suppose we wanted to assess the relationship between household income
and political affiliation (i.e., Republican, Democrat, or Independent). The regression
equation might be: Income = b0 + b1X1+ b2X2
where b0, b1, and b2 are regression coefficients, and X1 and X2 are dummy variables defined as:
X1 = 1, if Republican; X1 = 0, otherwise.
X2 = 1, if Democrat; X2 = 0, otherwise.
The value of the categorical variable that is not represented explicitly by a dummy variable is called the reference group. In this example, the reference group consists of Independent voters.
In the analysis, each dummy variable is compared with the reference group. In this example, a positive regression coefficient means that income is higher for the group represented by that dummy variable than for the reference group; a negative regression coefficient means that income is lower. If the regression coefficient is statistically significant, the income discrepancy with the reference group is also statistically significant.
Dummy Variable Recoding
For example, the first thing we need to do is express gender as one or more dummy variables. How many dummy variables will we need to fully capture all of the information inherent in the categorical variable Gender? To answer that question, we look
at the number of values (k) Gender can assume. We will need k - 1 dummy variables to
represent Gender. Since Gender can assume two values (male or female), we will only
need one dummy variable to represent gender.
Therefore, we can express the categorical variable Gender as a single dummy variable
(X1), like so:
X1 = 1 for male students.
X1 = 0 for non-male students.
Note that X1 identifies male students explicitly. Non-male students are the reference
group. This was an arbitrary choice. The analysis works just as well if you use X 1 to
identify female students and make non-female students the reference group.
In general, if a qualitative variable has m categories, we need only m−1 dummy variables. If we are suppressing the intercept, we can have m dummies, but the interpretation will be a bit different.
The intercept value represents the mean value of the dependent variable for the benchmark category. This is the category for which we do not assign a dummy (in our case, West is the benchmark category). The coefficients of the dummy variables are called differential intercept coefficients because they tell us by how much the intercept of the category that receives the value of 1 differs from the intercept coefficient of the benchmark category.
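A small numerical sketch of the differential-intercept idea (hypothetical income data): with a single 0/1 dummy and no other regressors, OLS reproduces the group means, so the intercept is the benchmark-group mean and the dummy coefficient is the gap between the groups.

```python
# With one 0/1 dummy and no other regressors, OLS reproduces group means:
# intercept = mean of the reference group, coefficient = difference in
# means (the differential intercept). Incomes below are hypothetical.
from statistics import mean

income_group1 = [52.0, 55.0, 58.0]   # dummy D = 1
income_bench = [48.0, 50.0, 49.0]    # reference (benchmark) group, D = 0

b0 = mean(income_bench)              # intercept: benchmark mean
b1 = mean(income_group1) - b0        # differential intercept
print(b0, b1)  # 49.0 6.0
```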
CHAPTER 6
6.1. MULTICOLLINEARITY
Today, however, the term multicollinearity is used in a broader sense to include the case
of perfect multicollinearity, as shown by (6.1.1), as well as the case where the X variables
are intercorrelated but not perfectly so, as follows:
λ1 X 1 + λ2 X 2 +…+ λ K X K + v i=0 …………………..………… 6.1.2
where v i is a stochastic error term.
To see the difference between perfect and less than perfect multicollinearity, assume, for
example, that
λ2 ≠ 0. Then, (6.1.1) can be written as

X2i = −(λ1/λ2) X1i − (λ3/λ2) X3i − … − (λK/λ2) XKi …………………………… 6.1.3
which shows how X2 is exactly linearly related to other variables or how it can be
derived from a linear combination of other X variables. In this situation, the coefficient of
correlation between the variable X2 and the linear combination on the right side of
(6.1.3) is bound to be unity.
Why does the classical linear regression model assume that there is no multicollinearity
among the X’s? The reasoning is this: If multicollinearity is perfect in the sense of
(6.1.1), the regression coefficients of the X variables are indeterminate and their
standard errors are infinite. If multicollinearity is less than perfect, as in (6.1.2), the
regression coefficients, although determinate, possess large standard errors (in
relation to the coefficients themselves), which means the coefficients cannot be
estimated with great precision or accuracy. The proofs of these statements are given as
follows.
If the variables are expressed as deviations from their sample means, we can write the three-variable regression model as
y i= ^β 2 x 2 i + ^β 3 x 3 i+ u^ i ………………………………………… 6.1.5
The OLS estimator of β3, for example, is

β̂3 = [(Σx3i yi)(Σx2i²) − (Σx2i yi)(Σx2i x3i)] / [(Σx2i²)(Σx3i²) − (Σx2i x3i)²]
Assume that X3i = λX2i, where λ is a nonzero constant. Substituting this into the corresponding expression for β̂2, we obtain

β̂2 = [(Σyi x2i)(λ² Σx2i²) − (λ Σyi x2i)(λ Σx2i²)] / [(Σx2i²)(λ² Σx2i²) − (λ Σx2i²)²] = 0/0 ………………………….…. 6.1.6

which is an indeterminate expression.
Why do we obtain the result shown in (6.1.6)? Recall the meaning of β̂2: it gives the rate of change in the average value of Y per unit change in X2, holding X3 constant. But if X3 and X2 are perfectly collinear, there is no way X3 can be kept constant: as X2 changes, so does X3, and there is no way of disentangling their separate influences from the given sample.
Estimation in the presence of “high” but “imperfect” multicollinearity
Generally, there is no exact linear relationship among the X variables, especially in data
involving economic time series. Thus, turning to the three-variable model in the deviation
form given in (6.1.5), instead of exact multicollinearity, we may have
x 3i =λ x 2i + v i ……………………………………………………. 6.1.7
where λ ≠ 0 and where vi is a stochastic error term such that ∑ x 2 i v i=0 . (Why?)
β̂2 = [(Σyi x2i)(λ² Σx2i² + Σvi²) − (λ Σyi x2i + Σyi vi)(λ Σx2i²)] / [Σx2i² (λ² Σx2i² + Σvi²) − (λ Σx2i²)²] ………………….6.1.8
where use is made of ∑ x 2 i v i=0 . A similar expression can be derived for ^β 3 .
Now, unlike (6.1.6), there is no reason to believe a priori that (6.1.8) cannot be estimated.
Of course, if vi is sufficiently small, say, very close to zero, (6.1.8) will indicate almost
perfect collinearity and we shall be back to the indeterminate case of (6.1.6).
For example, in a regression of consumption expenditure on income, wealth, and population, the regressors income, wealth, and population may all be growing over time at more or less the same rate, leading to collinearity among these variables.
var(β̂2) = σ² / [Σx2i² (1 − r23²)] …………………………….. 6.1.3.1
var(β̂3) = σ² / [Σx3i² (1 − r23²)] …………………………….. 6.1.3.2
It is apparent from (6.1.3.1) and (6.1.3.2) that as r 23 tends toward 1, that is, as
collinearity increases, the variances of the two estimators increase and in the limit when
r 23 = 1, they are infinite.
The speed with which variances increase can be seen with the variance-inflating factor
(VIF), which is defined as
VIF = 1 / (1 − r23²) ………………………………..…….. 6.1.3.3
VIF shows how the variance of an estimator is inflated by the presence of
multicollinearity. As r 223 approaches 1, the VIF approaches infinity. That is, as the
extent of collinearity increases, the variance of an estimator increases, and in the limit it
can become infinite. As can be readily seen, if there is no collinearity between X2 and
X3, VIF will be 1.
Using this definition, we can express (6.1.3.1) and (6.1.3.2) as
var(β̂2) = (σ² / Σx2i²) VIF

var(β̂3) = (σ² / Σx3i²) VIF
which show that the variances of β̂2 and β̂3 are directly proportional to the VIF.
The results just discussed can be easily extended to the K-variable model. In such a
model, the variance of the Kth coefficient can be expressed as:
var(β̂j) = σ² / [Σxj² (1 − Rj²)] ……………………………… 6.1.3.4
where β̂j = the (estimated) partial regression coefficient of regressor Xj, and Rj² = the R² in the regression of Xj on the remaining (K−2) regressors.
[Note: There are (K−1) regressors in the K-variable regression model.]
Σxj² = Σ(Xj − X̄j)²
We can also write (6.1.3.4) as

var(β̂j) = (σ² / Σxj²) VIFj ……………….………… 6.1.3.5
As you can see from this expression, var(β̂j) is proportional to σ² and VIFj but inversely proportional to Σxj². The last term states that the larger the variability in a regressor, the smaller the variance of the coefficient of that regressor, holding the other two ingredients constant, and therefore the greater the precision with which that coefficient can be estimated.
“Insignificant” t Ratios
Recall that to test the null hypothesis that, say, β2 = 0, we use the t ratio t = β̂2 / se(β̂2) and compare the estimated t value with the critical t value from the t table.
But as we have seen, in cases of high collinearity the estimated standard errors increase
dramatically, thereby making the t values smaller. Therefore, in such cases, one will
increasingly accept the null hypothesis that the relevant true population value is zero.
Multicollinearity is essentially a matter of degree and not of kind. Some rules of thumb for detecting it or measuring its strength are as follows.
1. High R2 but few significant t ratios. If R2 is high, say, in excess of 0.8, the F test
in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t tests will show that none or very
few of the partial slope coefficients are statistically different from zero.
2. High pair-wise correlations among regressors. Another suggested rule of
thumb is that if the pair-wise or zero-order correlation coefficient between two
regressors is high, say, in excess of 0.8, then multicollinearity is a serious
problem. High zero-order correlations are a sufficient but not a necessary
condition for the existence of multicollinearity because it can exist even though
the zero-order or simple correlations are comparatively low (say, less than 0.50).
3. High variance inflation factor. The larger the value of VIFj, the more "troublesome" or collinear the variable Xj. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if Rj² exceeds 0.90, that variable is said to be highly collinear.
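Rules 2 and 3 can be sketched together in pure Python (hypothetical, nearly collinear data): in the two-regressor case Rj² is just the squared pairwise correlation r23², so the VIF can be computed directly from it.

```python
# In the two-regressor case R_j^2 = r23^2, so VIF = 1/(1 - r23^2).
# x2 and x3 are hypothetical, nearly collinear series.
from statistics import mean

def corr(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [1.1, 2.0, 2.9, 4.2, 5.1]   # tracks x2 almost exactly

r = corr(x2, x3)
vif = 1.0 / (1.0 - r * r)
print(vif > 10)  # True: the rule of thumb flags the variable as highly collinear
```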
Time series variables often trend together, moving in the same direction. One way of minimizing this dependence is to proceed as follows.
If the relation
Yt = β1 + β2X2t + β3X3t + ut ………………………………… 6.1.5.1
holds at time t, it must also hold at time t − 1 because the origin of time is arbitrary
anyway. Therefore, we have
Y t−1= β1 + β 2 X 2 ,t −1 + β 3 X 3 ,t −1+u t−1 ……………...……… 6.1.5.2
Subtracting (6.1.5.2) from (6.1.5.1) gives the first difference form, ΔYt = β2ΔX2t + β3ΔX3t + vt, where vt = ut − ut−1. The first difference regression model often reduces the severity of multicollinearity because, although the levels of X2 and X3 may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.
4. Additional or new data. Since multicollinearity is a sample feature, it is possible
that in another sample involving the same variables collinearity may not be as
serious as in the first sample. Sometimes simply increasing the size of the sample
may attenuate the collinearity problem. For example, in the three-variable model
we saw that
var(β̂2) = σ² / [Σx2i² (1 − r23²)]
Now as the sample size increases, ∑ x 22 i will generally increase. (Why?)
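The logic of the first-difference remedy discussed above can be sketched numerically (hypothetical trending series): the levels are highly correlated, but their first differences need not be.

```python
# Hypothetical trending series: correlated in levels, much less so in
# first differences.
from statistics import mean

def corr(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

x2 = [10, 12, 13, 15, 18, 19, 22, 24]     # e.g. income, trending upward
x3 = [5, 7, 6, 9, 10, 12, 11, 14]         # e.g. wealth, also trending

d2 = [b - a for a, b in zip(x2, x2[1:])]  # first differences of x2
d3 = [b - a for a, b in zip(x3, x3[1:])]  # first differences of x3

print(round(corr(x2, x3), 2))  # high (about 0.95 for these numbers)
print(round(corr(d2, d3), 2))  # much smaller in absolute value
```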
6.2. HETEROSCEDASTICITY
[Figure: savings (Y) plotted against income (X); the disturbances around the line β1 + β2Xi have the same spread at every income level.]
Fig 6.1 Homoscedastic disturbances
In contrast, consider Figure 6.2 below, which shows that the conditional variance of Yi increases as X increases. Here, the variances of Yi are not the same. Hence, there is heteroscedasticity. Symbolically, E(ui²) = σi².
[Figure: savings (Y) plotted against income (X); the spread of the disturbances around the line β1 + β2Xi widens as income increases.]
Fig 6.2 Heteroscedastic disturbances
2. As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, σi² is likely to increase with income. Thus, in the regression of savings on income, one is likely to find σi² increasing with income (as in Figure 6.2) because people have more choices about their savings behavior.
6.2.3. OLS ESTIMATION IN THE PRESENCE OF
HETEROSCEDASTICITY
In the presence of heteroscedasticity, β̂2 is still a linear, unbiased, and consistent estimator. But β̂2 is no longer best; that is, it no longer has minimum variance. What, then, is BLUE in the presence of heteroscedasticity?
The answer is given in the following discussion.
Now assume that the heteroscedastic variances σ 2i are known. Divide (6.2.3.2)
through by σ i to obtain
Yi/σi = β1(X1i/σi) + β2(X2i/σi) + (ui/σi) ……………...……… 6.2.3.3

where X1i = 1 for each i. The starred, or transformed, variables below are the original variables divided by (the known) σi. We use the notation β1* and β2* for the parameters of the transformed model, to distinguish them from the usual OLS parameters β1 and β2.
What is the purpose of transforming the original model? To see this, notice the following
feature of the transformed error term ui*:

var(ui*) = E(ui*²) = E(ui/σi)²
= (1/σi²) E(ui²)    since σi² is known
= (1/σi²) (σi²)     since E(ui²) = σi²
= 1
which is a constant. That is, the variance of the transformed disturbance term ui* is now homoscedastic. Since we are still retaining the other assumptions of the classical model, the finding that it is u* that is homoscedastic suggests that if we apply OLS to the transformed model (6.2.3.3) it will produce estimators that are BLUE. In short, the estimated β1* and β2* are now BLUE, and not the OLS estimators β̂1 and β̂2.
This procedure of transforming the original variables in such a way that the transformed
variables satisfy the standard least-squares assumptions and then applying OLS to them is
known as the method of generalized least squares (GLS). The estimators thus obtained
are known as GLS estimators, and it is these estimators that are BLUE.
The actual mechanics of estimating β1* and β2* are as follows. First, we write down the SRF of (6.2.3.3):

Yi/σi = β̂1*(X1i/σi) + β̂2*(X2i/σi) + (ûi/σi)

or

Yi* = β̂1* X1i* + β̂2* X2i* + ûi* ………………………………..... 6.2.3.5
To obtain the GLS estimators, we minimize

Σûi*² = Σ(Yi* − β̂1* X1i* − β̂2* X2i*)²

that is,

Σ(ûi/σi)² = Σ[(Yi/σi) − β̂1*(X1i/σi) − β̂2*(X2i/σi)]² .……………... 6.2.3.6
The actual mechanics of minimizing (6.2.3.6) follow the usual partial-derivative techniques. Using these techniques, the GLS estimators of β1* and β2* are given as follows.
Writing wi = 1/σi², the GLS estimators are

β̂2* = [(Σwi)(Σwi X2i Yi) − (Σwi X2i)(Σwi Yi)] / [(Σwi)(Σwi X2i²) − (Σwi X2i)²] …………………...... 6.2.3.7

β̂1* = Ȳ* − β̂2* X̄2*, where Ȳ* = (Σwi Yi)/(Σwi) and X̄2* = (Σwi X2i)/(Σwi) ... 6.2.3.8
Thus, in GLS we minimize a weighted sum of residual squares with w i=1/σ 2i acting
as the weights, but in OLS we minimize an unweighted or (what amounts to the same
thing) equally weighted RSS. As (6.2.3.6) shows, in GLS the weight assigned to each
observation is inversely proportional to its σi , that is, observations coming from a
population with larger σi will get relatively smaller weight and those from a population
with smaller σi will get proportionately larger weight in minimizing the RSS (6.2.3.6).
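The weighting mechanics of (6.2.3.7) and (6.2.3.8) can be sketched in pure Python (hypothetical data and weights). As a sanity check, equal weights reduce WLS to ordinary OLS:

```python
# WLS mechanics of (6.2.3.7)-(6.2.3.8) with weights w_i (= 1/sigma_i^2).
# Data are hypothetical; equal weights must reproduce ordinary OLS.

def wls(x, y, w):
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    b2 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)   # (6.2.3.7)
    b1 = swy / sw - b2 * swx / sw                           # (6.2.3.8)
    return b1, b2

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b1, b2 = wls(x, y, [1.0] * 4)   # equal weights -> same answer as OLS
print(round(b1, 2), round(b2, 2))  # 0.15 1.94
```

With unequal weights, observations with larger σi (smaller wi) pull less on the fitted line, exactly as the text describes.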
6.2.5 DETECTION OF HETEROSCEDASTICITY
More often than not, in economic studies there is only one sample Y value corresponding
to a particular value of X. And there is no way one can know σ 2i from just one Y
observation. Therefore, in most cases involving econometric investigations,
heteroscedasticity may be identified based on the examination of
the OLS residuals u^ i since they are the ones we observe, and not the disturbances
ui . One hopes that they are good estimates of ui , a hope that may be fulfilled if the
sample size is fairly large.
Informal Methods
1) Nature of the Problem Very often the nature of the problem under consideration
suggests whether heteroscedasticity is likely to be encountered. For example,
The residual variance around the regression of consumption on income
increased with income.
As a matter of fact, in cross-sectional data involving heterogeneous units,
heteroscedasticity may be the rule rather than the exception.
[Figure: five panels, (a) through (e), each plotting the squared residuals ûi² against the fitted values Ŷ.]
Fig 6.3 Hypothetical patterns of estimated squared residuals
In Figure 6.3, the ûi² are plotted against Ŷi, the estimated Yi from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residual. In Figure 6.3a it can be seen that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures 6.3b to 6.3e, however, exhibit definite patterns. For instance, Figure 6.3c suggests a linear relationship, whereas Figures 6.3d and 6.3e indicate a quadratic relationship between ûi² and Ŷi. Using such knowledge, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity. Instead of plotting ûi² against Ŷi, one may plot them against one of the explanatory variables, especially if plotting ûi² against Ŷi results in the pattern shown in Figure 6.3a. This is useful as a cross-check.
Assumption 1: The error variance is proportional to the square of Xi:

E(ui²) = σ² Xi² …………………………............… 6.2.6.1

If it is believed that the variance of ui is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by Xi:

Yi/Xi = β1/Xi + β2 + ui/Xi
= β1(1/Xi) + β2 + vi ……. 6.2.6.2
where vi is the transformed disturbance term, equal to ui/Xi. Now it is easy to verify that

E(vi²) = E(ui/Xi)² = (1/Xi²) E(ui²)
= σ²    using (6.2.6.1)
Hence the variance of vi is now homoscedastic, and one may proceed to apply OLS to the transformed equation (6.2.6.2), regressing Yi/Xi on 1/Xi. Notice that in the transformed regression the intercept term β2 is the slope coefficient in the original equation and the slope coefficient β1 is the intercept term in the original model. Therefore, to get back to the original model we shall have to multiply the estimated (6.2.6.2) by Xi.
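A sketch of this Assumption 1 transformation (hypothetical data generated from an exact line, so the role reversal of the coefficients is easy to see): dividing through by Xi and regressing Yi/Xi on 1/Xi makes the original slope the new intercept and vice versa.

```python
# Assumption 1 transformation: if var(u_i) = sigma^2 * X_i^2, regress
# Y_i/X_i on 1/X_i. Data are hypothetical and generated from an exact
# line Y = 1 + 2X, so beta1 = 1 and beta2 = 2 are known in advance.

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return my - b * mx, b            # (intercept, slope)

X = [1.0, 2.0, 4.0, 5.0]
Y = [1 + 2 * xi for xi in X]

yt = [yi / xi for xi, yi in zip(X, Y)]   # Y_i / X_i
xt = [1.0 / xi for xi in X]              # 1 / X_i

intercept, slope = ols(xt, yt)
print(round(intercept, 6), round(slope, 6))  # 2.0 1.0 -> beta2, then beta1
```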
Assumption 2: The error variance is proportional to X i . The square root
transformation:
E ( u2i ) =σ 2 X i …………………………............… 6.2.6.3
If it is believed that the variance of ui , instead of being proportional to the squared
X i , is proportional to X i itself, then the original model can be transformed as
follows:
Yi/√Xi = β1/√Xi + β2√Xi + ui/√Xi
= β1(1/√Xi) + β2√Xi + vi …...... 6.2.6.4

where vi = ui/√Xi and Xi > 0.
Given assumption 2, one can readily verify that E(vi²) = σ², a homoscedastic situation. Therefore, one may proceed to apply OLS to (6.2.6.4), regressing Yi/√Xi on 1/√Xi and √Xi. Note an important feature of the transformed model: it has no intercept term. Therefore, one will have to use the regression-through-the-origin model to estimate β1 and β2. Having run (6.2.6.4), one can get back to the original model simply by multiplying (6.2.6.4) by √Xi.
Assumption 3: The error variance is proportional to the square of the mean value of Y:

E(ui²) = σ² [E(Yi)]² ………………..….……..… 6.2.6.5

Equation (6.2.6.5) postulates that the variance of ui is proportional to the square of the expected value of Y. Now E(Yi) = β1 + β2Xi.
6.3. AUTOCORRELATION
Another assumption of the regression model was the non-existence of serial correlation
(autocorrelation) between the disturbance terms, Ui.
Serial correlation implies that the error term from one time period depends in some
systematic way on error terms from other time periods. Autocorrelation is more a
problem of time series data than cross-sectional data. If by chance, such a correlation is
observed in cross-sectional units, it is called spatial autocorrelation. So, it is important to understand serial correlation and its consequences for the OLS estimators.
Nature of Autocorrelation
The classical model assumes that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation. But if there is any interdependence between the disturbance terms, then we have autocorrelation: in the model Yi = α + βXi + Ui, this means E(UiUj) ≠ 0 for i ≠ j.
Causes of Autocorrelation
Serial correlation may occur because of a number of reasons.
Inertia (built in momentum) – a salient feature of most economic variables time
series (such as GDP, GNP, price indices, production, employment etc) is inertia or
sluggishness. Such variables exhibit (business) cycles.
Specification bias – exclusion of important variables or incorrect functional forms
Lags – in a time series regression, value of a variable for a certain period depends
on the variable’s previous period value.
Manipulation of data – if the raw data is manipulated (extrapolated or
interpolated), autocorrelation might result.
Autocorrelation can be negative as well as positive. The most common kind of serial correlation is first-order serial correlation, in which the current period's error term is a function of the previous period's error term:

Ut = ρUt−1 + εt,   −1 < ρ < 1

This is also called the first-order autoregressive (AR(1)) model. The disturbance term εt satisfies all the basic assumptions of the classical linear model.
When the disturbance term exhibits serial correlation, the values as well as the standard
errors of the parameters are affected.
1) The estimates of the parameters remain unbiased even in the presence of
autocorrelation but the X’s and the u’s must be uncorrelated.
2) Serial correlation increases the variance of the OLS estimators. The minimum
variance property of the OLS parameter estimates is violated. That means the
OLS are no longer efficient.
Figure 1: The distribution of β̂ with and without serial correlation.
Detecting Autocorrelation
Some rough idea about the existence of autocorrelation may be gained by plotting the
residuals either against their own lagged values or against time.
Figure 2: Graphical detection of autocorrelation
There are more accurate tests for the incidence of autocorrelation. The most common test
of autocorrelation is the Durbin-Watson Test.
The Durbin-Watson d Test
The test for serial correlation that is most widely used is the Durbin-Watson d test. This
test is appropriate only for the first order autoregressive scheme.
The test may be outlined as follows. From the OLS residuals ût, compute

d = Σt=2..n (ût − ût−1)² / Σt=1..n ût²
This test is, however, applicable where the underlying assumptions are met:
The regression model includes an intercept term
The serial correlation is first order in nature
The regression does not include the lagged dependent variable as an explanatory
variable
There are no missing observations in the data
Note that the numerator has one fewer observation than the denominator, because an observation must be lost in computing the lagged residual ût−1. A great advantage of the d-statistic is that it is based on the estimated residuals. Thus it is often reported together with R², t, etc.
The d-statistic equals zero if there is extreme positive serial correlation, two if there is no serial correlation, and four if there is extreme negative correlation:
1. Extreme positive serial correlation: d ≈ 0
2. Extreme negative serial correlation: d ≈ 4
3. No serial correlation: d ≈ 2
But Durbin and Watson have successfully derived the upper and lower bound so that if
the computed value d lies outside these critical values, a decision can be made regarding
the presence of a positive or negative serial autocorrelation.
It can be shown that the d statistic is approximately equal to

d ≈ 2(1 − ρ̂)

where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals. But since −1 ≤ ρ̂ ≤ 1, this identity implies that 0 ≤ d ≤ 4. Therefore, the bounds of d must lie within these limits. Thus:

if d ≈ 2, there is no serial autocorrelation;
if d ≈ 0, there is evidence of positive autocorrelation;
if d ≈ 4, there is evidence of negative autocorrelation.
Decision Rules for the Durbin-Watson d-test

Null hypothesis                     Decision        If
No positive autocorrelation         Reject          0 < d < dL
No positive autocorrelation         No decision     dL ≤ d ≤ dU
No negative autocorrelation         Reject          4 − dL < d < 4
No negative autocorrelation         No decision     4 − dU ≤ d ≤ 4 − dL
No autocorrelation (+ or −)         Do not reject   dU < d < 4 − dU
Note: Other tests for autocorrelation include the Runs test and the Breusch-Godfrey (BG)
test. There are so many tests of autocorrelation since there is no particular test that has
been judged to be unequivocally best or more powerful in the statistical sense.
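The d statistic itself is easy to sketch from a set of residuals (the residual sequences below are hypothetical, chosen to show the two extremes):

```python
# Durbin-Watson d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2,
# computed from (hypothetical) estimated residuals.

def durbin_watson(e):
    num = sum((b - a) ** 2 for a, b in zip(e, e[1:]))   # loses one obs.
    den = sum(x ** 2 for x in e)
    return num / den

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # long runs of one sign

print(round(durbin_watson(alternating), 2))  # near 4: negative pattern
print(round(durbin_watson(persistent), 2))   # near 0: positive pattern
```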
CHAPTER 7
Introduction to Time Series Analysis
A time series is a sequential set of data points, measured typically over successive times.
A time series containing records of a single variable is termed univariate; if records of more than one variable are considered, it is termed multivariate. A time series can be continuous or discrete. In a continuous time series, observations are measured at every instant of time, whereas a discrete time series contains observations measured at discrete points in time. For example, temperature readings, the flow of a river, or the concentration of a chemical process can be recorded as a continuous time series. On the other hand, the population of a particular city, the production of a company, or the exchange rate between two currencies may represent discrete time series. Usually, in a discrete time series the consecutive observations are recorded at equally spaced time intervals, such as hourly, daily, weekly, monthly or yearly separations.
The general tendency of a time series to increase, decrease or stagnate over a long period
of time is termed as Secular Trend or simply Trend. Thus, it can be said that trend is a
long term movement in a time series. For example, series relating to population growth,
number of houses in a city etc. show upward trend, whereas downward trend can be
observed in series relating to mortality rates, epidemics, etc.
Seasonal variations in a time series are fluctuations within a year, during the seasons. The important factors causing seasonal variations are climate and weather conditions, customs, traditional habits, etc. For example, sales of ice-cream increase in summer and sales of woolen clothes increase in winter. Seasonal variation is an important factor for businessmen, shopkeepers, and producers in making proper future plans.
The cyclical variation in a time series describes the medium-term changes in the series, caused by circumstances which repeat in cycles. The duration of a cycle extends over a longer period of time, usually two or more years. Most economic and financial time series show some kind of cyclical variation. For example, a business cycle consists of four phases: i) prosperity, ii) decline, iii) depression, and iv) recovery.
For monthly data, an additive model assumes that the difference between the January and
July values is approximately the same each year. In other words, the amplitude of the
seasonal effect is the same each year. The model similarly assumes that the residuals are
roughly the same size throughout the series -- they are a random component that adds on
to the other components in the same way at all parts of the series.
Multiplicative model is based on the assumption that the four components of a time
series are not necessarily independent and they can affect one another; whereas in the
additive model it is assumed that the four components are independent of each other.
In many time series involving quantities (e.g., money, wheat production), the absolute differences in the values are of less interest and importance than the percentage changes.
For example, in seasonal data, it might be more useful to model that the July value is the
same proportion higher than the January value in each year, rather than assuming that
their difference is constant. Assuming that the seasonal and other effects act
proportionally on the series is equivalent to a multiplicative model.
Fortunately, multiplicative models are just as easy to fit to data as additive models. The trick to fitting a multiplicative model is to take logarithms of both sides of the model; after taking logarithms (either natural logarithms or to base 10), the four components of the time series again act additively.
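A minimal sketch of the log trick (hypothetical trend and seasonal factors): multiplying components in levels becomes adding components in logs, so the seasonal effect shows up as a constant additive shift.

```python
# A multiplicative series Y = T * S becomes additive in logs:
# log Y = log T + log S. Trend and seasonal factors are hypothetical.
import math

trend = [100.0, 110.0, 121.0, 133.1]    # grows 10% per period
season = [1.2, 0.8, 1.2, 0.8]           # two-period seasonal factor
y = [t * s for t, s in zip(trend, season)]

log_y = [math.log(v) for v in y]
# Removing the log-trend leaves a constant additive seasonal shift:
shift = [lv - math.log(t) for lv, t in zip(log_y, trend)]
print([round(s, 4) for s in shift])  # alternates log(1.2) and log(0.8)
```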
Time series analysis is an analytical technique that is broadly applicable to a wide variety
of domains. In domains in which data are collected at specific (usually equally spaced)
intervals a time series analysis can reveal useful patterns and trends related to time.
Industrial as well as governmental agencies rely on time series analysis for both historical understanding and for forecasting and predictive modeling. In what follows, the basics of the statistical approach to the analysis of time series are presented.
A time series is mathematically defined as a set of observed values taken at specified
times. The set of values is typically denoted "Y", and the set of times as "t1, t2, t3, ...etc".
In other words, Y is a function of "t", and the goal of a time series analysis is to find a
function that describes the movement of data. It should be noted that a time series is often
graphed, and trends and patterns are visually apparent. The statistical approach essentially
describes visually apparent trends formally and quantitatively.
A time series analysis is often referred to as "time series decomposition". As the phrase
suggests, this means that the time series is decomposed into its component parts. There
are typically four main components and together these components sufficiently describe
the variations of data over time. These four components are (1) the long-term trend, "T";
(2) seasonal variations, "S"; (3) cyclical patterns, "C"; and (4) irregularities or noise, "I".
As an equation, the time series is generally described either as a multiplicative relationship, where Y = T × S × C × I, or as an additive relationship, where Y = T + S + C + I.
If the pattern in the data is not very obvious, and you have trouble choosing between the
additive and multiplicative procedures, you can try both and choose the one with smaller
accuracy measures.
The additive model is useful when the seasonal variation is relatively constant
over time.
The multiplicative model is useful when the seasonal variation increases over
time.