CHAPTER 1: INTRODUCTION

1.1 WHAT IS ECONOMETRICS?


Literally interpreted, econometrics means “economic measurement.” So econometrics deals with
the measurement of economic relationships. Econometrics may be defined as the social science in
which the tools of economic theory, mathematics, and statistical inference are applied to the
analysis of economic phenomena. Econometrics may also be considered as the integration of
economics, mathematics and statistics for the purpose of providing numerical values for the
parameters of economic relationships (e.g., elasticity, propensities, marginal values) and verifying
economic theories.

The most important characteristic of economic relationships is that they contain a random
element, which is, however, ignored by economic theory and mathematical economics, both of
which postulate exact relationships between the various economic magnitudes. Econometrics has
developed methods for dealing with this random component of economic relationships.

1.2 WHY A SEPARATE DISCIPLINE?


As the preceding definitions suggest, econometrics is an amalgam of economic theory,
mathematical economics, economic statistics, and mathematical statistics but it is completely
distinct from each one of these branches of science for the following reasons.

• Economic theory makes statements that are mostly qualitative in nature, while
econometrics gives empirical content to most economic theory.
• The main concern of mathematical economics is to express economic theory in
mathematical form without empirical verification of the theory, while econometrics is
mainly interested in the latter.
• Economic statistics is mainly concerned with collecting, processing and presenting
economic data in the form of charts and tables; it is not concerned with using the
collected data to test economic theories, whereas econometrics is.
• Mathematical statistics provides many of the tools for economic studies, but econometrics
supplements these with special methods of quantitative analysis suited to economic
data.

Goals of econometrics
There are three main goals of econometrics:
(1) Analysis, i.e. testing of economic theory
(2) Policy making, i.e. supplying numerical estimates of the coefficients of economic
relationships, which may then be used for decision making
(3) Forecasting, i.e. using the numerical estimates of the coefficients in order to forecast
future values of economic magnitudes.
Of course, these goals are not mutually exclusive; successful econometric applications usually
combine all three aims.

1.3 METHODOLOGY OF ECONOMETRICS

In any econometric research one may distinguish four stages.

Stage A: Specification of the model


The first, and most important, step the econometrician has to take in attempting the study of any
relationship between variables is to express these relationships in mathematical form, that is, to
specify the model with which the economic phenomenon will be explored empirically. This is
called the specification of the model or the formulation of the maintained hypothesis. It involves the
determination of:
(1) The dependent and the explanatory variables which will be included in the model
(2) A priori theoretical expectation about the sign and size of the parameters of the
function
(3) The mathematical form of the model (number of equations, linear or non-linear form
of these equations, etc)

The specification of the econometric model will be based on economic theory and on any
available information relating to the phenomenon being studied. Thus the specification of the
model presupposes knowledge of economic theory as well as familiarity with the particular
phenomenon being studied.

The number of variables to be included in the model depends on the nature of the phenomenon
being studied and the purpose of the research. Usually we introduce explicitly in the function
only the most important (four or five) explanatory variables. The influence of less important
factors is taken into account by the introduction in the model of a random variable, usually
denoted by u. The values of this random variable cannot be actually observed like the values of
the other explanatory variables. We thus have to guess at the pattern of the values of u by making
some plausible assumptions about their distribution. The statement of the assumptions about the
random variable is part of the specification of the model.

Thus the number of variables to be initially included in the model depends on the nature of the
economic phenomenon being studied, while the number of variables which will finally be
retained in the model depends on whether the parameter estimates related to the variables pass the
economic, statistical and econometric criteria, which we will discuss later.

In most cases economic theory does not explicitly state the mathematical form of economic
relationships. It is often helpful to plot the actual data on two-dimensional diagrams, taking two
variables at a time (the dependent variable and each one of the explanatory variables in turn). In most
cases the examination of such scatter diagrams throws some light on the form of the function
and helps in deciding upon the choice of the mathematical form of the relationship connecting
the economic variables.

Stage B: Estimation of the model

After the model has been specified (formulated), the econometrician proceeds with its estimation;
in other words, he must obtain numerical estimates of the coefficients of the model. The
estimation of the model is a purely technical stage which requires knowledge of the various
econometric methods, their assumptions and the economic implications of the parameters.

The stage of estimation includes the following steps


(1) Gathering of statistical observations (data) on the variables included in the model

(2) Examination of the identification conditions of the function in which we are interested
Identification is the procedure by which we attempt to establish that the coefficients which we
shall estimate by the application of some appropriate econometric technique are actually the true
coefficients of the functions in which we are interested.

(3) Examination of the aggregation problems involved in the variables of the function
Aggregation problems arise from the fact that we use aggregative variables in our functions. Such
aggregative variables may involve: aggregation over individuals, aggregation over commodities,
aggregation over time periods, and spatial aggregation. The above sources of aggregation create
various complications which may impart some ‘aggregation bias’ in the estimates of the
coefficients.

(4) Examination of the degree of correlation between the explanatory variables, that is,
examination of the degree of multicollinearity
Most economic variables are correlated, in the sense that they tend to change simultaneously
during the various phases of economic activity. If, however, the degree of collinearity is high, the
results (measurements) obtained from econometric applications may be seriously impaired and
their use may be greatly misleading, because under these conditions it may not be computationally
possible to separate the influence of each individual explanatory variable.

(5) Choice of the appropriate econometric technique


The coefficients of economic relationships may be estimated by various methods which may be
classified in two main groups:
(i) Single-equation techniques. These are techniques which are applied to one equation at a
time.
(ii) Simultaneous-equation techniques. These are techniques which are applied to all the
equations of a system at once, and give estimates of the coefficients of all the functions
simultaneously.

Stage C: Evaluation of estimates


After the estimation of the model comes the evaluation of the results of the calculations, that
is, the determination of the reliability of these results. For this purpose, we use various criteria
which may be classified into three groups. Firstly, economic a priori criteria, which are
determined by economic theory. Secondly, statistical criteria, determined by statistical theory.
Thirdly, econometric criteria, determined by econometric theory.

Economic a priori criteria


These are determined by the principles of economic theory and refer to the sign and the size of
the parameters of economic relationships. The coefficients of economic models are the ‘constants’
of economic theory: elasticities, marginal values, multipliers, propensities, etc. Economic theory
defines the signs of these coefficients and their magnitude.

If the estimates of the parameters turn up with signs or size not conforming to economic theory,
they should be rejected, unless there is good reason to believe that in the particular instance the
principle of economic theory does not hold.

Statistical criteria: first-order tests


These are determined by statistical theory and aim at the evaluation of the statistical reliability of
the estimates of the parameters of the model. The most widely used Statistical criteria are the
correlation coefficient and the standard deviation (or standard error) of the estimates.

It should be noted that the statistical criteria are secondary only to the a priori theoretical criteria.
The estimates of the parameters should be rejected in general if they happen to have the wrong
sign (or size) even though the correlation coefficient is high, or the standard errors suggest that
the estimates are statistically significant. In such cases the parameters, though statistically
satisfactory, are theoretically implausible, that is to say they make no sense on the basis of the a
priori theoretical-economic criteria.

Econometric criteria: second-order tests


These are set by the theory of econometrics and aim at investigating whether the
assumptions of the econometric method employed are satisfied or not in any particular case. The
econometric criteria serve as second-order tests (as tests of the statistical tests); in other words,
they determine the reliability of the statistical criteria.

The evaluation of the results obtained from the estimation of the model is a very complex
procedure. The econometrician must use all the above criteria, economic, statistical and
econometric, before he can accept or reject the estimates.

Stage D: Evaluation of the forecasting power of the estimated model


Forecasting is one of the prime aims of econometric research. Before using an estimated model
for forecasting the value of the dependent variable, we must assess in some way the
predictive power of the model. It is possible that the model is economically meaningful and
statistically and econometrically correct for the sample period over which it has been
estimated, yet it may very well be unsuitable for forecasting due, for example, to a rapid change in
the structural parameters of the relationship in the real world.

Desirable properties of an econometric model


An econometric model is a model whose parameters have been estimated with some appropriate
econometric technique. The goodness of an econometric model is judged customarily according
to the following desirable properties.

(1) Theoretical plausibility. The model should be compatible with the postulates of economic
theory. It must describe adequately the economic phenomena to which it relates.
(2) Explanatory ability. The model should be able to explain the observations of the actual
world. It must be consistent with the observed behavior of the economic variables whose
relationships it determines.
(3) Accuracy of the estimates of the parameters. The estimates of the coefficients should
be accurate in the sense that they should approximate as best as possible the true
parameters of the structural model.
(4) Forecasting ability. The model should produce satisfactory predictions of future
values of the dependent variable.

(5) Simplicity. The model should represent the economic relationships with maximum
simplicity. The fewer the equations and the simpler their mathematical form, the
better the model is considered, provided that the other desirable properties are not
affected by the simplification of the model.

The more of the above properties a model possesses, the better it is considered for any
practical purpose.

To illustrate the preceding steps, let us consider the well-known Keynesian theory of
consumption.

(1) Statement of theory or hypothesis:


Keynes stated: "Consumption increases as income increases, but not by as much as the increase in
income." That is, the marginal propensity to consume (MPC), the change in consumption for a unit
change in income, is greater than zero but less than one.

(2) Specification of the mathematical model of the theory


Although Keynes postulated a positive relationship between consumption and income, he did not
specify the precise form of the functional relationship between the two. For simplicity, a
mathematical economist might suggest the following form of the Keynesian consumption
function:

Y = β1 + β2X; 0 < β2 < 1 ------------------------------------------------- (1.3.1)

Where Y = consumption expenditure and X = income, and where β1 and β2, known as the
parameters of the model, are, respectively, the intercept and slope coefficients.

The slope coefficient β2 measures the MPC. This equation, which states that consumption is
linearly related to income, is an example of a mathematical model of the relationship between
consumption and income that is called the consumption function in economics. A model is
simply a set of mathematical equations. If the model has only one equation, as in the preceding
example, it is called a single-equation model, whereas if it has more than one equation, it is
known as a multiple-equation model.

In Eq. (1.3.1) the variable appearing on the left side of the equality sign is called the dependent
variable and the variable(s) on the right side are called the independent, or explanatory,
variable(s). Thus, in the Keynesian consumption function, Eq. (1.3.1), consumption (expenditure)
is the dependent variable and income is the explanatory variable.

(3) Specification of the econometric model of the theory


The purely mathematical model of the consumption function given in Eq. (1.3.1) is of limited
interest to the econometrician, for it assumes that there is an exact or deterministic relationship
between consumption and income. But relationships between economic variables are generally
inexact. In the above example, in addition to income, other variables such as family size, the
ages of the family members, and family religion are likely to exert some influence on consumption
expenditure.

To allow for the inexact relationships between economic variables, the econometrician would
modify the deterministic consumption function (1.3.1) as follows:

Y = β1 + β2X + u -------------------------------------------------------------------- (1.3.2)

Where u, known as the disturbance, or error term, is a random (stochastic) variable. The
disturbance term u may well represent all those factors that affect consumption but are not taken
into account explicitly.

Equation (1.3.2) is an example of an econometric model. More technically, it is an example of a
linear regression model. The econometric consumption function hypothesizes that the dependent
variable Y (consumption) is linearly related to the explanatory variable X (income) but that the
relationship between the two is not exact; it is subject to individual variation.

(4) Obtaining Data


To estimate the econometric model given in (1.3.2), that is, to obtain the numerical values of β1
and β2, we need data. Let us say we obtained the data given in table 1.1 which relates to a given
economy for the period 1980-1991.
Table 1.1
Year    Y         X
1980    2447.1    3776.3
1981    2476.9    3843.1
1982    2503.7    3760.3
1983    2619.4    3906.6
1984    2746.1    4148.5
1985    2865.8    4279.8
1986    2969.1    4404.5
1987    3052.2    4539.9
1988    3162.4    4718.6
1989    3223.3    4838.0
1990    3260.4    4877.5
1991    3240.8    4821.0

Y = Personal consumption expenditure and X = Gross Domestic Product, both in billions of US dollars

(5) Estimating the Econometric Model


Note that the statistical technique of regression analysis is the main tool used to obtain the
estimates. Using this technique and the data given in Table 1.1, we obtain the following estimates
of β1 and β2, namely, -231.8 and 0.7194. Do not worry for now about how these estimates
are obtained; we will see the method in Chapter 3. Thus, the estimated consumption function is:

Ŷi = -231.8 + 0.7194Xi -------------------------------------------------------------------- (1.3.3)

The hat on the Y indicates that it is an estimate. The estimated MPC is about 0.72, meaning that,
for the sample period, when real income increases by 1 USD, real consumption expenditure
increases on average by about 72 cents.
Note: A hat symbol (^) above a variable will signify an estimator of the relevant population
value.
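As a rough, optional illustration (a sketch, not part of the original example), the estimates above can be reproduced in Python from the Table 1.1 data; np.polyfit is simply one convenient routine that performs the least-squares calculation described in Chapter 3.

    import numpy as np

    # Table 1.1 data (billions of US dollars)
    Y = np.array([2447.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
                  2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8])  # consumption expenditure
    X = np.array([3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
                  4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0])  # GDP

    slope, intercept = np.polyfit(X, Y, 1)   # least-squares line of Y on X
    print(intercept, slope)                  # approximately -231.8 and 0.7194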

(6) Hypothesis Testing
Do the estimates accord with the expectations of the theory that is being tested? Is the MPC
statistically less than one? If so, this supports Keynes' theory. Confirmation or refutation of
economic theories on the basis of sample evidence is the object of statistical inference (hypothesis testing).

(7) Forecasting or Prediction


With given future value(s) of X, what is the future value of Y?
If GDP is $6000 billion in 1994, what is the forecast consumption expenditure?
Ŷ = -231.8 + 0.7194(6000) = $4084.6 billion

(8) Using model for control or policy purposes


Suppose we have the estimated consumption function given in (1.3.3). Suppose further that the
government believes that a consumption expenditure of about $4000 billion in the coming year (1992)
will keep the unemployment rate at its current level of about 4.2 percent. What level of income will
guarantee the target amount of consumption expenditure?

Y = 4000 = -231.8 + 0.7194X  ⇒  X ≈ 5882

Given MPC = 0.72, an income of $5882 billion will produce an expenditure of $4000 billion. Through fiscal
and monetary policy, the government can manipulate the control variable X to achieve the desired level
of the target variable Y.
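The forecasting and control calculations in steps (7) and (8) are simple arithmetic with the estimated coefficients; a minimal sketch, using the rounded estimates from (1.3.3), is:

    b1, b2 = -231.8, 0.7194            # estimated intercept and MPC from (1.3.3)

    y_forecast = b1 + b2 * 6000        # forecast consumption when GDP = 6000 (about 4084.6)
    x_required = (4000 - b1) / b2      # income needed for a consumption target of 4000 (about 5882)
    print(y_forecast, x_required)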

1.4 TYPES OF ECONOMETRICS


Econometrics may be divided into two broad categories: theoretical econometrics and applied
econometrics.

Theoretical econometrics is concerned with the development of appropriate methods for
measuring economic relationships specified by econometric models. In this aspect, econometrics
leans heavily on mathematical statistics. For example, one of the methods used extensively in
econometrics is least squares. Theoretical econometrics must spell out the assumptions of this
method, its properties, and what happens to these properties when one or more of the assumptions
of the method are not fulfilled.

In applied econometrics we use the tools of theoretical econometrics to study some special
field(s) of economics and business, such as the production function, investment function, demand
and supply functions, etc.

Chapter 2: THE NATURE OF REGRESSION ANALYSIS AND
TWO-VARIABLE REGRESSION ANALYSIS

2.1 THE NATURE OF REGRESSION ANALYSIS

2.1.1 THE MODERN INTERPRETATION OF REGRESSION


Broadly speaking, we may say Regression analysis is concerned with the study of the dependence
of one variable, the dependent variable, on one or more other variables, the explanatory
variables, with a view to estimating and/or predicting the (population) mean or average value of
the former in terms of the known or fixed (in repeated sampling) values of the latter.

Example
Dependent Variable Y; Explanatory Variable(s) X
1. Y = Personal Consumption Expenditure; X = Personal Disposable Income
2. Y = Demand; X = Price
3. Y = % Change in Demand; X = % Change in the advertising budget
4. Y = Crop yield; Xs = temperature, rainfall, sunshine, fertilizer

STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS


In statistical relationships among variables we essentially deal with random or stochastic
variables, that is, variables that have probability distributions. In functional or deterministic
dependence, on the other hand, the variables are not random or stochastic. Regression analysis is
concerned with statistical, not functional or deterministic, dependence among variables; we
therefore deal with random or stochastic variables that have probability distributions.

REGRESSION VERSUS CAUSATION


Although regression analysis deals with the dependence of one variable on other variables, it does
not necessarily imply causation. In the words of Kendall and Stuart, “A statistical relationship,
however strong and however suggestive, can never establish causal connection: our ideas of
causation must come from outside statistics, ultimately from some theory or other.”

REGRESSION VERSUS CORRELATION


Closely related to but conceptually very much different from regression analysis is correlation
analysis, where the primary objective is to measure the strength or degree of linear association
between two variables. In regression analysis, however, we are not primarily interested in such a
measure. Instead, we try to estimate or predict the average value of one variable on the basis of
the fixed values of other variables.

2.1.2 TERMINOLOGY AND NOTATION
In the literature the terms dependent variable and explanatory variable are described variously. A
representative list of synonyms is:

Dependent Variable        Explanatory Variable(s)
Explained variable        Independent variable(s)
Predictand                Predictor(s)
Regressand                Regressor(s)
Response                  Stimulus or control variable(s)
Endogenous                Exogenous

If we are studying the dependence of a variable on only a single explanatory variable, such as that
of consumption expenditure on real income, such a study is known as simple, or two-variable,
regression analysis. However, if we are studying the dependence of one variable on more than
one explanatory variable, as in the crop-yield, rainfall, temperature, sunshine, and fertilizer
examples, it is known as multiple regression analysis.

2.1.3 TYPES OF DATA REQUIRED FOR ECONOMIC ANALYSIS


The success of any econometric analysis ultimately depends on the availability of the appropriate
data. Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., a combination of time series and cross-section) data.

Time Series Data


A time series is a set of observations on the values that a variable takes at different times. Such
data may be collected at regular time intervals, such as daily (e.g., stock prices, weather reports),
weekly (e.g., money supply figures), monthly [e.g., the unemployment rate, the Consumer Price
Index (CPI)], quarterly (e.g., GDP), annually (e.g., government budgets).

Cross-Section Data Cross-section data are data on one or more variables collected at the same
point in time, such as the census of population conducted by the Census Bureau every 10 years.

Pooled Data Pooled, or combined, data contain elements of both time series and cross-section data.

2.1.4 THE MEASUREMENT SCALES OF VARIABLES


The variables that we will generally encounter fall into four broad categories: ratio scale, interval
scale, ordinal scale, and nominal scale. It is important that we understand each.

Ratio Scale For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 −
X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful. Most
economic variables belong to this category. E.g., it is meaningful to ask how big this year’s GDP
is as compared with the previous year’s GDP.

Interval Scale An interval scale variable satisfies the last two properties of the ratio scale
variable but not the first. Thus, the distance between two time periods, say (2000–1995) is
meaningful, but not the ratio of two time periods (2000/1995).

Ordinal Scale A variable belongs to this category only if it satisfies the third property of the ratio
scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income class
(upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified.

Nominal Scale Variables in this category have none of the features of the ratio scale variables.
Variables such as gender (male, female) and marital status (married, unmarried, divorced,
separated) simply denote categories.

2.2 TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

2.2.1 A HYPOTHETICAL EXAMPLE


As noted in Section 2.1.1, regression analysis is largely concerned with estimating and/or
predicting the (population) mean value of the dependent variable on the basis of the known or
fixed values of the explanatory variable(s). To understand this, consider the data given in Table
2.1.

The data in the table refer to a total population of 60 families in a hypothetical community and
their weekly income (X) and weekly consumption expenditure (Y), both in dollars. The 60
families are divided into 10 income groups (from $80 to $260) and the weekly expenditures of
each family in the various groups are as shown in the table. Therefore, we have 10 fixed values of
X and the corresponding Y values against each of the X values.

There is considerable variation in weekly consumption expenditure in each income group, which
can be seen clearly from Figure 2.1. But the general picture that one gets is that, despite the
variability of weekly consumption expenditure within each income bracket, on the average,
weekly consumption expenditure increases as income increases.

Fig 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1)

To see this clearly, in Table 2.1 we have given the mean, or average, weekly consumption
expenditure corresponding to each of the 10 levels of income. Thus, corresponding to the weekly
income level of $80, the mean consumption expenditure is $65, while corresponding to the
income level of $200, it is $137. In all we have 10 mean values for the 10 subpopulations of Y.
We call these mean values conditional expected values, as they depend on the given values of
the (conditioning) variable X. Symbolically, we denote them as E(Y|X), which is read as the
expected value of Y given the value of X.

It is important to distinguish these conditional expected values from the unconditional expected
value of weekly consumption expenditure, E(Y). If we add the weekly consumption expenditures
for all the 60 families in the population and divide this number by 60, we get the number $121.20
($7272/60), which is the unconditional mean, or expected, value of weekly consumption
expenditure, E(Y); it is unconditional in the sense that in arriving at this number we have
disregarded the income levels of the various families. Obviously, the various conditional expected
values of Y given in Table 2.1 are different from the unconditional expected value of Y of
$121.20. When we ask the question, "What is the expected value of weekly consumption
expenditure of a family?" we get the answer $121.20 (the unconditional mean). But if we ask,
"What is the expected value of weekly consumption expenditure of a family whose
weekly income is, say, $140?" we get the answer $101 (the conditional mean).

Geometrically, a population regression curve (line) is simply the locus of the conditional means
of the dependent variable for the fixed values of the explanatory variable(s).
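To make the idea of conditional versus unconditional means concrete, the following sketch uses a small, made-up set of (income, expenditure) pairs, since Table 2.1 itself is not reproduced here; it averages Y within each income group to get E(Y | X) and over all observations to get E(Y).

    import numpy as np

    # Hypothetical weekly income X and weekly consumption expenditure Y (illustrative only)
    data = [(80, 55), (80, 60), (80, 65), (100, 65), (100, 70), (100, 74),
            (120, 79), (120, 84), (120, 90), (140, 80), (140, 93), (140, 95)]

    X = np.array([x for x, _ in data])
    Y = np.array([y for _, y in data])

    # Conditional means E(Y | X = x): average Y within each income group
    for x in np.unique(X):
        print(x, Y[X == x].mean())

    # Unconditional mean E(Y): average over all families, ignoring income
    print(Y.mean())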

2.2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
From the preceding discussion, it is clear that each conditional mean E(Y | Xi) is a function of Xi,
where Xi is a given value of X. Symbolically,

E(Y | Xi) = f (Xi) ------------------------------------------------------------------- (2.2.1)

Where f (Xi) denotes some function of the explanatory variable X. In the above example, E(Y | Xi)
is a linear function of Xi. Equation (2.2.1) is known as the conditional expectation function
(CEF) or population regression function (PRF) or population regression (PR) for short. It
states merely that the expected value of the distribution of Y given Xi is functionally related to Xi.
In simple terms, it tells how the mean or average response of Y varies with X.

As a first approximation or a working hypothesis, we may assume that the PRF E(Y | Xi) is a
linear function of Xi, say, of the type

E(Y | Xi) = β1 + β2 Xi -------------------------------------------------------------- (2.2.2)

Where β1 and β2 are unknown but fixed parameters known as the regression coefficients

It is clear from Figure 2.1 that, as family income increases, family consumption expenditure on
the average increases, too. But what about the consumption expenditure of an individual family in
relation to its (fixed) level of income? It is obvious from Table 2.1 and Figure 2.1 that an
individual family’s consumption expenditure does not necessarily increase as the income level
increases. For example, from Table 2.1 we observe that corresponding to the income level of
$100 there is one family whose consumption expenditure of $65 is less than the consumption
expenditures of two families whose weekly income is only $80. But notice that the average
consumption expenditure of families with a weekly income of $100 is greater than the average
consumption expenditure of families with a weekly income of $80 ($77 versus $65).

We see from Figure 2.1 that, given the income level of Xi, an individual family’s consumption
expenditure is clustered around the average consumption of all families at that Xi, that is, around
its conditional expectation. Therefore, we can express the deviation of an individual Yi around its
expected value as follows:
ui = Yi − E(Y | Xi)
or
Yi = E(Y | Xi) + ui ------------------------------------------------------- (2.2.3)

where the deviation ui is an unobservable random variable taking positive or negative values.
Technically, ui is known as the stochastic disturbance or stochastic error term.

How do we interpret (2.2.3)? We can say that the expenditure of an individual family, given its
income level, can be expressed as the sum of two components: (1) E(Y | Xi), which is simply the
mean consumption expenditure of all the families with the same level of income. This component
is known as the systematic, or deterministic, component, and (2) ui, which is the random, or
nonsystematic, component. Stochastic disturbance term (ui) is a surrogate or proxy for all the
omitted or neglected variables that may affect Y but are not (or cannot be) included in the
regression model.
If E(Y | Xi) is assumed to be linear in Xi, as in Eq. (2.2.2), Eq. (2.2.3) may be written as
Yi = E(Y | Xi) + ui
= β1 + β2 Xi + ui ----------------------------------------------- (2.2.4)
we call this equation stochastic specification of the PRF (true PRF)

2.2.3 THE SAMPLE REGRESSION FUNCTION (SRF)
It is about time to face up to the sampling problems, for in most practical situations what we have
is but a sample of Y values corresponding to some fixed X’s. Therefore, the task now is to
estimate the PRF on the basis of the sample information.

As an illustration, pretend that the population of Table 2.1 was not known to us and the only
information we had was a randomly selected sample of Y values for the fixed X’s as given in
Table 2.2. Unlike Table 2.1, we now have only one Y value corresponding to the given X’s; each
Y (given Xi) in Table 2.2 is chosen randomly from similar Y’s corresponding to the same Xi from
the population of Table 2.1.

The question is: From the sample of Table 2.2 can we predict the average weekly consumption
expenditure Y in the population as a whole corresponding to the chosen X’s? In other words, can
we estimate the PRF from the sample data? As one surely suspects, we may not be able to
estimate the PRF “accurately” because of sampling fluctuations. To see this, suppose we draw
another random sample from the population of Table 2.1, as presented in Table 2.3.
Table 2.2: A random sample from the population of Table 2.1
Y      X
70     80
65     100
90     120
95     140
110    160
115    180
120    200
140    220
155    240
150    260

Table 2.3: Another random sample from the population of Table 2.1
Y      X
55     80
88     100
90     120
80     140
118    160
120    180
145    200
135    220
145    240
175    260

Plotting the data of Tables 2.2 and 2.3, we obtain the scattergram given in Figure 2.2. In the
scattergram two sample regression lines are drawn so as to "fit" the scatters reasonably well:
SRF1 is based on the first sample, and SRF2 is based on the second sample. Which of the two
regression lines represents the "true" population regression line? There is no way we can be
absolutely sure that either of the regression lines shown in Figure 2.2 represents the true
population regression line (or curve).

Fig 2.2 Regression lines based on two different samples

The regression lines in Figure 2.2 are known as the sample regression lines. They represent the
population regression line, but because of sampling fluctuations they are at best an approximation
of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs
are not likely to be the same.
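This can be checked directly with the two samples: fitting a least-squares line to the Table 2.2 data and to the Table 2.3 data gives two noticeably different SRFs. A quick sketch, with np.polyfit standing in for the OLS formulas derived in Chapter 3:

    import numpy as np

    X  = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
    Y1 = np.array([70,  65,  90,  95, 110, 115, 120, 140, 155, 150])  # Table 2.2
    Y2 = np.array([55,  88,  90,  80, 118, 120, 145, 135, 145, 175])  # Table 2.3

    b2_1, b1_1 = np.polyfit(X, Y1, 1)   # SRF1 slope and intercept
    b2_2, b1_2 = np.polyfit(X, Y2, 1)   # SRF2 slope and intercept
    print(b1_1, b2_1)                   # sample 1 estimates
    print(b1_2, b2_2)                   # sample 2 estimates (different from sample 1)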

Now, analogously to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample counterpart of Eq.(2.2.2) may be written as

Ŷi = β̂1 + β̂2Xi ------------------------------------------------------------------------------ (2.2.5)

where Ŷi is read as "Y-hat" or "Y-cap"
Ŷi = estimator of E(Y | Xi)
β̂1 = estimator of β1
β̂2 = estimator of β2

Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or method
that tells how to estimate the population parameter from the information provided by the sample
at hand. A particular numerical value obtained by the estimator in an application is known as an
estimate.

Now just as we expressed the PRF in two equivalent forms, (2.2.2) and (2.2.4), we can express
the SRF (2.2.5) in its stochastic form as follows:

Yi = β̂1 + β̂2Xi + ûi ------------------------------------------------------------------------- (2.2.6)

where, in addition to the symbols already defined, ûi denotes the (sample) residual term.
Conceptually ûi is analogous to ui and can be regarded as an estimate of ui.

To sum up, our primary objective in regression analysis is to estimate the PRF

Yi = β1 + β2Xi + ui ------------------------------------------------------------------------- (2.2.4)

on the basis of the SRF

Yi = β̂1 + β̂2Xi + ûi ------------------------------------------------------------------------- (2.2.6)

because more often than not, our analysis is based upon a single sample from some population.

The deviations of the observations from the line may be attributed to several factors.
(1) Omission of variables from the function
In economic reality each variable is influenced by a very large number of factors.
However, not all the factors influencing a certain variable can be included in the function
for various reasons.
(2) Random behavior of the human beings
The scatter of points around the line may be attributed to an erratic element which is
inherent in human behavior. Human reactions are to a certain extent unpredictable and
may cause deviations from the normal behavioral pattern depicted by the line.

(3) Imperfect specification of the mathematical form of the model
We may have linearised a possibly nonlinear relationship. Or we may have left out of the
model some equations.
(4) Errors of aggregation
We often use aggregate data (aggregate consumption, aggregate income), in which we
add magnitudes referring to individuals whose behavior is dissimilar. In this case we say
that variables expressing individual peculiarities are missing.

(5) Errors of measurement


This refers to errors of measurement of the variables, which are inevitable due to the
methods of collecting and processing statistical information.

The first four sources of error render the form of the equation wrong, and they are usually
referred to as error in the equation or error of omission. The fifth source of error is called error of
measurement or error of observation.

In order to take into account the above sources of error, we introduce into econometric functions a
random variable u, called the random disturbance term of the function, so called because u is
supposed to disturb the exact linear relationship which is assumed to exist between X and Y.

2.2.4 THE MEANING OF THE TERM LINEAR


Linearity can be interpreted in two different ways.

Linearity in the Variables


The first and perhaps more “natural” meaning of linearity is that the conditional expectation of Y
is a linear function of Xi, such as, for example, (2.2.2). Geometrically, the regression curve in this
case is a straight line. In this interpretation, a regression function such as E(Y | Xi) = β1 + β2Xi²
is not a linear function because the variable X appears with a power, or index, of 2.

Linearity in the Parameters


The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xi), is a linear
function of the parameters, the β's; it may or may not be linear in the variable X. In this
interpretation E(Y | Xi) = β1 + β2Xi² is a linear (in the parameters) regression model.

Of the two interpretations of linearity, linearity in the parameters is relevant for the development
of the regression theory to be presented shortly. Therefore, from now on the term “linear”
regression will always mean a regression that is linear in the parameters; the β’s (that is, the
parameters are raised to the first power only). It may or may not be linear in the explanatory
variables, the X’s.

Chapter 3: SIMPLE LINEAR REGRESSION MODELS

The Ordinary Least Squares Method (OLS)

There are two major ways of estimating regression functions: the ordinary least squares
(OLS) method and the maximum likelihood (MLH) method. Both are basically similar to the
estimation methods you may have met in statistics courses. The OLS method is the simplest
and most commonly used, whereas the MLH method is limited by its assumptions: it is valid
only for large samples, while OLS can also be applied to smaller samples. The OLS method
estimates the parameters of the simple linear regression function given below in such a way
that the errors, or residuals, are minimized.

To estimate the coefficients β1 and β2 we need observations on X, Y and u. Yet u is
never observed, unlike the other variables, and therefore in order to estimate the
function Yi = β1 + β2Xi + ui we must make some reasonable assumptions about the shape of
the distribution of each ui (its mean, variance and covariance with the other u's). These
assumptions are guesses about the true, but unobservable, values of ui.

3.1 The Assumptions Underlying the Method of Least Squares


The linear regression model is based on certain assumptions, some of which refer to the
distribution of the random variable u, some to the relationship between u and the
explanatory variables, and some to the relationship between the explanatory
variables themselves.
1. ui is a random real variable with zero mean value: E(ui) = 0 (or E(ui|Xi) = 0).
• This implies that for each value of X, u may assume various values, some positive and
some negative, but on average zero.
• Further, E(Yi) = β1 + β2Xi gives the relationship between X and Y on the average, i.e.
when X takes the value Xi, Y will on the average take the value E(Yi) (or E(Yi|Xi)).
2. The variance of ui is constant for all i, i.e., var(ui|Xi) = E(ui²|Xi) = σ². This is called the
assumption of common variance or homoscedasticity.
• The implication is that for all values of X, the values of u show the same dispersion
around their mean.
• A consequence of this assumption is that var(Yi|Xi) = σ².
• If, on the other hand, the variance of the Y population varies as X changes, a situation of
non-constancy of the variance of Y, called heteroscedasticity, arises.
3. ui has a normal distribution, i.e., ui ∼ N(0, σ²), which also implies Yi ∼ N(β1 + β2Xi, σ²).
4. The random terms of different observations are independent: cov(ui, uj) = E(uiuj) = 0 for
i ≠ j, where i and j run from 1 to n. This is called the assumption of no autocorrelation
(no serial correlation) among the error terms.
• A consequence of this assumption is that cov(Yi, Yj) = 0 for i ≠ j, i.e. no autocorrelation
among the Y's.
5. The Xi's are a set of fixed values in the process of repeated sampling which underlies the
linear regression model, i.e. they are non-stochastic.
6. u is independent of the explanatory variables, i.e., cov(ui, Xi) = E(uiXi) = 0.
7. Variability in X values. The X values in a given sample must not all be the same.
Technically, var(X) must be a finite positive number.
8. The regression model is correctly specified.
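Assumptions 1 to 8 describe a data-generating process that is easy to simulate. The sketch below uses purely illustrative parameter values: it holds the X values fixed, draws independent ui ~ N(0, σ²), and constructs Yi = β1 + β2Xi + ui.

    import numpy as np

    rng = np.random.default_rng(0)
    beta1, beta2, sigma = 10.0, 0.5, 2.0          # illustrative "true" parameters
    X = np.arange(80, 280, 20, dtype=float)       # fixed, non-stochastic X values (Assumption 5)
    u = rng.normal(0.0, sigma, size=X.size)       # u_i ~ N(0, sigma^2), independent (Assumptions 1-4)
    Y = beta1 + beta2 * X + u                     # the linear model with a disturbance term

    print(Y)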

3.2 The Least Square Criterion


Thus far we have completed the work involved in the first stage of any econometric
application, namely we have specified the model and stated explicitly its assumptions.
The next step is the estimation of the model, that is, the computation of the numerical
values of its parameters.

The linear relationship Yi = β1 + β2Xi + ui holds for the population of the values of X and
Y, so that we could obtain the numerical values of β1 and β2 only if we could have all the
possible values of X, Y and u which form the population of these variables. Since this is
impossible in practice, we get a sample of observed values of Y and X, specify the
distribution of the u’s and try to get satisfactory estimates of the true parameters of the
relationship. This is done by fitting a regression line through the observations of the
sample, which we consider as an approximation to the true line.
The method of ordinary least squares is one of the econometric methods which enable us
to find the estimate of the true parameter and is attributed to Carl Friedrich Gauss, a
German mathematician. To understand this method, we first explain the least squares
principle.

Recall the two-variable PRF:


Yi = β1 + β2Xi + ui ------------------------------------------------------------------ (2.2.4)

However, as noted in Chapter 2, the PRF is not directly observable. We estimate it from
the SRF:
Yi = β̂1 + β̂2Xi + ûi ---------------------------------------------------------------- (2.2.6)
   = Ŷi + ûi -------------------------------------------------------------------------- (*)

where Ŷi is the estimated (conditional mean) value of Yi.

But how is the SRF itself determined? To see this, let us proceed as follows. First, express
(*) as
ûi = Yi − Ŷi
   = Yi − β̂1 − β̂2Xi ---------------------------------------------------------------- (3.2.1)

which shows that the ûi (the residuals) are simply the differences between the actual
and estimated Y values.

Now given n pairs of observations on Y and X, we would determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we adopt the least-
squares criterion, which states that the SRF can be fixed in such a way that
∑ûi² = ∑(Yi − Ŷi)²
     = ∑(Yi − β̂1 − β̂2Xi)² ---------------------------------------------- (3.2.2)

is as small as possible, where ûi² are the squared residuals.

It is obvious from (3.2.2) that ∑ûi² = f(β̂1, β̂2), that is, the sum of the squared
residuals is some function of the estimators β̂1 and β̂2. For any given set of data,
choosing different values for β̂1 and β̂2 will give different û's and hence
different values of ∑ûi².

The principle or method of least squares chooses β̂1 and β̂2 in such a manner
that, for a given sample or set of data, ∑ûi² is as small as possible. In other words, for
a given sample, the method of least squares provides us with unique estimates of β1 and
β2 that give the smallest possible value of ∑ûi².

The process of differentiation yields the following equations for estimating β1 and β2.
Differentiating Eq. (3.2.2) partially with respect to β̂1 and β̂2, we obtain

∂(∑ûi²)/∂β̂1 = -2∑(Yi − β̂1 − β̂2Xi) = 0
∂(∑ûi²)/∂β̂2 = -2∑(Yi − β̂1 − β̂2Xi)Xi = 0

Setting these derivatives to zero gives the normal equations below:

∑Yi = nβ̂1 + β̂2∑Xi ----------------------------------------------------- (3.2.3)

∑XiYi = β̂1∑Xi + β̂2∑Xi² --------------------------------------------- (3.2.4)
where n is the sample size. These simultaneous equations are known as the normal
equations. Solving the normal equations simultaneously, we obtain

β̂2 = [n∑XiYi − ∑Xi∑Yi] / [n∑Xi² − (∑Xi)²]
    = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
    = ∑xiyi / ∑xi² ------------------------------------------------------------ (3.2.5)

β̂1 = [∑Xi²∑Yi − ∑Xi∑XiYi] / [n∑Xi² − (∑Xi)²]
    = Ȳ − β̂2X̄ ---------------------------------------------------------------- (3.2.6)

where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄)
and yi = (Yi − Ȳ). The lowercase letters in the formulas denote deviations
from mean values. Equation (3.2.6) can be obtained directly from (3.2.3) by simply
dividing both sides of that equation by n.

Note that, by making use of simple algebraic identities, formula (3.2.5) for estimating β2
can alternatively be expressed as

β̂2 = ∑xiyi / ∑xi² = ∑xiYi / (∑Xi² − nX̄²) = ∑Xiyi / (∑Xi² − nX̄²) ----------------------------------------- (3.2.7)

The estimators obtained above are known as the least-squares estimators, for they
are derived from the least-squares principle. Finally, we write the estimated regression line as

Ŷi = β̂1 + β̂2Xi
Interpretation of the estimates
• Estimated intercept, β̂1: the estimated average value of the dependent
variable when the independent variable takes the value zero.
• Estimated slope, β̂2: the estimated change in the average value of the
dependent variable when the independent variable increases by one unit.
• Ŷi gives the average relationship between Y and X, i.e. Ŷi is the average
value of Y given Xi.

Example 1
A random sample of ten families had the following income and food expenditure (in $ per week):

Families             A    B    C    D    E    F    G    H    I    J
Family income        20   30   33   40   15   13   26   38   35   43
Family expenditure   7    9    8    11   5    4    8    10   9    10

Estimate the regression line of food expenditure on income and interpret your results.
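A sketch of one way to work Example 1, applying formulas (3.2.5) and (3.2.6) directly to the sample data (the numerical results quoted in the comments are approximate):

    import numpy as np

    X = np.array([20, 30, 33, 40, 15, 13, 26, 38, 35, 43], dtype=float)  # family income
    Y = np.array([ 7,  9,  8, 11,  5,  4,  8, 10,  9, 10], dtype=float)  # food expenditure

    x = X - X.mean()                     # deviations from the mean
    y = Y - Y.mean()
    b2 = (x * y).sum() / (x * x).sum()   # slope, Eq. (3.2.5): about 0.20
    b1 = Y.mean() - b2 * X.mean()        # intercept, Eq. (3.2.6): about 2.17
    print(b1, b2)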

Note the following numerical properties of estimators obtained by the method of OLS.
1. The OLS estimators are expressed solely in terms of the observable (i.e., sample)
quantities X and Y. Therefore, they can be easily computed.
2. They are point estimators.
3. Once the OLS estimates are obtained from the sample data, the sample regression line
can be easily obtained. The regression line thus obtained has the following properties:
3.1. It passes through the sample means of Y and X.
3.2. The mean value of the estimated Y (= Ŷi) is equal to the mean value of the actual Y,
i.e. the mean of Ŷi equals Ȳ.
3.3. The mean value of the residuals ûi is zero.
3.4. As a result of the preceding property, the sample regression Yi = β̂1 + β̂2Xi + ûi
can be expressed in an alternative form in which both Y and X are expressed as
deviations from their mean values, i.e. yi = β̂2xi + ûi. The SRF can then be
written as ŷi = β̂2xi, whereas in the original units of measurement it
was Ŷi = β̂1 + β̂2Xi. These equations are called the deviation form.
3.5. The residuals ûi are uncorrelated with the predicted values Ŷi.
3.6. The residuals ûi are uncorrelated with Xi.
3.3 Precision or Standard Errors of Least-Squares Estimates

It is evident that least-squares estimates are a function of the sample data. But since the
data are likely to change from sample to sample, the estimates will change ipso facto.
Therefore, what is needed is some measure of “reliability” or precision of the estimators
β̂1 and β̂2. In statistics the precision of an estimate is measured by its standard
error (se). The standard errors of the OLS estimates can be obtained as follows:

var(β̂2) = σ² / ∑xi² --------------------------------------------------------- (3.3.1)

se(β̂2) = σ / √(∑xi²) -------------------------------------------------------- (3.3.2)

var(β̂1) = [∑Xi² / (n∑xi²)]·σ² ---------------------------------------------- (3.3.3)

se(β̂1) = √[∑Xi² / (n∑xi²)]·σ ----------------------------------------------- (3.3.4)

where var = variance and se = standard error, and where σ² is the constant or homoscedastic
variance of ui of Assumption 2.

All the quantities entering into the preceding equations except σ² can be estimated from
the data. σ² itself is estimated by the following formula:

σ̂² = ∑ûi² / (n − 2)

where σ̂² is the OLS estimator of the true but unknown σ², the expression n − 2 is known as
the number of degrees of freedom (df), and ∑ûi² is the sum of the residuals squared or the
residual sum of squares (RSS).

Once ∑ûi² is known, σ̂² can easily be computed. ∑ûi² itself can be computed either from
(3.2.2) or from the following expression:

∑ûi² = ∑yi² − β̂2²∑xi² ---------------------------------------------- (3.3.5)

Compared with Eq. (3.2.2), Eq. (3.3.5) is easy to use, for it does not require computing
ûi for each observation.

Since β̂2 = ∑xiyi / ∑xi², an alternative expression for computing ∑ûi² is

∑ûi² = ∑yi² − (∑xiyi)² / ∑xi² ---------------------------------------------------------------- (3.3.6)
The term number of degrees of freedom means the total number of observations in the
sample (n) less the number of independent (linear) constraints or restrictions put on them.
In other words, it is the number of independent observations out of a total of n
observations. The general rule is this: df = (n − number of parameters estimated).
Note that the positive square root of σ̂²,

σ̂ = √[∑ûi² / (n − 2)] -------------------------------------------------------------------- (3.3.7)

is known as the standard error of estimate or the standard error of the regression
(se). It is simply the standard deviation of the Y values about the estimated regression line
and is often used as a summary measure of the "goodness of fit" of the estimated
regression line.

Note the following features of the variances (and therefore the standard errors) of β̂1 and β̂2.

1. The variance of β̂2 is directly proportional to σ² but inversely proportional to ∑xi².
2. The variance of β̂1 is directly proportional to σ² and ∑Xi² but inversely proportional
to ∑xi² and the sample size n.

A Numerical Example
We illustrate the econometric theory developed so far by considering the Keynesian
consumption function discussed in the Introduction. As a test of the Keynesian
consumption function, we use the sample data of Table 2.2, which for convenience is
reproduced as Table 3.2.
Table 3.2: hypothetical data on weekly family consumption expenditure Y and weekly
family income X
Y($) X($)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260

Table 3.3: Raw data based on Table 3.2

Yi     Xi     YiXi     Xi²      xi=Xi−X̄   yi=Yi−Ȳ   xi²     xiyi    Ŷi          ûi=Yi−Ŷi
70     80     5600     6400     -90        -41        8100    3690    65.1818     4.8181
65     100    6500     10000    -70        -46        4900    3220    75.3636     -10.3636
90     120    10800    14400    -50        -21        2500    1050    85.5454     4.4545
95     140    13300    19600    -30        -16        900     480     95.7272     -0.7272
110    160    17600    25600    -10        -1         100     10      105.9090    4.0909
115    180    20700    32400    10         4          100     40      116.0909    -1.0909
120    200    24000    40000    30         9          900     270     126.2727    -6.2727
140    220    30800    48400    50         29         2500    1450    136.4545    3.5454
155    240    37200    57600    70         44         4900    3080    146.6363    8.3636
150    260    39000    67600    90         39         8100    3510    156.8181    -6.8181

Sum    1110   1700   205500   322000   0    0    33000   16800   1109.9995 ≈ 1110   0
Mean   111    170    nc       nc       0    0    nc      nc      111                 0

β̂2 = ∑xiyi / ∑xi² = 16,800/33,000 = 0.5091
β̂1 = Ȳ − β̂2X̄ = 111 − 0.5091(170) = 24.4545

Notes: ≈ symbolizes "approximately equal to"; nc means "not computed."
The raw data required to obtain the estimates of the regression coefficients, their standard
errors, etc., are given in Table 3.3. From these raw data, the following calculations are
obtained.

β̂1 = 24.4545     var(β̂1) = 41.1370     se(β̂1) = 6.4138
β̂2 = 0.5091      var(β̂2) = 0.0013      se(β̂2) = 0.0357
σ̂² = 42.1591

The estimated regression line therefore is

Ŷi = 24.4545 + 0.5091Xi

This estimated regression line is interpreted as follows: each point on the regression
line gives an estimate of the expected or mean value of Y corresponding to the chosen X
value; that is, Ŷi is an estimate of E(Y|Xi). The value of β̂2 = 0.5091, which
measures the slope of the line, shows that, within the sample range of X between $80 and
$260 per week, as X increases by, say, $1, the estimated increase in the mean or average
weekly consumption expenditure amounts to about 51 cents. The value of β̂1 =
24.4545, which is the intercept of the line, indicates the average level of weekly
consumption expenditure when weekly income is zero.
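The figures above can be verified with a short script that implements the same formulas on the Table 3.2 data (a sketch; rounding differs slightly from the hand calculation):

    import numpy as np

    Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
    X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
    n = len(Y)

    x, y = X - X.mean(), Y - Y.mean()
    b2 = (x * y).sum() / (x * x).sum()          # about 0.5091
    b1 = Y.mean() - b2 * X.mean()               # about 24.4545

    resid = Y - (b1 + b2 * X)
    sigma2_hat = (resid ** 2).sum() / (n - 2)   # about 42.16, estimator of sigma^2
    var_b2 = sigma2_hat / (x * x).sum()                            # Eq. (3.3.1)
    var_b1 = (X ** 2).sum() * sigma2_hat / (n * (x * x).sum())     # Eq. (3.3.3)
    print(b1, b2, sigma2_hat, np.sqrt(var_b1), np.sqrt(var_b2))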

3.4 Properties of Least-Squares Estimators:
Given the assumptions of the classical linear regression model, the least-squares
estimates possess some ideal or optimum properties. These properties are contained in the
well-known Gauss–Markov theorem. An estimator, say the OLS estimator β̂2, is said
to be a best linear unbiased estimator (BLUE) of β2 if the following hold:
1. It is linear, that is, a linear function of a random variable, such as the dependent
variable Y in the regression model.
2. It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true
value, β2.
3. It has minimum variance in the class of all such linear unbiased estimators; an
unbiased estimator with the least variance is known as an efficient estimator.
In the regression context it can be proved that the OLS estimators (β̂1, β̂2) are
BLUE.
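The unbiasedness property can be illustrated (not proved) by a small Monte Carlo experiment with made-up "true" parameter values: over many repeated samples with X held fixed, the average of the OLS slope estimates settles close to the true β2.

    import numpy as np

    rng = np.random.default_rng(1)
    beta1, beta2, sigma = 24.0, 0.5, 6.0        # illustrative "true" parameters
    X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
    x = X - X.mean()

    slopes = []
    for _ in range(5000):                        # repeated sampling with X held fixed
        Y = beta1 + beta2 * X + rng.normal(0, sigma, size=X.size)
        slopes.append((x * (Y - Y.mean())).sum() / (x * x).sum())

    print(np.mean(slopes))                       # close to the true beta2 = 0.5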

3.5 Regression through the Origin


There are occasions when the two-variable PRF assumes the following form:
Yi = β2Xi + ui -------------------------------------------------- (3.5.1)

In this model the intercept term is absent or zero, hence the name regression through the
origin. How do we estimate a model like (3.5.1)? To answer this question, let us first
write the SRF of (3.5.1), namely,

Yi = β̂2Xi + ûi ------------------------------------------ (3.5.2)

Now applying the OLS method to (3.5.2), we obtain the following formulas for β̂2 and its
variance:

β̂2 = ∑YiXi / ∑Xi²,    var(β̂2) = σ² / ∑Xi²,    where σ² is estimated by σ̂² = ∑ûi² / (n − 1)

The differences between the two sets of formulas should be obvious: in the model with
the intercept term absent, we use raw sums of squares and cross products, but in the
model with the intercept present we use adjusted (from the mean) sums of squares and cross
products. Second, the df for computing σ̂² is (n − 1) in the model without intercept and
(n − 2) in the model with intercept.
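A sketch of the no-intercept formulas, using the Table 3.2 data purely for illustration (note the raw sums of squares and the df of n − 1):

    import numpy as np

    Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
    X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

    b2 = (X * Y).sum() / (X ** 2).sum()              # slope for Y_i = b2*X_i + u_i
    resid = Y - b2 * X
    sigma2_hat = (resid ** 2).sum() / (len(Y) - 1)   # note df = n - 1 here
    print(b2, sigma2_hat)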

3.6 Statistical Tests of Significance of Least Square Estimates

After the estimation of the parameters, the next stage is to establish criteria for judging
the goodness of the parameter estimates. As indicated in Chapter 1, we divide the
available criteria into three groups: theoretical a priori criteria, statistical criteria and
econometric criteria. The theoretical criteria (the sign and size of the coefficients) are set by
economic theory and are defined at the stage of specification of the model. In this
chapter, we develop the statistical criteria for the evaluation of the parameter estimates.

The two most commonly used tests in econometrics are the following:
1. The square of the correlation coefficient, r², which is used to judge the
explanatory power of the linear regression of Y on X.
2. The standard errors of the parameter estimates, which are applied in judging the
statistical reliability of the estimates of the regression coefficients β̂1 and β̂2.
They provide a measure of the degree of confidence we attribute to the
estimates β̂1 and β̂2.

1. Coefficient of Determination r²: A Measure of "Goodness of Fit"


After the estimation of the parameters and the determination of the least squares
regression line, we need to know how good is the fit of this line to the sample
observations of Y and X, i.e. we need to measure the dispersion of the observations
around the regression line. If all the observations were to lie on the regression line, we
would obtain a “perfect” fit, but this is rarely the case. Generally, there will be some
positive u^ i and some negative u^ i. What is needed is that these residuals around the
regression line are as small as possible.

The coefficient of determination r2 (two-variable case) or R2 (multiple regression) is a


summary measure that tells how well the sample regression line fits the data. r2 shows the
percentage of the total variation of the dependent variable that can be explained by the
independent variable X.

To compute this r², we proceed as follows. Recall that
Yᵢ = Ŷᵢ + ûᵢ
or, in deviation form,
yᵢ = ŷᵢ + ûᵢ ---------------------------------------------------------------------------------- (3.6.1)

Squaring (3.6.1) on both sides and summing over the sample, we obtain
∑yᵢ² = ∑ŷᵢ² + ∑ûᵢ² + 2∑ŷᵢûᵢ
∑yᵢ² = ∑ŷᵢ² + ∑ûᵢ² -------------------------------------------------- (3.6.2)
     = β̂₂²∑xᵢ² + ∑ûᵢ²
since ∑ŷᵢûᵢ = 0 (why?) and ŷᵢ = β̂₂xᵢ.

The various sums of squares appearing in (3.6.2) can be described as follows:


 ∑yᵢ² = ∑(Yᵢ − Ȳ)² = total variation of the actual Y values about their sample mean, which may be called the total sum of squares (TSS).
 ∑ŷᵢ² = ∑(Ŷᵢ − Ȳ)² = β̂₂²∑xᵢ² = variation of the estimated Y values about their mean, which is called the sum of squares due to regression [i.e., due to the explanatory variable(s)], or explained by regression, or simply the explained sum of squares (ESS).
 ∑ûᵢ² = residual or unexplained variation of the Y values about the regression line, or simply the residual sum of squares (RSS).

Thus, (3.6.2) is TSS = ESS + RSS ------------------------------------------------ (3.6.3)

and shows that the total variation in the observed Y values about their mean value can be
partitioned into two parts, one attributable to the regression line and the other to random
forces because not all actual Y observations lie on the fitted line. Geometrically,

[Figure 3.1 about here: for an observation Yᵢ at Xᵢ, the total deviation Yᵢ − Ȳ is split along the SRF into Ŷᵢ − Ȳ (due to regression) and ûᵢ (due to the residual).]

Fig. 3.1 Breakdown of the variation of Yi into two components


Now dividing (3.6.3) by TSS on both sides, we obtain
1 = ESS/TSS + RSS/TSS
  = ∑(Ŷᵢ − Ȳ)² / ∑(Yᵢ − Ȳ)²  +  ∑ûᵢ² / ∑(Yᵢ − Ȳ)² ----------------------------------------- (3.6.4)

We now define r² as
r² = ∑(Ŷᵢ − Ȳ)² / ∑(Yᵢ − Ȳ)² = ESS/TSS -------------------------------------------------------- (3.6.5)

or, alternatively, as
r² = 1 − ∑ûᵢ² / ∑(Yᵢ − Ȳ)² = 1 − RSS/TSS --------------------------------------------------------------------------------- (3.6.6)

r² can be computed more quickly from the following formula:
r² = ESS/TSS = ∑ŷᵢ² / ∑yᵢ² = β̂₂²∑xᵢ² / ∑yᵢ² = β̂₂²(∑xᵢ² / ∑yᵢ²) ------------------------------------------- (3.6.7)

If we divide the numerator and the denominator of (3.6.7) by the sample size n (or n−1 if the sample size is small), we obtain
r² = β̂₂²(S²_X / S²_Y) ------------------------------------------------------------------------- (3.7.8)

where S²_X and S²_Y are the sample variances of X and Y, respectively.

Since β̂₂ = ∑xᵢyᵢ / ∑xᵢ², Eq. (3.6.7) can also be expressed as
r² = (∑xᵢyᵢ)² / (∑xᵢ²∑yᵢ²) -------------------------------------------------------------------------------- (3.7.9)

Two properties of r² may be noted:

1. It is a nonnegative quantity. (Why?)
2. Its limits are 0 ≤ r² ≤ 1. r² = 1 means a perfect fit, that is, Ŷᵢ = Yᵢ for each i. On the other hand, r² = 0 means that there is no relationship between the regressand and the regressor whatsoever (i.e., β̂₂ = 0).
A quantity closely related to but conceptually very much different from r2 is the
coefficient of correlation, which is a measure of the degree of association between two
variables. It can be computed either from
r = ±√r² ---------------------------------------------------------------------------------- (3.7.10)

or from its definition

r = ∑xᵢyᵢ / √[(∑xᵢ²)(∑yᵢ²)] -------------------------------------------- (3.7.11)

  = [n∑YᵢXᵢ − (∑Xᵢ)(∑Yᵢ)] / √{[n∑Xᵢ² − (∑Xᵢ)²][n∑Yᵢ² − (∑Yᵢ)²]}
Some of the properties of r are as follows:
1. It can be positive or negative, the sign depending on the sign of the term in the numerator.
2. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ 1.
3. It is symmetrical in nature; that is, the coefficient of correlation between X and Y (r_XY) is the same as that between Y and X (r_YX).
4. It is a measure of linear association or linear dependence only; it has no meaning for describing nonlinear relations like Y = X².

Example 1: Find the value of r² for the numerical example and interpret it.

r² = 0.9621 and r = 0.9809

The value of r² of 0.9621 means that about 96 percent of the variation in weekly consumption expenditure is explained by income. The coefficient of correlation of 0.9809 shows that the two variables, consumption expenditure and income, are highly positively correlated.
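For readers who want to reproduce these figures, the following minimal sketch computes r² and r by the formulas above. The ten (X, Y) pairs are assumed to be the weekly income and consumption data of the chapter's running example, since they reproduce the intercept of 24.4545 and the r² of 0.9621 quoted in the text.

```python
import numpy as np

# Assumed example data: weekly income (X) and consumption expenditure (Y)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70,  65,  90,  95, 110, 115, 120, 140, 155, 150], dtype=float)

x, y = X - X.mean(), Y - Y.mean()        # deviations from sample means
b2 = np.sum(x * y) / np.sum(x**2)        # slope
b1 = Y.mean() - b2 * X.mean()            # intercept (about 24.4545)
u_hat = Y - (b1 + b2 * X)                # residuals

TSS = np.sum(y**2)
RSS = np.sum(u_hat**2)
ESS = TSS - RSS

r2 = ESS / TSS                           # coefficient of determination, eq. (3.6.5)
r  = np.sign(b2) * np.sqrt(r2)           # correlation coefficient, sign taken from the slope

print(round(b1, 4), round(r2, 4), round(r, 4))
```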
2) Testing the significance of a given regression coefficient
Since the sample values of the intercept and the coefficient are estimates of the true
population parameters, we have to test them for their statistical reliability.
The significance of a model can be seen in terms of the amount of variation in the
dependent variable that it explains and the significance of the regression coefficients.
There are different tests that are available to test the statistical reliability of the parameter
estimates. The following are the common ones;
A) The standard error test
B) The standard normal test
C) The students t-test
Now, let us discuss them one by one.
A) The Standard Error Test
This test first establishes the two hypotheses to be tested, commonly known as the null and the alternative hypotheses. The null hypothesis states that the sample comes from a population whose parameter is not significantly different from zero, while the alternative hypothesis states that the sample comes from a population whose parameter is significantly different from zero. The two hypotheses are given as follows:
H₀: βᵢ = 0
H₁: βᵢ ≠ 0
The standard error test is outlined as follows:
1. Compute the standard deviations of the parameter estimates using the above formulas for the variances of the parameter estimates (the standard deviation is the positive square root of the variance):

se(β̂₁) = √[σ̂²∑Xᵢ² / (n∑xᵢ²)]

se(β̂₂) = √[σ̂² / ∑xᵢ²]

2. Compare the standard errors of the estimates with the numerical values of the estimates and make a decision.
A) If the standard error of the estimate is less than half of the numerical value of the estimate, we conclude that the estimate is statistically significant. That is, if se(β̂ᵢ) < ½|β̂ᵢ|, reject the null hypothesis and conclude that the estimate is statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the estimate, the parameter estimate is not statistically reliable. That is, if se(β̂ᵢ) > ½|β̂ᵢ|, accept the null hypothesis and conclude that the estimate is not statistically significant.

B) The Standard Normal Test


This test is based on the normal distribution. The test is applicable if:

 The standard deviation of the population is known irrespective of the sample size
 The standard deviation of the population is unknown provided that the sample
size is sufficiently large (n>30).
The standard normal test or Z-test is outlined as follows:

1. Set up the null hypothesis H₀: βᵢ = β* against the alternative hypothesis H₁: βᵢ ≠ β* (where β* is the hypothesized value of the parameter, very often zero).

2. Determine the level of significance (α) at which the test is carried out. It is the probability of committing a Type I error, i.e. the probability of rejecting the null hypothesis while it is true. It is common in applied econometrics to use a 5% level of significance.

3. Determine the theoretical or tabulated value of Z from the table. That is, find the critical value Z_{α/2} from the standard normal table (for α = 0.05, Z_{0.025} = 1.96).

4. Make a decision. The decision of statistical hypothesis testing consists of two choices: either accepting the null hypothesis or rejecting it. Compute the test statistic
Z_cal = (β̂ᵢ − β*) / se(β̂ᵢ).
If |Z_cal| < Z_{α/2}, accept the null hypothesis, while if |Z_cal| ≥ Z_{α/2}, reject the null hypothesis. It is true that most of the time the null and alternative hypotheses are mutually exclusive: accepting the null hypothesis means rejecting the alternative hypothesis, and rejecting the null hypothesis means accepting the alternative hypothesis.
Example: Suppose a regression gives an estimate β̂ = 29.48 with standard error se(β̂) = 36. Test the hypothesis that the true value of the parameter is 25 at the 5% level of significance using the standard normal test.

Solution: We have to follow the procedures of the test.
The hypotheses to be tested are H₀: β = 25 against H₁: β ≠ 25.

After setting up the hypotheses to be tested, the next step is to determine the level of significance at which the test is carried out. In the above example the significance level is given as 5%.

The third step is to find the theoretical value of Z at the specified level of significance. From the standard normal table we get Z_{0.025} = 1.96.

The fourth step in hypothesis testing is computing the observed or calculated value of the standard normal statistic:
Z_cal = (β̂ − 25) / se(β̂) = (29.48 − 25) / 36 ≈ 0.12.
Since the calculated value of the test statistic (0.12) is less than the tabulated value (1.96), the decision is to accept the null hypothesis and conclude that the value of the parameter is 25.
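A small sketch of the Z-test procedure, using only the quantities named above, is given below. The function name and its arguments are illustrative; the critical value is taken from scipy's standard normal distribution.

```python
from scipy.stats import norm

def z_test(beta_hat, se_beta, beta_null=0.0, alpha=0.05):
    """Two-sided standard normal (Z) test for a regression coefficient."""
    z_cal = (beta_hat - beta_null) / se_beta
    z_crit = norm.ppf(1 - alpha / 2)          # 1.96 for alpha = 0.05
    return z_cal, z_crit, abs(z_cal) > z_crit

# Numbers from the worked example above
z_cal, z_crit, reject = z_test(beta_hat=29.48, se_beta=36.0, beta_null=25.0)
print(round(z_cal, 2), round(z_crit, 2), reject)   # ~0.12 < 1.96, so H0 is not rejected
```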
C) The Student t-Test
In conditions where Z-test is not applied (in small samples), t-test can be used to test the
statistical reliability of the parameter estimates. The test depends on the degrees of
freedom that the sample has. The test procedures of t-test are similar with that of the z-
test. The procedures are outlined as follows;
1. Set up the hypotheses. The hypotheses for testing a given regression coefficient are given by:
H₀: βᵢ = 0
H₁: βᵢ ≠ 0
2. Determine the level of significance for carrying out the test. We usually use a 5% level
significance in applied econometric research.
3. Determine the tabulated value of t from the table with n-k degrees of freedom, where k
is the number of parameters estimated.
4. Determine the calculated value of t. The test statistic (using the t-test) is given by:
t_cal = β̂ᵢ / se(β̂ᵢ)
The test rule or decision is given as follows:
Reject H₀ if |t_cal| > t_{α/2, n−k}; otherwise, accept H₀.

iii) Confidence Interval Estimation of the regression Coefficients

We have discussed the important tests that can be conducted to check the validity of the model and its parameters. But one thing that must be clear is that rejecting the null hypothesis does not mean that the parameter estimates are correct estimates of the true population parameters. It means that the estimate comes from a sample drawn from a population whose parameter is significantly different from zero. In order to define the range within which the true parameter lies, we must construct a confidence interval for the parameter. Just as we constructed confidence interval estimates for a given population mean using the sample mean (in Introduction to Statistics), we can construct 100(1−α)% confidence intervals for the sample regression coefficients. To do so we need the standard errors of the sample regression coefficients. The standard error of a given coefficient is the positive square root of the variance of the coefficient. The formulae for the variances of the regression coefficients, discussed earlier, are given as follows.

Variance of the intercept:
var(β̂₀) = σ̂²∑Xᵢ² / (n∑xᵢ²) ------------------------------------------------ (3.7.12)

Variance of the slope:
var(β̂₁) = σ̂² / ∑xᵢ² ----------------------------------------------------------- (3.7.13)

where σ̂² = ∑ûᵢ² / (n−k) is the estimate of the variance of the random term and k is the number of parameters to be estimated in the model. The standard errors are the positive square roots of the variances, and the 100(1−α)% confidence interval for the slope is given by:

β̂₁ ± t_{α/2, n−k} · se(β̂₁) ---------------------------------------------------- (3.7.14)

And for the intercept: β̂₀ ± t_{α/2, n−k} · se(β̂₀)
Example 2.6: The following table gives the quantity supplied (Y in tons) and its price (X
pound per ton) for a commodity over a period of twelve years.
Table 3: Data on supply and price for given commodity

Yi 69 76 52 56 57 77 58 55 67 53 72 64
Xi 9 12 6 10 9 10 7 8 12 6 11 8

Table 4: Data for computation of different parameters

Time   Yi    Xi    XiYi   Xi²    Yi²     xi    yi    xiyi   xi²   yi²    Ŷi       ûi       ûi²
1      69     9     621    81    4761     0     6      0     0     36    63.00     6.00     36.00
2      76    12     912   144    5776     3    13     39     9    169    72.75     3.25     10.56
3      52     6     312    36    2704    −3   −11     33     9    121    53.25    −1.25      1.56
4      56    10     560   100    3136     1    −7     −7     1     49    66.25   −10.25    105.06
5      57     9     513    81    3249     0    −6      0     0     36    63.00    −6.00     36.00
6      77    10     770   100    5929     1    14     14     1    196    66.25    10.75    115.56
7      58     7     406    49    3364    −2    −5     10     4     25    56.50     1.50      2.25
8      55     8     440    64    3025    −1    −8      8     1     64    59.75    −4.75     22.56
9      67    12     804   144    4489     3     4     12     9     16    72.75    −5.75     33.06
10     53     6     318    36    2809    −3   −10     30     9    100    53.25    −0.25      0.06
11     72    11     792   121    5184     2     9     18     4     81    69.50     2.50      6.25
12     64     8     512    64    4096    −1     1     −1     1      1    59.75     4.25     18.06
Sum   756   108    6960  1020   48522     0     0    156    48    894   756.00     0.00    387.06
Mean   63     9                                                          63.00

Use Tables 3 and 4 to answer the following questions


1. Estimate the Coefficient of determination (r2)
2. Run significance test of regression coefficients using the following test methods
A) The standard error test
B) The students t-test
3. Fit the linear regression equation and determine the 95% confidence interval for the
slope.
Solution
1. Estimate the coefficient of determination (r²)
Refer to Example 2.6 above to determine what percentage of the variation in the quantity supplied is explained by the price of the commodity and what percentage remains unexplained.
Using the data in Table 4, r² is estimated with the formula given below:

r² = 1 − ∑ûᵢ² / ∑yᵢ² = 1 − 387/894 = 1 − 0.43 = 0.57

This result shows that 57% of the variation in the quantity supplied of the commodity under consideration is explained by the variation in its price, and the remaining 43% is left unexplained by the price of the commodity. In other words, there may be other important explanatory variables left out that could contribute to the variation in the quantity supplied of the commodity under consideration.
2. Run significance tests of the regression coefficients using the following test methods

The fitted regression line for the data given is:
Ŷᵢ = 33.75 + 3.25Xᵢ
      (8.3)   (0.9)
where the numbers in parentheses are the standard errors of the respective coefficients.

A. Standard error test
In testing the statistical significance of the estimates using the standard error test, the following information is needed for the decision. Since there are two parameter estimates in the model, we have to test them separately.

Testing for β̂₁
We have the following information about β̂₁: β̂₁ = 3.25 and se(β̂₁) = 0.9.
The following are the null and alternative hypotheses to be tested:
H₀: β₁ = 0
H₁: β₁ ≠ 0
Since the standard error of β̂₁ (0.9) is less than half of the value of β̂₁ (3.25/2 = 1.625), we reject the null hypothesis and conclude that the parameter estimate β̂₁ is statistically significant.

Testing for β̂₀
Again we have the following information about β̂₀: β̂₀ = 33.75 and se(β̂₀) = 8.3.
The hypotheses to be tested are given as follows:
H₀: β₀ = 0
H₁: β₀ ≠ 0
Since the standard error of β̂₀ (8.3) is less than half of the numerical value of β̂₀ (33.75/2 = 16.875), we reject the null hypothesis and conclude that β̂₀ is statistically significant.
B. The Student's t-test
In the illustrative example, we can apply the t-test to see whether the price of the commodity is significant in determining the quantity supplied of the commodity under consideration. Use α = 0.05.
The hypothesis to be tested is H₀: β₁ = 0 against H₁: β₁ ≠ 0.
Then we can estimate t_cal as follows:
t_cal = β̂₁ / se(β̂₁) = 3.25 / 0.898 ≈ 3.62
Further, the tabulated value of t is 2.228. When we compare these two values, the calculated t is greater than the tabulated value. Hence, we reject the null hypothesis. Rejecting the null hypothesis means concluding that the price of the commodity is significant in determining the quantity supplied of the commodity.
In this part we have seen how to conduct the statistical reliability test using the t-statistic. Now let us see additional information about this test. When the degrees of freedom are large, we can conduct the t-test without consulting the t-table to find the theoretical value of t. This rule is known as the "2t-rule". The rule is stated as follows.
The t-table shows that the values of t change very slowly when the degrees of freedom (n−k) are greater than 8. For example, the critical value t₀.₀₂₅ changes from 2.30 (when n−k = 8) to 1.96 (when n−k = ∞). The change from 2.30 to 1.96 is obviously very slow. Consequently, we can ignore the degrees of freedom (when they are greater than 8) and say that the theoretical value of t is about 2.0. Thus, a two-tail test of a null hypothesis at the 5% level of significance can be reduced to the following rules:

1. If t_cal is greater than 2 or less than −2, we reject the null hypothesis.
2. If t_cal lies between −2 and 2, we accept the null hypothesis.

3. Fit the linear regression equation and determine the 95% confidence interval for the
slope.

The fitted regression model is Ŷᵢ = 33.75 + 3.25Xᵢ, where the numbers in parentheses below the coefficients (8.3 and 0.9, respectively) are their standard errors. To construct the confidence interval we need the standard error of the slope, which is determined as follows:

σ̂² = ∑ûᵢ² / (n−k) = 387/10 = 38.7

var(β̂₁) = σ̂² / ∑xᵢ² = 38.7/48 = 0.80625

The standard error of the slope is se(β̂₁) = √0.80625 ≈ 0.898.

The tabulated value of t for 12 − 2 = 10 degrees of freedom and α/2 = 0.025 is 2.228.
Hence the 95% confidence interval for the slope is given by:

β̂₁ ± t₀.₀₂₅,₁₀ · se(β̂₁) = 3.25 ± 2.228 × 0.898 = 3.25 ± 2.0, i.e. (1.25, 5.25).

The result tells us that, at the error probability 0.05, the true value of the slope coefficient lies between 1.25 and 5.25.
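The whole worked example (estimation, r², standard errors, the t-test and the 95% confidence interval) can be reproduced with the short sketch below, assuming numpy and scipy are available; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t

# Supply (Y, tons) and price (X, pounds per ton) from Table 3
Y = np.array([69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64], dtype=float)
X = np.array([ 9, 12,  6, 10,  9, 10,  7,  8, 12,  6, 11,  8], dtype=float)
n, k = len(Y), 2

x, y = X - X.mean(), Y - Y.mean()                   # deviations from the means
b1 = np.sum(x * y) / np.sum(x**2)                   # slope  (156/48 = 3.25)
b0 = Y.mean() - b1 * X.mean()                       # intercept (33.75)

u = Y - (b0 + b1 * X)                               # residuals
sigma2 = np.sum(u**2) / (n - k)                     # ~387/10 = 38.7
r2 = 1 - np.sum(u**2) / np.sum(y**2)                # ~0.57

se_b1 = np.sqrt(sigma2 / np.sum(x**2))                        # ~0.90
se_b0 = np.sqrt(sigma2 * np.sum(X**2) / (n * np.sum(x**2)))   # ~8.3

t_cal = b1 / se_b1                                  # ~3.62
t_crit = t.ppf(0.975, df=n - k)                     # 2.228
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)     # ~(1.25, 5.25)

print(b0, b1, r2, se_b0, se_b1, t_cal, ci)
```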

CHAPTER 4: MULTIPLE REGRESSION


The two-variable model studied extensively in the previous chapters is often inadequate
in practice. In our consumption–income example, for instance, it was assumed implicitly
that only income X affects consumption Y. But economic theory is seldom so simple for,
besides income, a number of other variables are also likely to affect consumption
expenditure. An obvious example is wealth of the consumer. Therefore, we need to

extend our simple two-variable regression model to cover models involving more than
two variables. Adding more variables leads us to the discussion of multiple regression
models, that is, models in which the dependent variable, or regressand, Y depends on two
or more explanatory variables, or regressors. The simplest possible multiple regression
model is three-variable regression, with one dependent variable and two explanatory
variables.

In this chapter we shall extend the simple linear regression model to relationships with
two explanatory variables and consequently to relationships with any number of
explanatory variables.

4.1 Models with Two Explanatory Variables


4.1.1 The normal equations
The population regression model with two explanatory variables is given as
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ ,   i = 1, 2, …, n -------------------------------- (4.1.1)

where β₁ + β₂X₂ᵢ + β₃X₃ᵢ is the systematic component and uᵢ is the random component.
 β1 is the intercept term which gives the average values of Y when X 2 and
X 3 are zero.
 β2 and β3 are called the partial slope coefficient, or partial regression
coefficients.
 β 2 measures the change in the mean value of Y resulting from a unit change in

the X2 given X 3 (i.e. holding the value of X 3 constant). Or equivalently


β 2 measures the direct or net effect of a unit change in X 2 on the mean

value of Y( net of any effect that X 3 may have on the mean of Y). The
interpretation of β 3 is also similar.

To complete the specification of our simple model we need some assumptions about the
random variable u. These assumptions are the same as in the single explanatory variable
model developed in chapter 3. That is:
 Zero mean value of ui , or E( ui | X 2 i , X 3 i ) = 0 for each i

 No serial correlation, or cov( ui , u j ) = 0 where i ≠ j
 Homoscedasticity, or var( ui ) = σ 2
 Normality of ui i.e ui ∼ N(0, σ 2 )
 Zero covariance between ui and each X variable, or cov( ui , X 2 i ) = cov(
ui , X 3 i ) = 0
 No specification bias, or the model is correctly specified
 No exact collinearity between the X variables, or no exact linear relationship
between X 2 and X3
For notational symmetry, Eq. (4.1.1) can also be written as
Yᵢ = β₁X₁ᵢ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ , with the provision that X₁ᵢ = 1 for all i.
The assumption of no collinearity is a new one and means the absence of possibility of
one of the explanatory variables being expressed as a linear combination of the other.
Existence of exact linear dependence between X₂ᵢ and X₃ᵢ would mean that we have only one independent variable in our model rather than two. If such a regression is estimated, there is no way to estimate the separate influence of X₂ (β₂) and X₃ (β₃) on Y, since such a regression gives us only the combined influence of X₂ and X₃ on Y.
To see this suppose X 3=2 X 2 then
Y i=β 1 +β 2 X 2 i + β 3 X 3 i+u i
Y i=β 1 + β 2 X 2 i + β 3 ( 2 X 2i ) +u i
Y i=β 1 + ( β 2+ 2 β 3 ) X 2 i +ui
Y i=β 1 +α X 2i +ui , where α = ( β 2+ 2 β 3 )
Estimating the above regression yields the combined effect of X2 and X3 as

represented by α =( β 2 +2 β 3 ) where there is no possibility of separating their individual


effects which are represented by β 2 and β3 .

This assumption does not guarantee that there will be no correlations among the explanatory variables; it only means that the correlations are not exact or perfect, as it is almost impossible to find two or more (economic) variables that are not correlated to some extent. Likewise, the assumption does not rule out non-linear relationships among the X's either.

Having specified our model we next use sample observations on Y, X2 and X3 and
obtain estimates of the true parameters β 1 , β 2 and β 3 :

Y^ i= β^ 1 + ^β 2 X 2 i + ^β 3 X 3 i
where ^β 1 , ^β 2 , ^β 3 are estimates of the true parameters β1 , β 2 and β 3 of
the relationship.

As before, the estimates will be obtained by minimizing the sum of squared residuals
∑ûᵢ² = ∑(Yᵢ − Ŷᵢ)² = ∑(Yᵢ − (β̂₁ + β̂₂X₂ᵢ + β̂₃X₃ᵢ))²
A necessary condition for this expression to assume a minimum value is that its partial derivatives with respect to β̂₁, β̂₂, and β̂₃ be equal to zero:
∂∑(Yᵢ − (β̂₁ + β̂₂X₂ᵢ + β̂₃X₃ᵢ))² / ∂β̂₁ = 0
∂∑(Yᵢ − (β̂₁ + β̂₂X₂ᵢ + β̂₃X₃ᵢ))² / ∂β̂₂ = 0
∂∑(Yᵢ − (β̂₁ + β̂₂X₂ᵢ + β̂₃X₃ᵢ))² / ∂β̂₃ = 0
Performing the partial differentiations we get the following system of three normal
equations in three unknown parameters ^β 1 , ^β 2 , and ^β 3

∑ Y i=n ^β1 + ^β 2 ∑ X 2 i+ ^β3 ∑ X 3 i


∑ X 2i Y i= β^ 1 ∑ X 2 i+ ^β2 ∑ X 22 i + ^β 3 ∑ X 2 i X 3 i ----------- (4.1.2)
∑ X 3i Y i= β^ 1 ∑ X 3 i + ^β 2 ∑ X 2 i X 3 i + ^β 3 ∑ X 23i
From the solution of this system (by any method, for example using determinants) we
obtain values for ^β 1 , ^β 2 , and ^β 3 .

Also by solving the system of normal equations in deviation form,

∑x₂ᵢyᵢ = β̂₂∑x₂ᵢ² + β̂₃∑x₂ᵢx₃ᵢ -------------------------------------------- (4.1.3)
∑x₃ᵢyᵢ = β̂₂∑x₂ᵢx₃ᵢ + β̂₃∑x₃ᵢ²
The following formulae, in which the variables are expressed in deviations from their
mean, may be obtained for estimating the values of the parameter estimates.

β̂₁ = Ȳ − β̂₂X̄₂ − β̂₃X̄₃

β̂₂ = [(∑x₂ᵢyᵢ)(∑x₃ᵢ²) − (∑x₃ᵢyᵢ)(∑x₂ᵢx₃ᵢ)] / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²] ------------------------------------- (4.1.4)

β̂₃ = [(∑x₃ᵢyᵢ)(∑x₂ᵢ²) − (∑x₂ᵢyᵢ)(∑x₂ᵢx₃ᵢ)] / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

where yᵢ = Yᵢ − Ȳ, x₂ᵢ = X₂ᵢ − X̄₂ and x₃ᵢ = X₃ᵢ − X̄₃.
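A direct transcription of (4.1.4) into code is sketched below. It is only meant to mirror the deviation formulas, and the function name is illustrative; the matrix form of the normal equations, used for Example 1 later in this chapter, gives the same estimates.

```python
import numpy as np

def ols_two_regressors(Y, X2, X3):
    """Estimate b1, b2, b3 in Y = b1 + b2*X2 + b3*X3 + u via the deviation formulas (4.1.4)."""
    y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
    D = np.sum(x2**2) * np.sum(x3**2) - np.sum(x2 * x3)**2            # common denominator
    b2 = (np.sum(x2 * y) * np.sum(x3**2) - np.sum(x3 * y) * np.sum(x2 * x3)) / D
    b3 = (np.sum(x3 * y) * np.sum(x2**2) - np.sum(x2 * y) * np.sum(x2 * x3)) / D
    b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
    return b1, b2, b3
```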

4.1.2 The coefficient of multiple determination (squared multiple correlation coefficient) R²
In the two-variable case we saw that r 2 measures the goodness of fit of the regression
equation; that is, it gives the proportion or percentage of the total variation in the
dependent variable Y explained by the (single) explanatory variable X. This notation of
R2 can be easily extended to regression models containing more than two variables. Thus,
in the three variable model we would like to know the proportion of the variation in Y
explained by the variables X2 and X3 jointly. The quantity that gives this information is
known as the multiple coefficient of determination and is denoted by R2; conceptually
it is akin to r2.

R² = ∑ŷ² / ∑y² = ∑(Ŷ − Ȳ)² / ∑(Y − Ȳ)² = 1 − ∑ûᵢ² / ∑y²

R² = (β̂₂∑yᵢx₂ᵢ + β̂₃∑yᵢx₃ᵢ) / ∑yᵢ² ------------------------------------------ (4.1.5)
The value of R2 lies between 0 and 1. The higher R 2 the greater the percentage of the
variation of Y explained by the regression plane, that is, the better the ‘goodness of fit’ of
the regression plane to the sample observations. The closer R2 to zero, the worse the fit.
4.1.3 The mean and variance of the parameter estimates β̂₁, β̂₂, and β̂₃

The mean of the estimates of the parameters in the three-variable model is derived in the same way as in the two-variable model. The estimates β̂₁, β̂₂, and β̂₃ are unbiased estimates of the true parameters of the relationship between Y, X₂ and X₃: their expected value is the true parameter itself.
E(β̂₁) = β₁,   E(β̂₂) = β₂,   E(β̂₃) = β₃
The variances of the parameter estimates are obtained by the following formulae:

var(β̂₁) = σ̂²[ 1/n + (X̄₂²∑x₃ᵢ² + X̄₃²∑x₂ᵢ² − 2X̄₂X̄₃∑x₂ᵢx₃ᵢ) / ((∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²) ]

var(β̂₂) = σ̂²∑x₃ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

var(β̂₃) = σ̂²∑x₂ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

where σ̂² = ∑ûᵢ²/(n−K), K being the total number of parameters which are estimated. In the three-variable model K = 3.

4.2 The General Linear Regression Model


In this section we will extend the method of least squares to models including any number k of explanatory variables. There are some rules of thumb by which we can derive (a) the normal equations, (b) the coefficient of multiple determination, and (c) the variances of the coefficients, for relationships including any number of explanatory variables.

4.2.1 Derivations of the normal equations


The general linear regression model with k explanatory variables is of the form
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + … + β_K X_Kᵢ + uᵢ

There are K parameters to be estimated (K = k+1). Clearly the system of normal
equations will consist of K equations, in which the unknowns are the parameters ^β 1 ,

^β 2 , ^β 3 , …, ^β K , and the known terms will be the sums of squares and the sums

of products of all the variables in the structural equation.

In order to derive the K normal equations without the formal differentiation procedure,
we start from the equation of the estimated relationship
Y i= β^ 1 + ^β 2 X 2 i +…+ β^ K X Ki + u^ i
and we make use of the assumptions
∑ûᵢ = 0 and ∑ûᵢXⱼᵢ = 0, where (j = 1, 2, 3, …, K)

The normal equations for a model with any number of explanatory variables may be
derived in a mechanical way, without recourse to differentiation. We will introduce a
practical rule of thumb, derived by inspection of the normal equations of the two-variable
and the three-variable models. We begin by rewriting these normal equations.
1. Model with one explanatory variable
Structural form Y i=β 1 + β 2 X 2 i +ui
Estimated form Y i= β^ 1 + ^β 2 X 2 i + u^ i

∑ Y i=n ^β1 + ^β 2 ∑ X 2 i
Normal equations ∑ X 2i Y i= β^ 1 ∑ X 2 i+ ^β2 ∑ X 22 i

2. Models with two explanatory variables


Structural form Y i=β 1 + β 2 X 2 i + β 3 X 3 i+u i
Estimated form Y i= β^ 1 + ^β 2 X 2 i + ^β 3 X 3 i+ u^ i

∑ Y i=n ^β1 + ^β 2 ∑ X 2 i+ ^β3 ∑ X 3 i


Normal equations ∑ Y i X 2 i= β^ 1 ∑ X 2 i+ ^β2 ∑ X 22 i + ^β 3 ∑ X 2 i X 3 i
Comparing the normal equations of the above models, we can generalize the procedure
to find the Kth equation of the normal equations for the K-variable model which may be
obtained by multiplying the estimated form of the K-variable model by X Ki and then
summing over all sample observations. The estimated form of the model is

Y i= β^ 1 + ^β 2 X 2 i + ^β 3 X 3 i+ …+ ^β K X Ki + u^ i

Multiplication through by X Ki yields


2
Y i X Ki = ^β 1 X Ki + β^ 2 X 2 i X Ki + β^ 3 X 3 i X Ki + …+ ^β K X Ki + u^ i X Ki
and summation over the n sample observation gives the required Kth equation
∑YᵢX_Kᵢ = β̂₁∑X_Kᵢ + β̂₂∑X₂ᵢX_Kᵢ + β̂₃∑X₃ᵢX_Kᵢ + … + β̂_K∑X_Kᵢ²
given that by assumption ∑ u^ i X Ki=0

The generalization of the linear regression model with the variables expressed in
deviations from their means is the same. Thus the estimated form of the K-variable model
in deviation form is
y i= ^β 2 x 2 i + ^β 3 x 3 i+ …+ ^β K x Ki + u^ i

The Kth equation is derived by multiplying through the estimated form by x Ki and
summing over all the sample observations
∑ y i x Ki = ^β 2 ∑ x 2 i x Ki + ^β3 ∑ x 3 i x Ki +…+ ^β K ∑ x 2Ki
4.2.2 Generalization formula for R2
The generalization formula of the coefficient of multiple determination may be derived
by inspection of the formulae of R2 for the two-variable and three-variable models.
1. Model with one explanatory variable
R²_{Y.X₂} = β̂₂∑yᵢx₂ᵢ / ∑yᵢ²
2. Model with two explanatory variables
R²_{Y.X₂X₃} = (β̂₂∑yᵢx₂ᵢ + β̂₃∑yᵢx₃ᵢ) / ∑yᵢ²
By inspection we see that for each additional explanatory variable the formula of the
squared multiple correlation coefficient includes an additional term in the numerator,
formed by the estimate of the parameter corresponding to the new variable multiplied by
the sum of products of the deviations of the new variable and the dependent one. For
example, the formula of the coefficient of multiple determination for the K-variable
model is

R²_{Y.X₂…X_K} = (β̂₂∑yᵢx₂ᵢ + β̂₃∑yᵢx₃ᵢ + … + β̂_K∑yᵢx_Kᵢ) / ∑yᵢ²
4.2.3 The adjusted coefficient of determination: R̄²
The inclusion of additional explanatory variables in the function can never reduce the coefficient of multiple determination and will usually raise it. By introducing a new regressor we increase the value of the numerator of the expression for R², while the denominator remains the same (∑yᵢ², the total variation of Yᵢ, is given in any particular sample).

To correct for this defect we adjust R2 by taking into account the degrees of freedom,
which clearly decrease as new regressors are introduced in the function. The expression
for the adjusted coefficient of multiple determination is
R̄² = 1 − (1 − R²)(n−1)/(n−K)

Or

R̄² = 1 − [∑ûᵢ²/(n−K)] / [∑yᵢ²/(n−1)]
Where R2 is the unadjusted multiple correlation coefficient, n is the number of sample
observations and K is the number of parameters estimated from the sample.
If n is large, R̄² and R² will not differ much. But with small samples, if the number of regressors (X's) is large in relation to the sample observations, R̄² will be much smaller than R² and can even assume negative values, in which case R̄² should be interpreted as being equal to zero.
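A one-line implementation of the adjustment formula is sketched below; the illustrative call uses the R² = 0.86, n = 23, K = 3 figures of the production-function example later in this chapter.

```python
def adjusted_r2(r2, n, K):
    """Adjusted coefficient of determination: R2 corrected for degrees of freedom.
    n = number of observations, K = number of estimated parameters (incl. intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - K)

# e.g. an R2 of 0.86 with n = 23 observations and K = 3 parameters
print(round(adjusted_r2(0.86, 23, 3), 3))
```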

4.2.4 Generalization formulae of variances of parameter estimates


The generalization of the formulae of the variances of the parameter estimates is
facilitated by the use of determinants. In the preceding sections we have developed the
formulae of the variances of the estimates for models with one and two explanatory
variables.
1. Model with one explanatory variable
var(β̂₂) = σ² · (1/∑x₂ᵢ²)

2. Model with two explanatory variables

var(β̂₂) = σ²∑x₃ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

var(β̂₃) = σ²∑x₂ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

The above expressions may be written in the form of determinants as follows. The normal equations of the model with two explanatory variables, written in deviation form, are

∑x₂ᵢyᵢ = β̂₂(∑x₂ᵢ²) + β̂₃(∑x₂ᵢx₃ᵢ)
∑x₃ᵢyᵢ = β̂₂(∑x₂ᵢx₃ᵢ) + β̂₃(∑x₃ᵢ²)

The terms in the parentheses are the 'knowns', which are computed from the sample observations, while β̂₂ and β̂₃ are the unknowns. The known terms appearing on the right-hand side may be written in the form of a determinant

      | ∑x₂ᵢ²     ∑x₂ᵢx₃ᵢ |
|A| = |                    |
      | ∑x₂ᵢx₃ᵢ   ∑x₃ᵢ²    |

The variance of each parameter is the product of σ² multiplied by the ratio of the minor determinant associated with this parameter divided by the (complete) determinant |A|.

Thus

var(β̂₂) = σ² · (∑x₃ᵢ²) / |A| = σ²∑x₃ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

var(β̂₃) = σ² · (∑x₂ᵢ²) / |A| = σ²∑x₂ᵢ² / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

where |A| is the determinant of the known terms defined above.

Examining the above expressions for the variances of the coefficient estimates, we may generalize as follows. The variances of the estimates of the model including k explanatory variables can be computed as the ratio of two determinants: the determinant appearing in the numerator is the minor formed after striking out the row and column of the terms corresponding to the coefficient whose variance is being computed; the determinant appearing in the denominator is the complete determinant of the known terms appearing on the right-hand side of the normal equations. For example, the variance of β̂_K is given by the following expression:

var(β̂_K) = σ² · (minor of |A| corresponding to β̂_K) / |A|

where |A| is the complete determinant of the sums of squares and cross-products of the deviations,

      | ∑x₂ᵢ²      ∑x₂ᵢx₃ᵢ    …   ∑x₂ᵢx_Kᵢ |
|A| = | ∑x₂ᵢx₃ᵢ    ∑x₃ᵢ²      …   ∑x₃ᵢx_Kᵢ |
      |    ⋮           ⋮       ⋱       ⋮     |
      | ∑x₂ᵢx_Kᵢ   ∑x₃ᵢx_Kᵢ   …   ∑x_Kᵢ²   |

and the minor corresponding to β̂_K is obtained from |A| by deleting its last row and last column.

Example 1
The table below contains observations on the quantity demanded (Y) of a certain
commodity, its price (X2) and consumers’ income (X3). Fit a linear regression to these
observations and test the overall goodness of fit (with R2) as well as the statistical
reliability of the estimates ^β 1 , ^β 2 , ^β 3 .
Quantity 100 75 80 70 50 65 90 100 110 60
demanded
Price 5 7 6 6 8 7 5 4 3 9
Income 1,000 600 1,200 500 300 400 1,300 1,100 1,300 300
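One way to carry out Example 1 is sketched below. Instead of solving the normal equations (4.1.2) by determinants, the sketch solves the equivalent matrix system (X'X)b = X'Y; the quantities it prints (estimates, R², standard errors, t ratios) are the ones the example asks for, and the variable names are illustrative.

```python
import numpy as np

# Data from Example 1: quantity demanded (Y), price (X2), consumers' income (X3)
Y  = np.array([100, 75, 80, 70, 50, 65, 90, 100, 110, 60], dtype=float)
X2 = np.array([  5,  7,  6,  6,  8,  7,  5,   4,   3,  9], dtype=float)
X3 = np.array([1000, 600, 1200, 500, 300, 400, 1300, 1100, 1300, 300], dtype=float)
n, K = len(Y), 3

# Solve the normal equations (4.1.2) in matrix form: (X'X) b = X'Y
X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.solve(X.T @ X, X.T @ Y)          # b = [b1, b2, b3]

u = Y - X @ b                                  # residuals
sigma2 = np.sum(u**2) / (n - K)                # estimate of the error variance
R2 = 1 - np.sum(u**2) / np.sum((Y - Y.mean())**2)

# Standard errors from the diagonal of sigma2 * (X'X)^(-1), then t ratios
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_ratios = b / se

print(b, R2, se, t_ratios)
```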

4.3 Hypothesis Testing in Multiple Regression:


Once we go beyond the simple world of the two-variable linear regression model,
hypothesis testing assumes several interesting forms, such as the following:
1. Testing hypotheses about an individual partial regression coefficient
2. Testing the overall significance of the estimated multiple regression model, that
is, finding out if all the partial slope coefficients are simultaneously equal to zero
Hypothesis testing about individual regression coefficients
The procedure for testing the significance of the partial regression coefficients is the same
as that discussed for the two-variable case, i.e. we just use the t-test (or Z-test) to test a
hypothesis about any individual partial regression coefficient.

Assuming ui ∼ N (0, σ 2) , the estimators ^β 2 , ^β 3 , and ^β 1 are BLUE and


normally distributed with means equal to true β2 , β 3 , and β 1 and the variances

given in section 4.2.4. Furthermore, (n−3) σ^ 2 / σ 2 follows the χ2 distribution with
n−3df.
Upon replacing σ 2 by its unbiased estimator σ^ 2 in the computation of the standard
errors, each of the following variable.
β^ 1−β 1
t=
se ( β^ 1 )

β^ 2− β2
t=
se( β^ 2 )

β^ 3− β3
t=
se( ^β ) 3

Follows the t distribution with n−3df.


Therefore, the t distribution can be used to establish confidence intervals as well as test
statistical hypotheses about the true population partial regression coefficients. Similarly,
the χ2 distribution can be used to test hypotheses about the true σ 2 .

To illustrate the procedure consider the following test


H 0 : βi =0

H 1 : β i ≠0 i = 1, 2… K
t = β̂ᵢ / se(β̂ᵢ) ∼ t_{n−K} ,   K = k + 1 = number of variables. Similarly, the 100(1−α)% confidence interval for βᵢ will be given by βᵢ = β̂ᵢ ± t_{α/2, n−K} · se(β̂ᵢ) for all i ∈ [1, K].

Example 2
A production function is estimated as
Ŷ = 4.0 + 0.7X₂ + 0.2X₃        R² = 0.86
    (0.78)  (0.102)  (0.102)     n = 23
where X₂ = labor, X₃ = capital, and Y = output.

Test the hypotheses β₂ = 0 and β₃ = 0 at α = 5% using the test-of-significance and confidence-interval approaches.

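One possible way to carry out the test-of-significance and confidence-interval approaches for Example 2 is sketched below, using only the estimates and standard errors quoted above (n = 23, so n − K = 20 degrees of freedom).

```python
from scipy.stats import t

# Estimates and standard errors from Example 2
b2, se_b2 = 0.7, 0.102
b3, se_b3 = 0.2, 0.102
df = 23 - 3
t_crit = t.ppf(0.975, df)                    # two-tailed 5% critical value

for name, b, se in [("beta2", b2, se_b2), ("beta3", b3, se_b3)]:
    t_cal = b / se
    ci = (b - t_crit * se, b + t_crit * se)  # 95% confidence interval
    print(name, round(t_cal, 2),
          "reject H0" if abs(t_cal) > t_crit else "do not reject H0",
          tuple(round(c, 3) for c in ci))
```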
Testing the overall significance of the sample regression

Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, that is, under the separate
hypothesis that each true population partial regression coefficient was zero. But now
consider the following hypothesis:
H 0 : β2= β3 =0
This null hypothesis is a joint hypothesis that β2 and β3 are jointly or
simultaneously equal to zero. A test of such a hypothesis is called a test of the overall
significance of the observed or estimated regression line, that is, whether Y is linearly
related to both X2 and X 3 . Can the joint hypothesis given above be tested by

testing the significance of ^β 2 and ^β 3 individually as in the previous section? The


answer is no, and the reasoning is as follows.

In testing the individual significance of an observed partial regression coefficient, we


assume implicitly that each test of significance was based on a different (i.e.,
independent) sample. But to test a joint hypothesis, if we use the same sample data, we
shall be violating the assumption underlying the test procedure.

In other words, although the statements


P [ ^β2 −t α /2 se ( β^ 2 )≤ β 2 ≤ ^β2 +t α / 2 se ( ^β 2) ] =1−α

P [ ^β3 −t α /2 se ( β^ 3 )≤ β 3 ≤ ^β 3+ t α /2 se ( ^β 3) ]=1−α
are individually true, it is not true that the probability that the intervals

[β̂₂ ± t_{α/2}·se(β̂₂)]  and  [β̂₃ ± t_{α/2}·se(β̂₃)]
Simultaneously include β2 and β3 is (1−α )2 , because the intervals may not be
independent when the same data are used to derive them. To state the matter differently,
testing a series of single [individual] hypotheses is not equivalent to testing those same
hypotheses jointly. The intuitive reason for this is that in a joint test of several hypotheses
any single hypothesis is “affected’’ by the information in the other hypotheses.

The upshot of the preceding argument is that for a given example (sample) only one
confidence interval or only one test of significance can be obtained. How, then, does one
test the simultaneous null hypothesis that β2 = β 3 = 0? The answer follows.

The analysis of variance approach to testing the overall significance of an observed


multiple regression: the F test
For reasons just explained, we cannot use the usual t test to test the joint hypothesis that
the true partial slope coefficients are zero simultaneously. However, this joint hypothesis
can be tested by the analysis of variance (ANOVA) technique.
In Chapter 3, we developed the following identity:
∑ y 2i =∑ ^y 2i +∑ u^ 2i
That is, TSS = ESS +RSS, which decomposed the total sum of squares (TSS) into two
components: explained sum of squares (ESS) and residual sum of squares (RSS). A study
of these components of TSS is known as the analysis of variance (ANOVA) from the
regression viewpoint. In this technique, we test the significance of the ESS or the null
hypothesis
H 0 : βi =0 .

Under the assumptions of the regression model, uᵢ ∼ N(0, σ²),

RSS/σ² = ∑ûᵢ²/σ² ∼ χ²(n−K)    and    ESS/σ² = ∑ŷᵢ²/σ² ∼ χ²(K−1)

Further, the two chi-square variables are independent, and thus under the null hypothesis H₀: βᵢ = 0

F = [χ²(K−1)/(K−1)] / [χ²(n−K)/(n−K)] = [ESS/(K−1)] / [RSS/(n−K)] ∼ F(K−1, n−K)

What use can be made of the preceding F ratio? Let us take the two-variable case:

F = β̂₂²∑xᵢ² / [∑ûᵢ²/(n−2)] = β̂₂²∑xᵢ² / σ̂²
It can be shown that
E(β̂₂²∑xᵢ²) = σ² + β₂²∑xᵢ²
and
E(∑ûᵢ²/(n−2)) = E(σ̂²) = σ²

(Note that β₂ and σ² appearing on the right sides of these equations are the true
parameters.) Therefore, if β 2 is in fact zero, both the above equations provide us with
identical estimates of true σ 2 . In this situation, the explanatory variable X has no linear
influence on Y whatsoever and the entire variation in Y is explained by the random
disturbances ui. If, on the other hand, β2 is not zero, the two equations will be
different and part of the variation in Y will be ascribable to X. Therefore, the F ratio
provides a test of the null hypothesis H 0 : β2=0 . Since all the quantities entering into
this equation can be obtained from the available sample, this F ratio provides a test
statistic to test the null hypothesis that true β2 is zero. All that needs to be done is to
compute the F ratio and compare it with the critical F value obtained from the F tables at
the chosen level of significance.

Next the ANOVA table will be prepared as follows:

Source of variation           Sum of squares (SS)   Degrees of freedom (df)   Mean sum of squares (MSS)
Explained sum of squares      ∑ŷᵢ²                  k = K−1                   ∑ŷᵢ²/(K−1)
Residual sum of squares       ∑ûᵢ²                  n−K                       ∑ûᵢ²/(n−K) = σ̂²
Total sum of squares          ∑yᵢ²                  n−1

The F ratio is the ratio of the two mean sums of squares.

Associated with any sum of squares is its df, the number of independent observations on which it is based. TSS has n−1 df because we lose 1 df in computing the sample mean Ȳ. RSS has n−K df. (Why?) ESS has k = K−1 df. The mean sum of squares is obtained by dividing each SS by its df.
We can generalize the F-testing procedure as follows.

Given the K-variable regression model:
Y i=β 1 + β 2 X 2 i + β 3 X 3 i+ …+ β K X Ki +u i
To test the hypothesis
H 0 : β2= β3 =…= β K =0
(i.e., all slope coefficients are simultaneously zero) versus
H1: Not all slope coefficients are simultaneously zero
Compute
F = (ESS/df) / (RSS/df) = [ESS/(K−1)] / [RSS/(n−K)]
If F > F α (K −1 , n−K ) , reject H0; otherwise you do not reject it, where
F α (K −1 , n−K ) is the critical F value at the α level of significance and (K−1)
numerator df and (n−K) denominator df.
A summary of the F-statistic
Null hypothesis H₀      Alternative hypothesis H₁      Critical region: reject H₀ if
σ₁² = σ₂²               σ₁² > σ₂²                      S₁²/S₂² > F_{α, ndf, ddf}
σ₁² = σ₂²               σ₁² ≠ σ₂²                      S₁²/S₂² > F_{α/2, ndf, ddf}  or  S₁²/S₂² < F_{1−α/2, ndf, ddf}

Notes:
1. σ 21 and σ 22 are the two population variances.
2. S 21 and S 22 are the two sample variances.
3. ndf and ddf denote, respectively, the numerator and denominator df.
4. In computing the F ratio, put the larger S 2 value in the numerator.

5. The critical F values are given in the last column. The first subscript of F is the level of significance and the remaining subscripts are the numerator and denominator df.

Example 3
With reference to the production function regression in the previous example suppose
you are given with the following intermediary results

Normal equations:   12β̂₂ + 8β̂₃ = 10
                     8β̂₂ + 12β̂₃ = 8
Test the joint hypothesis H 0 : β2= β3 =0
An important relationship between R2 and F
There is an intimate relationship between the coefficient of determination R 2 and the F
test used in the analysis of variance. More generally, in the K-variable case, if we assume
that the disturbances are normally distributed and that the null hypothesis is
H 0 : β2= β3 =…= β K =0
then it follows that
F = [ESS/(K−1)] / [RSS/(n−K)]
follows the F distribution with K−1 and n−K df. (Note: The total number of parameters to
be estimated is K, of which one is the intercept term.)
Let us manipulate the above equation as follows:
F = [(n−K)/(K−1)] · ESS/RSS
  = [(n−K)/(K−1)] · ESS/(TSS − ESS)
  = [(n−K)/(K−1)] · (ESS/TSS) / [1 − (ESS/TSS)]
  = [(n−K)/(K−1)] · R²/(1 − R²)
  = [R²/(K−1)] / [(1 − R²)/(n−K)]

Where use is made of the definition R² = ESS/TSS. The last equation above shows
how F and R2 are related. These two vary directly. When R 2 = 0, F is also zero. The larger
the R2, the greater the F value. In the limit, when R 2 = 1, F is infinite. Thus the F test,
which is a measure of the overall significance of the estimated regression, is also a test of
significance of R2. In other words, testing the null hypothesis H 0 : β2= β3 =…= β K =0
is equivalent to testing the null hypothesis that (the population) R2 is zero.

One advantage of the F test expressed in terms of R 2 is its ease of computation: All that
one needs to know is the R 2 value. Therefore, the overall F test of significance can be
recast in terms of R2 as shown in the table below

Source of variation           Sum of squares (SS)   Degrees of freedom (df)   Mean sum of squares (MSS)
Explained sum of squares      R²·TSS                k = K−1                   R²·TSS/(K−1)
Residual sum of squares       (1−R²)·TSS            n−K                       (1−R²)·TSS/(n−K)
Total sum of squares          TSS                   n−1

F = [R²·TSS/(K−1)] / [(1−R²)·TSS/(n−K)] = [R²/(K−1)] / [(1−R²)/(n−K)]
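This relationship lends itself to a very short computation: given only R², n and K, the overall F statistic and its critical value can be obtained as sketched below (the function name is illustrative).

```python
from scipy.stats import f

def overall_f_test(R2, n, K, alpha=0.05):
    """Overall significance test of a regression with K parameters (incl. intercept)
    and n observations, computed directly from R2."""
    F = (R2 / (K - 1)) / ((1 - R2) / (n - K))
    F_crit = f.ppf(1 - alpha, K - 1, n - K)
    return F, F_crit, F > F_crit

# e.g. the production function of Example 2: R2 = 0.86, n = 23, K = 3
print(overall_f_test(0.86, 23, 3))
```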

UNIT 5
Dummy Variable Regression Models
A dummy variable (an indicator variable) is a numeric variable that represents categorical
data, such as gender, race, political affiliation, etc.
Technically, dummy variables are dichotomous, quantitative variables. Their range of
values is small; they can take on only two quantitative values. As a practical matter,
regression results are easiest to interpret when dummy variables are limited to two
specific values, 1 or 0. Typically, 1 represents the presence of a qualitative attribute, and 0
represents the absence.
The number of dummy variables required to represent a particular categorical variable
depends on the number of values that the categorical variable can assume. To represent a
categorical variable that can assume k different values, a researcher would need to define
k - 1 dummy variables.

For example, suppose we are interested in political affiliation, a categorical variable that
might assume three values- Republican, Democrat, or Independent. We could represent
political affiliation with two dummy variables:
X1 = 1, if Republican; X1 = 0, otherwise.
X2 = 1, if Democrat; X2 = 0, otherwise.
In this example, notice that we don't have to create a dummy variable to represent the
"Independent" category of political affiliation. If X1 equals zero and X2 equals zero, we
know the voter is neither Republican nor Democrat. Therefore, the voter must be
Independent.
How to Interpret Dummy Variables
Once a categorical variable has been recoded as a dummy variable, the dummy variable
can be used in regression analysis just like any other quantitative variable.
For example, suppose we wanted to assess the relationship between household income
and political affiliation (i.e., Republican, Democrat, or Independent). The regression
equation might be: Income = b0 + b1X1+ b2X2
Where b0, b1, and b2 are regression coefficients, and X1 and X2 are dummy variables
defined as:
X1 = 1, if Republican; X1 = 0, otherwise.
X2 = 1, if Democrat; X2 = 0, otherwise.
The value of the categorical variable that is not represented explicitly by a dummy variable is called the reference group. In this example, the reference group consists of Independent voters.
In the analysis, each dummy variable is compared with the reference group. In this example, a positive regression coefficient means that income is higher for that political affiliation than for the reference group; a negative regression coefficient means that income is lower. If the regression coefficient is statistically significant, the income discrepancy with the reference group is also statistically significant.
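A minimal sketch of this recoding is given below. The party labels and income figures are made up purely for illustration; only the coding scheme (two dummies, Independents as the reference group) follows the example above.

```python
import numpy as np

# Hypothetical sample: party affiliation and household income (illustrative values only)
party  = np.array(["Rep", "Dem", "Ind", "Dem", "Rep", "Ind", "Dem", "Rep"])
income = np.array([52.0, 47.0, 45.0, 50.0, 55.0, 44.0, 48.0, 53.0])   # in $1000s

# Two dummies for a three-category variable; "Independent" is the reference group
X1 = (party == "Rep").astype(float)
X2 = (party == "Dem").astype(float)

X = np.column_stack([np.ones(len(income)), X1, X2])
b0, b1, b2 = np.linalg.lstsq(X, income, rcond=None)[0]

# b0 = mean income of the reference group (Independents);
# b1, b2 = differential intercepts of Republicans and Democrats relative to that group
print(b0, b1, b2)
```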
Dummy Variable Recoding
Example: Suppose the categorical variable is gender. The first thing we need to do is express gender as one or more dummy variables. How many dummy variables will we need to fully capture all of the information inherent in the categorical variable Gender? To answer that question, we look

59
at the number of values (k) Gender can assume. We will need k - 1 dummy variables to
represent Gender. Since Gender can assume two values (male or female), we will only
need one dummy variable to represent gender.
Therefore, we can express the categorical variable Gender as a single dummy variable
(X1), like so:
X1 = 1 for male students.
X1 = 0 for non-male students.
Note that X1 identifies male students explicitly. Non-male students are the reference
group. This was an arbitrary choice. The analysis works just as well if you use X 1 to
identify female students and make non-female students the reference group.

Look at the following example.


Illustration: The following model represents the relationship between geographical
location and teachers’ average salary in public schools. The data were taken from 50
states for a single year. The 50 states were classified into three regions: Northeast, South
and West. The regression model looks like the following:
Yᵢ = α + β₁D₁ᵢ + β₂D₂ᵢ + uᵢ
where Yᵢ = the (average) salary of public school teachers in state i
D₁ᵢ = 1 if the state is in the Northeast
     = 0 otherwise (i.e. in other regions of the country)
D₂ᵢ = 1 if the state is in the South
     = 0 otherwise (i.e. in other regions of the country)
Note that the above regression model is like any multiple regression model considered
previously, except that instead of quantitative regressors, we have only qualitative
(dummy) regressors. Dummy regressors take value of 1 if the observation belongs to that
particular category and 0 otherwise.
Note also that there are 3 states (categories) for which we have created only two dummy
variables (D1 and D2). One of the rules in dummy variable regression is that if there are m

categories, we need only m-1 dummy variables. If we are suppressing the intercept, we
can have m dummies but the interpretation will be a bit different.
The intercept value represents the mean value of the dependent variable for the benchmark category. This is the category for which we do not assign a dummy (in our case, West is the benchmark category). The coefficients of the dummy variables are called differential intercept coefficients because they tell us by how much the intercept of the category that receives the value of 1 differs from the intercept of the benchmark category.

CHAPTER 6

6.1 MULTICOLLINEARITY

6.1.1 THE NATURE OF MULTICOLLINEARITY


Originally, the term multicollinearity meant the existence of a “perfect,” or exact, linear
relationship among some or all explanatory variables of a regression model. For the K-
variable regression involving explanatory variable X1 , X2 , . . . , XK (where
X 1 = 1 for all observations to allow for the intercept term), an exact linear relationship
is said to exist if the following condition is satisfied:
λ1 X 1 + λ2 X 2 +…+ λ K X K =0 ....................................................... 6.1.1
Where λ1 , λ2 , . . . , λK are constants such that not all of them are zero
simultaneously.

Today, however, the term multicollinearity is used in a broader sense to include the case
of perfect multicollinearity, as shown by (6.1.1), as well as the case where the X variables
are intercorrelated but not perfectly so, as follows:
λ1 X 1 + λ2 X 2 +…+ λ K X K + v i=0 …………………..………… 6.1.2
where v i is a stochastic error term.

To see the difference between perfect and less than perfect multicollinearity, assume, for
example, that
λ2 ≠ 0. Then, (6.1.1) can be written as
−λ1 λ λ
X 2 i= X 1 i− 3 X 3 i −…− K X Ki …………………………… 6.1.3
λ2 λ2 λ2

which shows how X2 is exactly linearly related to other variables or how it can be
derived from a linear combination of other X variables. In this situation, the coefficient of
correlation between the variable X2 and the linear combination on the right side of
(6.1.3) is bound to be unity.

Similarly, if λ2 ≠ 0, Eq. (6.1.2) can be written as


−λ1 λ3 λK 1
X 2 i= X 1 i− X 3 i −…− X Ki− v i ……………………6.1.4
λ2 λ2 λ2 λ2
which shows that X2 is not an exact linear combination of other X’s because it is also
determined by the stochastic error term v i .

Why does the classical linear regression model assume that there is no multicollinearity
among the X’s? The reasoning is this: If multicollinearity is perfect in the sense of
(6.1.1), the regression coefficients of the X variables are indeterminate and their
standard errors are infinite. If multicollinearity is less than perfect, as in (6.1.2), the
regression coefficients, although determinate, possess large standard errors (in
relation to the coefficients themselves), which means the coefficients cannot be
estimated with great precision or accuracy. The proofs of these statements are given as
follows.

Estimation in the presence of perfect multicollinearity


The fact that in the case of perfect multicollinearity the regression coefficients remain
indeterminate and their standard errors are infinite can be demonstrated readily in terms
of the three-variable regression model. Using the deviation form, where all the variables

are expressed as deviations from their sample means, we can write the three-variable
regression model as
y i= ^β 2 x 2 i + ^β 3 x 3 i+ u^ i ………………………………………… 6.1.5

Now from Chapter 4 we obtain

β̂₂ = [(∑x₂ᵢyᵢ)(∑x₃ᵢ²) − (∑x₃ᵢyᵢ)(∑x₂ᵢx₃ᵢ)] / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

β̂₃ = [(∑x₃ᵢyᵢ)(∑x₂ᵢ²) − (∑x₂ᵢyᵢ)(∑x₂ᵢx₃ᵢ)] / [(∑x₂ᵢ²)(∑x₃ᵢ²) − (∑x₂ᵢx₃ᵢ)²]

Assume that X₃ᵢ = λX₂ᵢ, where λ is a nonzero constant. Substituting this into the formula for β̂₂, we obtain

β̂₂ = [(∑x₂ᵢyᵢ)(λ²∑x₂ᵢ²) − (λ∑x₂ᵢyᵢ)(λ∑x₂ᵢ²)] / [(∑x₂ᵢ²)(λ²∑x₂ᵢ²) − λ²(∑x₂ᵢ²)²] = 0/0 ………………………….…. (6.1.6)

which is an indeterminate expression. Verify that β̂₃ is also indeterminate.

Why do we obtain the result shown in (6.1.6)? Recall the meaning of ^β 2 : It gives the

rate of change in the average value of Y as X2 changes by a unit, holding X3 constant.


But if X3 and X2 are perfectly collinear, there is no way X3 can be kept constant: As X2
changes, so does X3 by the factor λ. This means, there is no way of disentangling the
separate influences of X2 and X3 from the given sample: For practical purposes X2 and
X3 are indistinguishable.

Estimation in the presence of “high” but “imperfect” multicollinearity
Generally, there is no exact linear relationship among the X variables, especially in data
involving economic time series. Thus, turning to the three-variable model in the deviation
form given in (6.1.5), instead of exact multicollinearity, we may have
x 3i =λ x 2i + v i ……………………………………………………. 6.1.7

where λ ≠ 0 and where vi is a stochastic error term such that ∑ x 2 i v i=0 . (Why?)

In this case, estimation of regression coefficients β2 and β3 may be possible. For


example, substituting (6.1.7) into the formula for ^β 2 , we obtain

β̂₂ = [∑yᵢx₂ᵢ(λ²∑x₂ᵢ² + ∑vᵢ²) − (λ∑yᵢx₂ᵢ + ∑yᵢvᵢ)(λ∑x₂ᵢ²)] / [∑x₂ᵢ²(λ²∑x₂ᵢ² + ∑vᵢ²) − (λ∑x₂ᵢ²)²] ………………….(6.1.8)

where use is made of ∑x₂ᵢvᵢ = 0. A similar expression can be derived for β̂₃.

Now, unlike (6.1.6), there is no reason to believe a priori that (6.1.8) cannot be estimated.
Of course, if vi is sufficiently small, say, very close to zero, (6.1.8) will indicate almost
perfect collinearity and we shall be back to the indeterminate case of (6.1.6).

6.1.2 SOURCES OF MULTICOLLINEARITY


Multicollinearity may be due to the following factors:
1. The data collection method employed, for example, sampling over a limited range
of the values taken by the regressors in the population.
2. An overdetermined model. This happens when the model has more explanatory
variables than the number of observations.
3. Inherent nature of the data. Especially in time series data, where the regressors
included in the model share a common trend, that is, they all increase or decrease
over time. For example, in the regression of consumption expenditure on income,

wealth, and population, the regressors income, wealth, and population may all be
growing over time at more or less the same rate, leading to collinearity among
these variables.

6.1.3 CONSEQUENCES OF MULTICOLLINEARITY


In cases of near or high multicollinearity, one is likely to encounter the following
consequences:
1. Although BLUE, the OLS estimators have large variances, making precise
estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be
statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R 2,
the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in
the data.

The preceding consequences can be demonstrated as follows.


Large Variances of OLS Estimators
It can be shown that for model (6.1.5) the variances of β̂₂ and β̂₃ are given by

var(β̂₂) = σ² / [∑x₂ᵢ²(1 − r₂₃²)] …………………………….. (6.1.3.1)

var(β̂₃) = σ² / [∑x₃ᵢ²(1 − r₂₃²)] …………………………….. (6.1.3.2)

where r 23 is the coefficient of correlation between X2 and X3.

It is apparent from (6.1.3.1) and (6.1.3.2) that as r 23 tends toward 1, that is, as
collinearity increases, the variances of the two estimators increase and in the limit when
r 23 = 1, they are infinite.

The speed with which variances increase can be seen with the variance-inflating factor (VIF), which is defined as

$$\text{VIF} = \frac{1}{(1 - r_{23}^2)} \quad \text{………………………………..…….. (6.1.3.3)}$$

VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{23}^2$ approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit it can become infinite. As can be readily seen, if there is no collinearity between $X_2$ and $X_3$, the VIF will be 1.

Using this definition, we can express (6.1.3.1) and (6.1.3.2) as

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF} \qquad \operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2}\,\text{VIF}$$

which show that the variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ are directly proportional to the VIF.

The results just discussed can be easily extended to the K-variable model. In such a model, the variance of the kth coefficient can be expressed as

$$\operatorname{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2 (1 - R_j^2)} \quad \text{……………………………… (6.1.3.4)}$$

where $\hat{\beta}_j$ = (estimated) partial regression coefficient of regressor $X_j$,
$R_j^2$ = the $R^2$ in the regression of $X_j$ on the remaining (K − 2) regressors
[Note: There are (K − 1) regressors in the K-variable regression model.], and
$\sum x_j^2 = \sum (X_j - \bar{X}_j)^2$.

We can also write (6.1.3.4) as

$$\operatorname{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2}\,\text{VIF}_j \quad \text{……………….………… (6.1.3.5)}$$

As you can see from this expression, $\operatorname{var}(\hat{\beta}_j)$ is proportional to $\sigma^2$ and to VIF$_j$ but inversely proportional to $\sum x_j^2$. The last relationship states that the larger the variability in a regressor, the smaller the variance of the coefficient of that regressor, assuming the other two ingredients are constant, and therefore the greater the precision with which that coefficient can be estimated.
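As a rough numerical illustration of (6.1.3.4)–(6.1.3.5), the VIF of each regressor can be computed from the $R^2$ of the auxiliary regression of that regressor on the remaining regressors. The sketch below does this on simulated data; the variable names and the use of the statsmodels package are illustrative assumptions, not part of the original text.

```python
# Sketch: computing VIF_j = 1 / (1 - R_j^2) from auxiliary regressions (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.3, size=n)   # x3 deliberately collinear with x2
X = np.column_stack([x2, x3])

def vif(X, j):
    """VIF of column j: regress X_j on the other columns and use its R-squared."""
    others = np.delete(X, j, axis=1)
    aux = sm.OLS(X[:, j], sm.add_constant(others)).fit()
    return 1.0 / (1.0 - aux.rsquared)

for j in range(X.shape[1]):
    print(f"VIF of regressor {j + 2}: {vif(X, j):.2f}")
```

Multiplying $\sigma^2 / \sum x_j^2$ by the factor returned here reproduces (6.1.3.5).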

Wider Confidence Intervals


Because of the large standard errors, the confidence intervals for the relevant population
parameters tend to be larger.

“Insignificant” t Ratios
Recall that to test the null hypothesis that, say, $\beta_2 = 0$, we use the t ratio, that is, $t = \hat{\beta}_2 / \operatorname{se}(\hat{\beta}_2)$, and compare the estimated t value with the critical t value from the t table. But as we have seen, in cases of high collinearity the estimated standard errors increase dramatically, thereby making the t values smaller. Therefore, in such cases, one will increasingly accept the null hypothesis that the relevant true population value is zero.

6.1.4 DETECTION OF MULTICOLLINEARITY


Multicollinearity is essentially a sample phenomenon, arising out of the largely non-
experimental data collected in most social sciences. Multicollinearity is also a question of

degree and not of kind. Some rules of thumb for detecting it or measuring its strength are
as follows.

1. High R2 but few significant t ratios. If R2 is high, say, in excess of 0.8, the F test
in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t tests will show that none or very
few of the partial slope coefficients are statistically different from zero.
2. High pair-wise correlations among regressors. Another suggested rule of
thumb is that if the pair-wise or zero-order correlation coefficient between two
regressors is high, say, in excess of 0.8, then multicollinearity is a serious
problem. High zero-order correlations are a sufficient but not a necessary
condition for the existence of multicollinearity because it can exist even though
the zero-order or simple correlations are comparatively low (say, less than 0.50).
3. High variance inflation factor. The larger the value of VIF$_j$, the more “troublesome” or collinear the variable $X_j$. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if $R_j^2$ exceeds 0.90, that variable is said to be highly collinear. (A practical check combining these rules is sketched below.)
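The three rules of thumb can be checked mechanically once the model has been estimated. The sketch below assumes the data sit in a pandas DataFrame df with a dependent variable column and regressor columns (all hypothetical names) and uses the statsmodels VIF helper; it is an illustration, not a prescribed procedure.

```python
# Sketch: rule-of-thumb multicollinearity checks (df and the column names are hypothetical).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_checks(df, y_col, x_cols):
    X = sm.add_constant(df[x_cols])
    fit = sm.OLS(df[y_col], X).fit()

    # Rule 1: high R-squared but few significant t ratios
    print("R-squared:", round(fit.rsquared, 3))
    print("t ratios:\n", fit.tvalues.round(2))

    # Rule 2: high pairwise (zero-order) correlations among regressors
    print("Pairwise correlations:\n", df[x_cols].corr().round(2))

    # Rule 3: variance inflation factors (VIF > 10 is the usual warning level)
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, "VIF:", round(variance_inflation_factor(X.values, i), 2))
```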

6.1.5 REMEDIAL MEASURES


What can be done if multicollinearity is serious? We have two choices:
(1) Do nothing or (2) follow some rules of thumb.
Rule-of-Thumb Procedures
1. Combining cross-sectional and time series data (pooling the data)
2. Dropping a variable(s). When faced with severe multicollinearity, one of the
“simplest” things to do is to drop one of the collinear variables. But in dropping a
variable from the model we may be committing a specification bias or
specification error.
3. Transformation of variables. Suppose we have time series data on consumption
expenditure, income, and wealth. One reason for high multicollinearity between
income and wealth in such data is that over time both the variables tend to move
in the same direction. One way of minimizing this dependence is to proceed as
follows.
If the relation
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t \quad \text{………………………………… (6.1.5.1)}$$
holds at time t, it must also hold at time t − 1 because the origin of time is arbitrary anyway. Therefore, we have
$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \beta_3 X_{3,t-1} + u_{t-1} \quad \text{……………...……… (6.1.5.2)}$$

If we subtract (6.1.5.2) from (6.1.5.1), we obtain

$$Y_t - Y_{t-1} = \beta_2 (X_{2t} - X_{2,t-1}) + \beta_3 (X_{3t} - X_{3,t-1}) + v_t \quad \text{……..………… (6.1.5.3)}$$

where $v_t = u_t - u_{t-1}$. Equation (6.1.5.3) is known as the first difference form because we run the regression, not on the original variables, but on the differences of successive values of the variables.

The first difference regression model often reduces the severity of multicollinearity
because, although the levels of X2 and X3 may be highly correlated, there is no a priori
reason to believe that their differences will also be highly correlated.

Another commonly used transformation in practice is the ratio transformation.


Consider the model:
$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t$ ………………………………… (6.1.5.4)

where Y is consumption expenditure in real dollars, X2 is GDP, and X3 is total


population. Since GDP and population grow over time, they are likely to be correlated.
One “solution” to this problem is to express the model on a per capita basis, that is, by dividing (6.1.5.4) by X3, to obtain:

$$\frac{Y_t}{X_{3t}} = \beta_1\left(\frac{1}{X_{3t}}\right) + \beta_2\left(\frac{X_{2t}}{X_{3t}}\right) + \beta_3 + \left(\frac{u_t}{X_{3t}}\right) \quad \text{……………………...… (6.1.5.5)}$$

Such a transformation may reduce collinearity in the original variables. (Both the first-difference and the ratio transformations are sketched in code after this list.)

4. Additional or new data. Since multicollinearity is a sample feature, it is possible
that in another sample involving the same variables collinearity may not be as
serious as in the first sample. Sometimes simply increasing the size of the sample
may attenuate the collinearity problem. For example, in the three-variable model
we saw that
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$$
Now as the sample size increases, $\sum x_{2i}^2$ will generally increase. (Why?)
Therefore, for any given $r_{23}$, the variance of $\hat{\beta}_2$ will decrease, thus decreasing the standard error, which will enable us to estimate $\beta_2$ more precisely.
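The first-difference transformation in (6.1.5.3) and the ratio (per capita) transformation in (6.1.5.5) are both one-line operations on the data. The sketch below assumes a pandas DataFrame df indexed by time with columns Y, X2, and X3 (hypothetical names); it only illustrates how the transformed series would be constructed before re-running OLS.

```python
# Sketch: first-difference and ratio transformations (column names are hypothetical).
import pandas as pd
import statsmodels.api as sm

def first_difference_fit(df):
    # Regression on successive differences; note there is no intercept in (6.1.5.3).
    d = df[["Y", "X2", "X3"]].diff().dropna()
    return sm.OLS(d["Y"], d[["X2", "X3"]]).fit()

def per_capita_fit(df):
    # Divide (6.1.5.4) through by X3; the coefficient on 1/X3 estimates beta_1,
    # the coefficient on X2/X3 estimates beta_2, and the "intercept" of the
    # transformed equation estimates beta_3.
    y = df["Y"] / df["X3"]
    X = pd.DataFrame({"inv_X3": 1.0 / df["X3"], "X2_over_X3": df["X2"] / df["X3"]})
    return sm.OLS(y, sm.add_constant(X)).fit()
```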

6.2. HETEROSCEDASTICITY

6.2.1. THE NATURE OF HETEROSCEDASTICITY


As noted in Chapter 3, one of the important assumptions of the classical linear regression
model is that the variance of each disturbance term ui , conditional on the chosen
values of the explanatory variables, is some constant number equal to σ 2 . This is the
assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal
variance. Symbolically,

$E(u_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n$ ……………………………..… (6.2.1.1)

Diagrammatically, in the two-variable regression model homoscedasticity can be shown


as in Figure 6.1
[Fig 6.1 Homoscedastic disturbances: the density of savings (Y) around the regression line $\beta_1 + \beta_2 X_i$ has the same spread at every level of income (X).]

As Figure 6.1 shows, the conditional variance of $Y_i$ (which is equal to that of $u_i$), conditional upon the given $X_i$, remains the same regardless of the values taken by the variable X.

In contrast, consider Figure 6.2 below, which shows that the conditional variance of
Y i increases as X increases. Here, the variances of Y i are not the same. Hence,
there is heteroscedasticity. Symbolically,

$E(u_i^2) = \sigma_i^2$ …………………………………….………… (6.2.1.2)

Notice the subscript of σ 2 , which reminds us that the conditional variances of ui (=


conditional variances of Y i ) are no longer constant.
[Fig 6.2 Heteroscedastic disturbances: the density of savings (Y) around the regression line $\beta_1 + \beta_2 X_i$ spreads out as income (X) increases.]

To make the difference between homoscedasticity and heteroscedasticity clear, assume


that in the two-variable model Y i=β 1 + β 2 X i +ui , Y represents savings and X represents
income. Figures 6.1 and 6.2 show that as income increases, savings on the average also
increase. But in Figure 6.1 the variance of savings remains the same at all levels of
income, whereas in Figure 6.2 it increases with income. It seems that in Figure 6.2 the
higher income families on the average save more than the lower-income families, but
there is also more variability in their savings.

6.2.2 SOURCES OF HETEROSCEDASTICITY


1. As people learn, their errors of behavior become smaller over time. In this case, $\sigma_i^2$ is expected to decrease. E.g., typing errors versus hours of typing practice.

2. As incomes grow, people have more discretionary income and hence more scope
for choice about the disposition of their income. Hence, σ 2i is likely to increase
with income. Thus in the regression of savings on income one is likely to find $\sigma_i^2$ increasing with income (as in Figure 6.2) because people have more choices about their savings behavior.

3. As data collecting techniques improve, σ 2i is likely to decrease. Thus, banks


that have sophisticated data processing equipment are likely to commit fewer
errors in the monthly or quarterly statements of their customers than banks
without such facilities.

4. Heteroscedasticity can also arise as a result of the presence of outliers. An


outlying observation, or outlier, is an observation that is much different (either
very small or very large) in relation to the observations in the sample. More
precisely, an outlier is an observation from a different population to that
generating the remaining sample observations. The inclusion or exclusion of such
an observation, especially if the sample size is small, can substantially alter the
results of regression analysis.

5. Heteroscedasticity may be due to omission of some important variables from the


model. For example, in the demand function for a commodity, if we do not
include the prices of commodities complementary to or competing with the
commodity in question, the residuals obtained from the regression may give the
distinct impression that the error variance may not be constant.

6. Another source of heteroscedasticity is skewness in the distribution of one or


more regressors included in the model. Examples are economic variables such as
income, wealth, and education. It is well known that the distribution of income
and wealth in most societies is uneven, with the bulk of the income and wealth
being owned by a few at the top.

Note that the problem of heteroscedasticity is likely to be more common in cross-


sectional than in time series data. In cross-sectional data, one usually deals with members
of a population at a given point in time, such as individual consumers or their families,
firms, industries, or geographical subdivisions such as state, country, city, etc. Moreover,
these members may be of different sizes, such as small, medium, or large firms or low,
medium, or high income. In time series data, on the other hand, the variables tend to be of
similar orders of magnitude because one generally collects the data for the same entity
over a period of time. Examples are GNP, consumption expenditure, savings, or
employment over some period of time.

6.2.3. OLS ESTIMATION IN THE PRESENCE OF
HETEROSCEDASTICITY
In the presence of heteroscedasticity, $\hat{\beta}_2$ is still a linear, unbiased, and consistent estimator. But $\hat{\beta}_2$ is no longer best; that is, it no longer has the minimum-variance property. Then what is BLUE in the presence of heteroscedasticity?
The answer is given in the following discussion.

The method of generalized least squares (GLS)


Why is the usual OLS estimator of β 2 not best, although it is still unbiased?
Unfortunately, the usual OLS method does not make use of the “information” contained
in the unequal variability of the dependent variable Y. It assigns equal weight or
importance to each observation. But a method of estimation, known as generalized least
squares (GLS), takes such information into account explicitly and is therefore capable of
producing estimators that are BLUE. To see how this is accomplished, let us continue
with the now-familiar two-variable model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + u_i \quad \text{…………………………….….. (6.2.3.1)}$$

which for ease of algebraic manipulation we write as

$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \quad \text{………………………….… (6.2.3.2)}$$

where $X_{1i} = 1$ for each i. One can see that these two formulations are identical.

Now assume that the heteroscedastic variances $\sigma_i^2$ are known. Divide (6.2.3.2) through by $\sigma_i$ to obtain

$$\frac{Y_i}{\sigma_i} = \beta_1\left(\frac{X_{1i}}{\sigma_i}\right) + \beta_2\left(\frac{X_{2i}}{\sigma_i}\right) + \left(\frac{u_i}{\sigma_i}\right) \quad \text{……………...……… (6.2.3.3)}$$

which for ease of exposition we write as

$$Y_i^* = \beta_1^* X_{1i}^* + \beta_2^* X_{2i}^* + u_i^* \quad \text{.......……………………… (6.2.3.4)}$$

where the starred, or transformed, variables are the original variables divided by (the known) $\sigma_i$. We use the notation $\beta_1^*$ and $\beta_2^*$, the parameters of the transformed model, to distinguish them from the usual OLS parameters $\beta_1$ and $\beta_2$.

What is the purpose of transforming the original model? To see this, notice the following feature of the transformed error term $u_i^*$:

$$\operatorname{var}(u_i^*) = E(u_i^{*2}) = E\left(\frac{u_i}{\sigma_i}\right)^2 = \frac{1}{\sigma_i^2} E(u_i^2) \quad \text{(since } \sigma_i^2 \text{ is known)}$$
$$= \frac{1}{\sigma_i^2}(\sigma_i^2) \quad \text{(since } E(u_i^2) = \sigma_i^2\text{)}$$
$$= 1$$

which is a constant. That is, the variance of the transformed disturbance term $u_i^*$ is now homoscedastic. Since we are still retaining the other assumptions of the classical model, the finding that it is $u_i^*$ that is homoscedastic suggests that if we apply OLS to the transformed model (6.2.3.3) it will produce estimators that are BLUE. In short, the estimated $\beta_1^*$ and $\beta_2^*$ are now BLUE, and not the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$.

This procedure of transforming the original variables in such a way that the transformed
variables satisfy the standard least-squares assumptions and then applying OLS to them is
known as the method of generalized least squares (GLS). The estimators thus obtained
are known as GLS estimators, and it is these estimators that are BLUE.

The actual mechanics of estimating $\beta_1^*$ and $\beta_2^*$ are as follows. First, we write down the SRF of (6.2.3.3):

$$\frac{Y_i}{\sigma_i} = \hat{\beta}_1^*\left(\frac{X_{1i}}{\sigma_i}\right) + \hat{\beta}_2^*\left(\frac{X_{2i}}{\sigma_i}\right) + \left(\frac{\hat{u}_i}{\sigma_i}\right)$$

or

$$Y_i^* = \hat{\beta}_1^* X_{1i}^* + \hat{\beta}_2^* X_{2i}^* + \hat{u}_i^* \quad \text{………………………………..... (6.2.3.5)}$$

Now, to obtain the GLS estimators, we minimize

$$\sum \hat{u}_i^{*2} = \sum (Y_i^* - \hat{\beta}_1^* X_{1i}^* - \hat{\beta}_2^* X_{2i}^*)^2$$

that is,

$$\sum \left(\frac{\hat{u}_i}{\sigma_i}\right)^2 = \sum \left[\frac{Y_i}{\sigma_i} - \hat{\beta}_1^*\left(\frac{X_{1i}}{\sigma_i}\right) - \hat{\beta}_2^*\left(\frac{X_{2i}}{\sigma_i}\right)\right]^2 \quad \text{.……………... (6.2.3.6)}$$

or, equivalently,

$$\sum w_i \hat{u}_i^2 = \sum w_i (Y_i - \hat{\beta}_1^* - \hat{\beta}_2^* X_{2i})^2, \quad \text{where } w_i = 1/\sigma_i^2$$

The actual mechanics of minimizing (6.2.3.6) follow the partial derivative techniques. Using these techniques, the GLS estimators of $\beta_1^*$ and $\beta_2^*$ are given as follows:

$$\hat{\beta}_2^* = \frac{\left(\sum w_i\right)\left(\sum w_i X_{2i} Y_i\right) - \left(\sum w_i X_{2i}\right)\left(\sum w_i Y_i\right)}{\left(\sum w_i\right)\left(\sum w_i X_{2i}^2\right) - \left(\sum w_i X_{2i}\right)^2} \quad \text{…………………...... (6.2.3.7)}$$

$$\hat{\beta}_1^* = \bar{Y}^* - \hat{\beta}_2^* \bar{X}_2^*, \quad \text{where } \bar{Y}^* = \frac{\sum w_i Y_i}{\sum w_i} \text{ and } \bar{X}_2^* = \frac{\sum w_i X_{2i}}{\sum w_i} \quad \text{... (6.2.3.8)}$$
∑ wi ∑ wi
Thus, in GLS we minimize a weighted sum of residual squares with w i=1/σ 2i acting
as the weights, but in OLS we minimize an unweighted or (what amounts to the same
thing) equally weighted RSS. As (6.2.3.6) shows, in GLS the weight assigned to each
observation is inversely proportional to its σi , that is, observations coming from a
population with larger σi will get relatively smaller weight and those from a population
with smaller σi will get proportionately larger weight in minimizing the RSS (6.2.3.6).

Since (6.2.3.6) minimizes a weighted RSS, it is appropriately known as weighted least squares (WLS), and the estimators thus obtained and given in (6.2.3.7) and (6.2.3.8) are known as WLS estimators. But WLS is just a special case of the more general estimating technique, GLS. Note that if $w_i = w$, a constant for all i, $\hat{\beta}_2^*$ is identical with $\hat{\beta}_2$.
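When the $\sigma_i^2$ are known, the WLS estimators in (6.2.3.7)–(6.2.3.8) can be obtained by minimizing the weighted residual sum of squares with weights $w_i = 1/\sigma_i^2$. The sketch below shows the closed-form expressions and, for comparison, the statsmodels WLS routine; the arrays y, x, and sigma2 (the known error variances) are hypothetical inputs.

```python
# Sketch: weighted least squares with known error variances sigma2 (hypothetical data).
import numpy as np
import statsmodels.api as sm

def wls_by_formula(y, x, sigma2):
    """Closed-form GLS/WLS estimates from (6.2.3.7) and (6.2.3.8)."""
    w = 1.0 / sigma2
    sw, swx, swy = w.sum(), (w * x).sum(), (w * y).sum()
    swxy, swxx = (w * x * y).sum(), (w * x * x).sum()
    b2 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b1 = swy / sw - b2 * swx / sw          # weighted means of y and x
    return b1, b2

# The same estimates can be obtained via statsmodels, with weights 1/sigma_i^2:
# fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / sigma2).fit()
```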

6.2.4 CONSEQUENCES OF HETEROSCEDASTICITY


I. The least squares estimators become inefficient; that is, they no longer have the minimum-variance property, although they are still linear and unbiased. When heteroscedasticity is taken into account,
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum x_{2i}^2 \sigma_i^2}{\left(\sum x_{2i}^2\right)^2}$$
II. The estimates of the variances are biased, thus invalidating tests of significance.
III. The prediction of Y for a given value of X would be inefficient (since it is based on the $\hat{\beta}$’s, which have high variances).

6.2.5 DETECTION OF HETEROSCEDASTICITY
More often than not, in economic studies there is only one sample Y value corresponding
to a particular value of X. And there is no way one can know σ 2i from just one Y
observation. Therefore, in most cases involving econometric investigations,
heteroscedasticity may be identified based on the examination of
the OLS residuals u^ i since they are the ones we observe, and not the disturbances
ui . One hopes that they are good estimates of ui , a hope that may be fulfilled if the
sample size is fairly large.

Informal Methods
1) Nature of the Problem Very often the nature of the problem under consideration
suggests whether heteroscedasticity is likely to be encountered. For example,
 The residual variance around the regression of consumption on income
increased with income.
 As a matter of fact, in cross-sectional data involving heterogeneous units,
heteroscedasticity may be the rule rather than the exception.

2) Graphical Method If there is no a priori or empirical information about the


nature of heteroscedasticity, in practice one can do the regression analysis on the assumption that there is no heteroscedasticity and then examine the squared residuals $\hat{u}_i^2$ to see whether they exhibit any systematic pattern. Although the $\hat{u}_i^2$ are not the same thing as the $u_i^2$, they can be used as proxies, especially if the sample size is sufficiently large. An examination of the $\hat{u}_i^2$ may reveal patterns such as those shown in Figure 6.3.

[Fig 6.3 Hypothetical patterns of estimated squared residuals: panels (a)–(e) plot $\hat{u}_i^2$ against $\hat{Y}_i$.]

In Figure 6.3, the $\hat{u}_i^2$ are plotted against $\hat{Y}_i$, the estimated $Y_i$ from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residual. In Figure 6.3a it can be seen that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures 6.3b to 6.3e, however, exhibit definite patterns. For instance, Figure 6.3c suggests a linear relationship, whereas Figures 6.3d and 6.3e indicate a quadratic relationship between $\hat{u}_i^2$ and $\hat{Y}_i$. Using such knowledge, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity. Instead of plotting $\hat{u}_i^2$ against $\hat{Y}_i$, one may plot them against one of the explanatory variables, especially if plotting $\hat{u}_i^2$ against $\hat{Y}_i$ results in the pattern shown in Figure 6.3a. This is useful as a cross-check.
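The graphical method amounts to fitting the model by OLS, squaring the residuals, and plotting them against the fitted values (or against an individual regressor). A minimal sketch, assuming y and X are already arranged as arrays, is given below.

```python
# Sketch: plotting squared OLS residuals against fitted values to look for patterns.
import matplotlib.pyplot as plt
import statsmodels.api as sm

def plot_squared_residuals(y, X):
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    u2 = fit.resid ** 2                    # squared residuals as proxies for u_i^2
    plt.scatter(fit.fittedvalues, u2)
    plt.xlabel("fitted values of Y")
    plt.ylabel("squared residuals")
    plt.title("Checking for systematic patterns (cf. Figure 6.3)")
    plt.show()
```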

6.2.6 REMEDIAL MEASURES


As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency
properties of the OLS estimators, but they are no longer efficient, not even asymptotically
(i.e., large sample size). This lack of efficiency makes the usual hypothesis-testing
procedure of dubious value. Therefore, remedial measures may be called for. There are
two approaches to remediation: when σ 2i is known and when σ 2i is not known.

When σ 2i Is Known: The Method of Weighted Least Squares


If σ 2i is known, the most straightforward method of correcting heteroscedasticity is by
means of weighted least squares, for the estimators thus obtained are BLUE.

When σ 2i Is Not Known


Since the true $\sigma_i^2$ are rarely known, there is another way of obtaining consistent estimates of the variances of the OLS estimators even if there is heteroscedasticity. This is done by making some plausible assumptions about the pattern of heteroscedasticity. To illustrate this, let us revert to the two-variable regression model:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

We now consider several assumptions about the pattern of heteroscedasticity.


Assumption 1: The error variance is proportional to $X_i^2$:
$$E(u_i^2) = \sigma^2 X_i^2 \quad \text{……………………….….. (6.2.6.1)}$$

If it is believed that the variance of $u_i$ is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by $X_i$:

$$\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{u_i}{X_i} = \beta_1\left(\frac{1}{X_i}\right) + \beta_2 + v_i \quad \text{……. (6.2.6.2)}$$

where $v_i$ is the transformed disturbance term, equal to $u_i / X_i$. Now it is easy to verify that

$$E(v_i^2) = E\left(\frac{u_i}{X_i}\right)^2 = \frac{1}{X_i^2} E(u_i^2) = \sigma^2 \quad \text{using (6.2.6.1)}$$

Hence the variance of $v_i$ is now homoscedastic, and one may proceed to apply OLS to the transformed equation (6.2.6.2), regressing $Y_i / X_i$ on $1 / X_i$. Notice that in the transformed regression the intercept term $\beta_2$ is the slope coefficient in the original equation and the slope coefficient $\beta_1$ is the intercept term in the original model. Therefore, to get back to the original model we shall have to multiply the estimated (6.2.6.2) by $X_i$.
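Under Assumption 1 the remedy is simply to re-estimate the model in the transformed variables $Y_i/X_i$ and $1/X_i$, remembering that the roles of intercept and slope are interchanged. A minimal sketch with hypothetical arrays y and x (all $x_i > 0$) is shown below; the square-root case of Assumption 2 follows the same pattern with $\sqrt{X_i}$ in place of $X_i$ and no constant term.

```python
# Sketch: transformation for Assumption 1, E(u_i^2) = sigma^2 * X_i^2 (hypothetical data).
import statsmodels.api as sm

def fit_assumption1(y, x):
    y_star = y / x
    Z = sm.add_constant(1.0 / x)           # columns: [const, 1/x]
    fit = sm.OLS(y_star, Z).fit()
    beta2_hat = fit.params[0]              # intercept of transformed model = original slope
    beta1_hat = fit.params[1]              # coefficient on 1/x = original intercept
    return beta1_hat, beta2_hat
```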
Assumption 2: The error variance is proportional to $X_i$. The square root transformation:
$$E(u_i^2) = \sigma^2 X_i \quad \text{…………………………............… (6.2.6.3)}$$
If it is believed that the variance of $u_i$, instead of being proportional to the squared $X_i$, is proportional to $X_i$ itself, then the original model can be transformed as follows:

$$\frac{Y_i}{\sqrt{X_i}} = \frac{\beta_1}{\sqrt{X_i}} + \beta_2 \sqrt{X_i} + \frac{u_i}{\sqrt{X_i}} = \beta_1\left(\frac{1}{\sqrt{X_i}}\right) + \beta_2 \sqrt{X_i} + v_i \quad \text{…...... (6.2.6.4)}$$

where $v_i = u_i / \sqrt{X_i}$ and where $X_i > 0$.
Given Assumption 2, one can readily verify that $E(v_i^2) = \sigma^2$, a homoscedastic situation. Therefore, one may proceed to apply OLS to (6.2.6.4), regressing $Y_i / \sqrt{X_i}$ on $1 / \sqrt{X_i}$ and $\sqrt{X_i}$. Note an important feature of the transformed model: it has no intercept term. Therefore, one will have to use the regression-through-the-origin model to estimate $\beta_1$ and $\beta_2$. Having run (6.2.6.4), one can get back to the original model simply by multiplying (6.2.6.4) by $\sqrt{X_i}$.

Assumption 3: The error variance is proportional to the square of the mean value of Y:
$$E(u_i^2) = \sigma^2 [E(Y_i)]^2 \quad \text{………………..….……..… (6.2.6.5)}$$
Equation (6.2.6.5) postulates that the variance of $u_i$ is proportional to the square of the expected value of Y. Now $E(Y_i) = \beta_1 + \beta_2 X_i$.

Therefore, if we transform the original equation as follows,

$$\frac{Y_i}{E(Y_i)} = \frac{\beta_1}{E(Y_i)} + \beta_2 \frac{X_i}{E(Y_i)} + \frac{u_i}{E(Y_i)} = \beta_1\left(\frac{1}{E(Y_i)}\right) + \beta_2\left(\frac{X_i}{E(Y_i)}\right) + v_i \quad \text{…......... (6.2.6.6)}$$

where $v_i = u_i / E(Y_i)$, it can be seen that $E(v_i^2) = \sigma^2$; that is, the disturbances $v_i$ are homoscedastic. Hence, it is regression (6.2.6.6) that will satisfy the homoscedasticity assumption of the classical linear regression model.

The transformation (6.2.6.6) is, however, inoperational because $E(Y_i)$ depends on $\beta_1$ and $\beta_2$, which are unknown. Of course, we know $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, which is an estimator of $E(Y_i)$. Therefore, we may proceed in two steps: First, we run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain $\hat{Y}_i$. Then, using the estimated $\hat{Y}_i$, we transform our model as follows:

$$\frac{Y_i}{\hat{Y}_i} = \beta_1\left(\frac{1}{\hat{Y}_i}\right) + \beta_2\left(\frac{X_i}{\hat{Y}_i}\right) + v_i \quad \text{…………….……..… (6.2.6.7)}$$

where $v_i = u_i / \hat{Y}_i$. In Step 2, we run the regression (6.2.6.7). Although the $\hat{Y}_i$ are not exactly $E(Y_i)$, they are consistent estimators; that is, as the sample size increases indefinitely, they converge to the true $E(Y_i)$. Hence, the transformation (6.2.6.7) will perform satisfactorily in practice if the sample size is reasonably large.
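The two-step procedure under Assumption 3 can be written out directly: estimate the model by OLS, take the fitted values $\hat{Y}_i$ as stand-ins for $E(Y_i)$, and then re-estimate the transformed equation (6.2.6.7). The sketch below assumes hypothetical arrays y and x.

```python
# Sketch: two-step estimation under Assumption 3, E(u_i^2) proportional to [E(Y_i)]^2.
import numpy as np
import statsmodels.api as sm

def two_step_fit(y, x):
    # Step 1: ordinary OLS, ignoring heteroscedasticity, to obtain fitted values.
    step1 = sm.OLS(y, sm.add_constant(x)).fit()
    y_hat = step1.fittedvalues

    # Step 2: regress Y_i / Y_hat_i on 1 / Y_hat_i and X_i / Y_hat_i; no extra constant,
    # since beta_1 and beta_2 are the coefficients on these two transformed regressors.
    Z = np.column_stack([1.0 / y_hat, x / y_hat])
    step2 = sm.OLS(y / y_hat, Z).fit()
    beta1_hat, beta2_hat = step2.params
    return beta1_hat, beta2_hat
```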

Assumption 4: A log transformation such as

$$\ln Y_i = \beta_1 + \beta_2 \ln X_i + u_i \quad \text{………………………... (6.2.6.8)}$$

very often reduces heteroscedasticity when compared with the regression run on the levels of the variables, $Y_i = \beta_1 + \beta_2 X_i + u_i$, because the log transformation compresses the scales in which the variables are measured.

6.3. AUTOCORRELATION
Another assumption of the regression model was the non-existence of serial correlation
(autocorrelation) between the disturbance terms, Ui.

Serial correlation implies that the error term from one time period depends in some
systematic way on error terms from other time periods. Autocorrelation is more a
problem of time series data than cross-sectional data. If by chance, such a correlation is
observed in cross-sectional units, it is called spatial autocorrelation. So, it is important to understand serial correlation and its consequences for the OLS estimators.
Nature of Autocorrelation
The classical model assumes that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation:

$$E(u_i u_j) = 0, \quad i \neq j$$

But if there is any interdependence between the disturbance terms, then we have autocorrelation:

$$E(u_i u_j) \neq 0, \quad i \neq j$$
Causes of Autocorrelation
Serial correlation may occur because of a number of reasons.
 Inertia (built-in momentum) – a salient feature of most economic time series (such as GDP, GNP, price indices, production, employment, etc.) is inertia or sluggishness. Such variables exhibit (business) cycles.
 Specification bias – exclusion of important variables or incorrect functional forms
 Lags – in a time series regression, value of a variable for a certain period depends
on the variable’s previous period value.
 Manipulation of data – if the raw data is manipulated (extrapolated or
interpolated), autocorrelation might result.
Autocorrelation can be negative as well as positive. The most common kind of serial correlation is first-order serial correlation. This is the case in which this period’s error term is a function of the previous period’s error term:

$$U_t = \rho U_{t-1} + \varepsilon_t, \quad -1 < \rho < 1$$

This is also called the first-order autoregressive model. The disturbance term $\varepsilon_t$ satisfies all the basic assumptions of the classical linear model:

$$E(\varepsilon_t) = 0, \quad \operatorname{var}(\varepsilon_t) = \sigma_\varepsilon^2, \quad \operatorname{cov}(\varepsilon_t, \varepsilon_{t+s}) = 0 \text{ for } s \neq 0$$

Consequences of serial correlation

When the disturbance term exhibits serial correlation, the values as well as the standard
errors of the parameters are affected.
1) The estimates of the parameters remain unbiased even in the presence of
autocorrelation but the X’s and the u’s must be uncorrelated.
2) Serial correlation increases the variance of the OLS estimators. The minimum
variance property of the OLS parameter estimates is violated. That means the
OLS are no longer efficient.

[Figure 1: The distribution of $\hat{\beta}$ with and without serial correlation.]

3) Due to serial correlation the variance of the disturbance term, U i may be


underestimated. This problem is particularly pronounced when there is positive
autocorrelation.
4) If the Uis are autocorrelated, then prediction based on the ordinary least squares
estimates will be inefficient. This is because of larger variance of the parameters.
Since the variances of the OLS estimators are not minimal as compared with other
estimators, the standard error of the forecast from the OLS will not have the least
value.

Detecting Autocorrelation
Some rough idea about the existence of autocorrelation may be gained by plotting the
residuals either against their own lagged values or against time.

[Figure 2: Graphical detection of autocorrelation.]

There are more accurate tests for the incidence of autocorrelation. The most common test
of autocorrelation is the Durbin-Watson Test.
The Durbin-Watson d Test
The test for serial correlation that is most widely used is the Durbin-Watson d test. This test is appropriate only for the first-order autoregressive scheme. The test may be outlined as a test of the null hypothesis

$$H_0: \rho = 0 \text{ (no first-order autocorrelation)} \quad \text{against} \quad H_1: \rho \neq 0$$

This test is, however, applicable where the underlying assumptions are met:
 The regression model includes an intercept term
 The serial correlation is first order in nature
 The regression does not include the lagged dependent variable as an explanatory
variable
 There are no missing observations in the data

The equation for the Durbin-Watson d statistic is

$$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$$

which is simply the ratio of the sum of squared differences in successive residuals to the RSS. Note that the numerator has one fewer observation than the denominator, because an observation is used up in computing each successive difference. A great advantage of the d-statistic is that it is based on the estimated residuals. Thus it is often reported together with R², t, etc.
The d-statistic equals zero if there is extreme positive serial correlation, two if there is no serial correlation, and four if there is extreme negative correlation.
1. Extreme positive serial correlation: d ≈ 0. If $\rho \approx +1$, successive residuals are nearly equal, $\hat{u}_t \approx \hat{u}_{t-1}$, so the differences $(\hat{u}_t - \hat{u}_{t-1})$ are close to zero and the numerator of d is close to zero.

2. Extreme negative serial correlation: d ≈ 4. If $\rho \approx -1$, successive residuals are nearly equal in magnitude but opposite in sign, $\hat{u}_t \approx -\hat{u}_{t-1}$, so $(\hat{u}_t - \hat{u}_{t-1})^2 \approx (2\hat{u}_{t-1})^2 = 4\hat{u}_{t-1}^2$ and d is close to 4.

3. No serial correlation: d ≈ 2. Expanding the numerator,
$$\sum (\hat{u}_t - \hat{u}_{t-1})^2 = \sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2\sum \hat{u}_t \hat{u}_{t-1}$$
Since the residuals are uncorrelated, the cross-product term is approximately zero, and since $\sum \hat{u}_t^2$ and $\sum \hat{u}_{t-1}^2$ differ in only one observation, they are approximately equal; hence d ≈ 2.
The exact sampling or probability distribution of the d-statistic is not known and, therefore, unlike the t, χ², or F tests, there are no unique critical values that will lead to the acceptance or rejection of the null hypothesis.

But Durbin and Watson have successfully derived the upper and lower bound so that if
the computed value d lies outside these critical values, a decision can be made regarding
the presence of a positive or negative serial autocorrelation.
Thus, the d statistic can be written approximately as

$$d \approx 2(1 - \hat{\rho}), \quad \text{where } \hat{\rho} = \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}$$

But since $-1 \leq \hat{\rho} \leq 1$, this identity implies that $0 \leq d \leq 4$. Therefore, the bounds of d must lie within these limits.

Thus, if $\hat{\rho} = 0$, then d ≈ 2: no serial autocorrelation.
If $\hat{\rho} = +1$, then d ≈ 0: evidence of positive autocorrelation.
If $\hat{\rho} = -1$, then d ≈ 4: evidence of negative autocorrelation.
Decision Rules for the Durbin-Watson d Test

Null hypothesis                               Decision         If
No positive autocorrelation                   Reject           0 < d < dL
No positive autocorrelation                   No decision      dL ≤ d ≤ dU
No negative autocorrelation                   Reject           4 − dL < d < 4
No negative autocorrelation                   No decision      4 − dU ≤ d ≤ 4 − dL
No autocorrelation, positive or negative      Do not reject    dU < d < 4 − dU

Note: Other tests for autocorrelation include the Runs test and the Breusch-Godfrey (BG)
test. There are so many tests of autocorrelation since there is no particular test that has
been judged to be unequivocally best or more powerful in the statistical sense.
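The d statistic itself is easy to compute from the OLS residuals, either directly from its definition or with the statsmodels helper. The sketch below assumes hypothetical time series arrays y and X; the tabulated bounds dL and dU still have to be looked up for the given sample size and number of regressors.

```python
# Sketch: computing the Durbin-Watson d statistic from OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

def dw_statistic(y, X):
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    u = fit.resid
    d = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)   # definition of d
    # durbin_watson(u) gives the same value; values near 2 suggest no
    # first-order autocorrelation, near 0 positive, near 4 negative.
    assert np.isclose(d, durbin_watson(u))
    return d
```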

CHAPTER 7: INTRODUCTION TO TIME SERIES ANALYSIS

7.1 Definition of Time Series

A time series is a sequential set of data points, measured typically over successive times.
A time series containing records of a single variable is termed as univariate. But if
records of more than one variable are considered, it is termed as multivariate. A time
series can be continuous or discrete. In a continuous time series observations are
measured at every instance of time, whereas a discrete time series contains observations
measured at discrete points of time. For example temperature readings, flow of a river,
concentration of a chemical process etc. can be recorded as a continuous time series. On
the other hand population of a particular city, production of a company, exchange rates
between two different currencies may represent discrete time series. Usually in a discrete
time series the consecutive observations are recorded at equally spaced time intervals
such as hourly, daily, weekly, monthly or yearly time separations.

7.2 Components of a Time Series


The factors that are responsible for bringing about changes in a time series are also called the components of the time series. A time series in general is supposed to be affected by four main components, which can be separated from the observed data. These components are: Trend, Cyclical, Seasonal, and Irregular components.

The general tendency of a time series to increase, decrease or stagnate over a long period
of time is termed as Secular Trend or simply Trend. Thus, it can be said that trend is a
long term movement in a time series. For example, series relating to population growth,
number of houses in a city etc. show upward trend, whereas downward trend can be
observed in series relating to mortality rates, epidemics, etc.

Seasonal variations in a time series are fluctuations that occur within a year, season by season. The important factors causing seasonal variations are: climate and weather conditions, customs, traditional habits, etc. For example, sales of ice cream increase in summer and sales of woolen clothes increase in winter. Seasonal variation is an important factor for businessmen, shopkeepers, and producers for making proper future plans.

The cyclical variation in a time series describes the medium-term changes in the series,
caused by circumstances which repeat in cycles. The duration of a cycle extends over a longer period of time, usually two or more years. Most economic and financial time series show some kind of cyclical variation. For example, a business cycle consists of four phases, viz. i) Prosperity, ii) Decline, iii) Depression, and iv) Recovery.

Irregular or random variations in a time series are caused by unpredictable influences,


which are not regular and also do not repeat in a particular pattern. These variations are
caused by incidences such as war, strike, earthquake, flood, revolution, etc. There is no
defined statistical technique for measuring random fluctuations in a time series.
Considering the effects of these four components, two different types of models are
generally used for a time series viz. Multiplicative and Additive models.

7.3 Models for Estimating Time Series Data


Additive models: the models that we have considered in earlier sections have been additive models, in which there is an implicit assumption that the different components affect the time series additively.

Y (t) = T (t) + S (t) + C (t) + I (t).


Here Y (t) is the observation and T(t) , S(t) ,C(t) and I (t) are respectively the trend,
seasonal, cyclical and irregular variation at time t.

For monthly data, an additive model assumes that the difference between the January and
July values is approximately the same each year. In other words, the amplitude of the
seasonal effect is the same each year. The model similarly assumes that the residuals are
roughly the same size throughout the series -- they are a random component that adds on
to the other components in the same way at all parts of the series.

Multiplicative model is based on the assumption that the four components of a time
series are not necessarily independent and they can affect one another; whereas in the
additive model it is assumed that the four components are independent of each other.

In many time series involving quantities (e.g., money, wheat production), the absolute differences in the values are of less interest and importance than the percentage changes.
For example, in seasonal data, it might be more useful to model that the July value is the
same proportion higher than the January value in each year, rather than assuming that
their difference is constant. Assuming that the seasonal and other effects act
proportionally on the series is equivalent to a multiplicative model.

Y(t) = T(t)×S(t)×C(t)×I (t).


Here Y(t) is the observation and T(t) , S(t) ,C(t) and I (t) are respectively the trend,
seasonal, cyclical and irregular variation at time t.

Fortunately, multiplicative models are just as easy to fit to data as additive models. The trick to fitting a multiplicative model is to take logarithms of both sides of the model; after taking logarithms (either natural logarithms or logarithms to base 10), the four components of the time series again act additively.

Time series analysis is an analytical technique that is broadly applicable to a wide variety
of domains. In domains in which data are collected at specific (usually equally spaced)
intervals a time series analysis can reveal useful patterns and trends related to time.
Industrial as well as governmental agencies rely on time series analysis for both historical understanding and for forecasting and predictive modeling. The basics of the statistical approach to the analysis of time series are presented below.

A time series is mathematically defined as a set of observed values taken at specified
times. The set of values is typically denoted "Y", and the set of times as "t1, t2, t3, ...etc".
In other words, Y is a function of "t", and the goal of a time series analysis is to find a
function that describes the movement of data. It should be noted that a time series is often
graphed, and trends and patterns are visually apparent. The statistical approach essentially
describes visually apparent trends formally and quantitatively.

A time series analysis is often referred to as "time series decomposition". As the phrase
suggests, this means that the time series is decomposed into its component parts. There
are typically four main components and together these components sufficiently describe
the variations of data over time. These four components are (1) the long-term trend, "T";
(2) seasonal variations, "S"; (3) cyclical patterns, "C"; and (4) irregularities or noise, "I".
As an equation, the time series is generally described either as a multiplicative
relationship where,

Y = T x C x S x I, or alternatively as an additive model where


Y = T + C + S + I.
Should I use an additive model or a multiplicative model?
Choose the multiplicative model when the magnitude of the seasonal pattern in the data
depends on the magnitude of the data. The magnitude of the seasonal pattern increases as
the data values increase, and decreases as the data values decrease. Choose the additive
model when the magnitude of the seasonal pattern in the data does not depend on the
magnitude of the data. In other words, the magnitude of the seasonal pattern does not
change as the series goes up or down.

If the pattern in the data is not very obvious, and you have trouble choosing between the
additive and multiplicative procedures, you can try both and choose the one with smaller
accuracy measures.
 The additive model is useful when the seasonal variation is relatively constant
over time.
 The multiplicative model is useful when the seasonal variation increases over
time.
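In practice the additive and multiplicative decompositions can be fitted with standard tools and compared. The sketch below uses the seasonal_decompose routine from statsmodels on a hypothetical monthly series stored in a pandas Series y with a DatetimeIndex; as noted above, the multiplicative fit is equivalent to an additive fit on the logarithm of the series.

```python
# Sketch: additive vs. multiplicative decomposition of a monthly series (hypothetical data y).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose_both(y, period=12):
    additive = seasonal_decompose(y, model="additive", period=period)
    multiplicative = seasonal_decompose(y, model="multiplicative", period=period)
    # Equivalent route for the multiplicative model: decompose log(y) additively.
    log_additive = seasonal_decompose(np.log(y), model="additive", period=period)
    return additive, multiplicative, log_additive

# Each result exposes .trend, .seasonal and .resid components; comparing the size of
# the residuals (e.g. their mean absolute value) is one rough way to choose a model.
```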
