
Chapter One: General

Introduction
By Tefera M.
Advanced Econometrics, Oromia
State University
Introduction
• What is econometrics
– Simply stated, econometrics means economic
measurement.
– Econometrics may be defined as the social science in
which the tools of economic theory, mathematics, and
statistical inference are applied to the analysis of
economic phenomena.
– Econometrics, the result of a certain outlook on the
role of economics, consists of the application of
mathematical statistics to economic data to lend
empirical support to the models constructed by
mathematical economics and to obtain numerical
results.
Cont.
• Econometrics has to be distinguished from
mathematical economics and from statistical
economics.
• It is, however, closely related to both, and utilizes
results achieved in these fields.
• Econometrics is based upon the development of
statistical methods for estimating economic
relationships, testing economic theories, and
evaluating and implementing government and
business policy
cont.
• Econometrics is the application of mathematics, statistical
methods, and, more recently, computer science, to economic
data and is described as the branch of economics that aims to
give empirical content to economic relations.

• Econometrics is the branch of economics concerned with the
empirical estimation of economic relationships. Models,
together with data, represent the basic ingredients of any
econometric study.
• Econometrics is the field of economics that concerns the
application of mathematical statistics and the tools of
statistical inference to the empirical measurement of
relationships postulated by economic theory.
The Subject, Goals, Objectives and
tasks of econometrics
Scope of econometrics
• The following are the major areas within the scope of
econometrics
– Developing statistical methods for the estimation
of economic relationships,
– Testing economic theories and hypothesis,
– Evaluating and applying economic policies,
– Forecasting,
– Collecting and analyzing non experimental or
observational data.
Components of Econometrics
• Components of Econometrics are the input and output of
econometrics.
• Econometric inputs:
– Economic Theory
– Mathematics
– Statistical Theory
– Data and Interpretation
• Econometric outputs:
– Measurement
– Inference
– Hypothesis testing
– Forecasting, Prediction and Evaluation
Types of Econometrics
• Econometrics may be divided into two broad categories:
theoretical econometrics and applied econometrics.
– Theoretical econometrics is concerned with the
development of appropriate methods for measuring
economic relationships specified by econometric models.
In this aspect, econometrics leans heavily on mathematical
statistics. For example, one of the methods used
extensively in this book is least squares.
– In applied econometrics we use the tools of theoretical
econometrics to study some special field(s) of economics
and business, such as the production function, investment
function, demand and supply functions, portfolio theory,
etc.
Steps involved in formulating an
econometric model
• Although there are of course many different ways to go
about the process of model building, a logical and valid
approach would be to follow the steps described below.
• The steps involved in the model construction process are now
listed and described.
– Step 1: general statement of the problem This will usually involve the
formulation of a theoretical model, or intuition from financial theory
that two or more variables should be related to one another in a
certain way. The model is unlikely to be able to completely capture
every relevant real-world phenomenon, but it should present a
sufficiently good approximation that it is useful for the purpose at
hand
• Step 2: collection of data relevant to the model The data
required may be available electronically through a financial
information provider. Alternatively, the required data may be
available only via a survey after distributing a set of
questionnaires i.e. primary data.
• Step 3: choice of estimation method relevant to the model
proposed in step 1. For example, is a single equation or
multiple equation technique to be used?
• Step 4: statistical evaluation of the model What assumptions
were required to estimate the parameters of the model
optimally? Were these assumptions satisfied by the data or
the model? Also, does the model adequately describe the
data?
Cont.
• Step 5: evaluation of the model from a theoretical perspective
Are the parameter estimates of the sizes and signs that the
theory or intuition from step 1 suggested?
• Step 6: use of model When a researcher is finally satisfied
with the model, it can then be used for testing the theory specified
in step 1, or for formulating forecasts or suggested courses of
action.
• This suggested course of action might be for an individual
(e.g. ‘if inflation and GDP rise, buy stocks in sector X’), or as an
input to government policy (e.g. ‘when equity markets fall,
program trading causes excessive volatility and so should be
banned’).
Steps involved in the formulation of
econometric models

Economic or Financial Theory (Previous Studies)
→ Formulation of an Estimable Theoretical Model
→ Collection of Data
→ Model Estimation
→ Is the Model Statistically Adequate?
   – No: Reformulate the Model (and re-estimate)
   – Yes: Interpret the Model → Use for Analysis
Data types
• For empirical purposes, we need quantitative information on
the two or more variables. There are different types of data
• There are three types of data that are generally available for
empirical analysis:
– Time series
– Cross-sectional
– Pooled (a combination of time series and cross-sectional)
• Times series data are collected over a period of time, such as
the data on GDP, employment, unemployment, money
supply, or government deficits. Such data may be collected at
regular intervals—daily (e.g., stock prices), weekly (e.g.,
money supply), monthly (e.g., the unemployment rate),
quarterly (e.g., GDP), or annually (e.g., government budget).
Cont.
• These data may be quantitative in nature (e.g., prices, income, money
supply) or qualitative (e.g., male or female, employed or
unemployed ,married or unmarried, white or black).
• Cross-sectional data are data on one or more variables collected at one
point in time, such as the census of population conducted by the Census
Bureau every 10 years; the surveys of consumer expenditures conducted
by the CSA of Ethiopia; and opinion polls such as those conducted by
different organizations.
• In pooled data we have elements of both time series and cross sectional
data. For example, if we collect data on the unemployment rate for 10
countries for a period of 20 years, the data will constitute an example of
pooled data—data on the unemployment rate for each country for the
20-year period will form time series data, whereas data on the
unemployment rate for the 10 countries for any single year will be cross-
sectional data.
Cont.
• There is a special type of pooled data, panel data, also called
longitudinal or micro panel data, in which the same cross-
sectional unit, say, a family or firm, is surveyed over time.
• At each periodic survey the same household (or the people
living at the same address) is interviewed to find out if there
has been any change in the housing and financial conditions
of that household since the last survey.
• The panel data that result from repeatedly interviewing the
same household at periodic intervals provide very useful
information on the dynamics of household behavior
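To make the panel structure concrete, below is a minimal sketch (assuming Python with pandas; the countries, years and unemployment rates are made up) of how pooled/panel data can be indexed so that both the time-series and cross-sectional views described above can be extracted:

# Hypothetical pooled/panel data indexed by (country, year).
import pandas as pd

panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "unemployment": [5.2, 6.8, 6.1, 7.1, 7.4, 7.0],   # made-up rates
}).set_index(["country", "year"])

print(panel.xs("A", level="country"))   # time series for one country
print(panel.xs(2020, level="year"))     # cross-section for one year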
Continuous and Discrete Data

• Continuous data can take on any value and are not confined to take
specific numbers.
• Their values are limited only by precision.
– For example, the rental yield on a property could be 6.2%,
6.24%, or 6.238%.
• On the other hand, discrete data can only take on certain values,
which are usually integers
– For instance, the number of people in a particular underground
carriage or the number of shares traded during a day.
• They do not necessarily have to be integers (whole numbers)
though, and are often defined to be count numbers.
– For example, until recently when they became ‘decimalised’,
many financial asset prices were quoted to the nearest 1/16 or
1/32 of a dollar.
Review of elementary statistics
• Statistics is the science of planning studies and experiments,
obtaining data, and then organizing, summarizing, presenting,
analyzing, interpreting, and drawing conclusions based on the data
• In other words, statistics has two meanings: actual numbers and methods
of analysis.
– Actual numbers: numerical measurements determined by a set
of data
– Methods of analysis: a collection of methods for planning
experiments, obtaining data, and then analyzing, interpreting,
and drawing conclusions based on the data
Statistics, Parameter and Statistic
• Statistics is a way to get information from data
– Data: Facts, especially numerical facts, collected together
for reference or information.
– Information: Knowledge communicated concerning some
particular fact.
Parameter
• a numerical measurement describing some characteristic of a
population.
Statistic
• a numerical measurement describing some characteristic of a
sample.
Statistical Description of Data
• Statistics describes a numeric set of data by its
– Center
– Variability
– Shape
– Relation or correlation
• Statistics describes a categorical set of data
by frequency, percentage or proportion of
each category
Descriptive and Inferential Statistics
• Descriptive statistics are methods for organizing and
summarizing data. • For example, tables or graphs are used to
organize data, and descriptive values such as the average score
are used to summarize data.
• A descriptive value for a population is called a parameter and a
descriptive value for a sample is called a statistic.
• Inferential statistics are methods for using sample data to make
general conclusions (inferences) about populations.
• Because a sample is typically only a part of the whole
population, sample data provide only limited information about
the population. As a result, sample statistics are generally
imperfect representatives of the corresponding population
parameters.
Levels of Data Measurement

• Data can be classified according to levels of measurement.


The level of measurement of the data often dictates the
calculations that can be done to summarize and present the
data.
• There are actually four levels of measurement:
– Nominal,
– Ordinal,
– Interval, and
– Ratio.
• The lowest, or the most primitive, measurement is the nominal
level. The highest, or the level that gives us the most
information about the observation, is the ratio level of
measurement.
Cont.
• Nominal basically refers to categorically discreet data such as
name of your school, type of car you drive or name of a book.
This one is easy to remember because nominal sounds like
name (they have the same Latin root).
• ordinal variable, is one where the order matters but not the
difference between values. For example, you might ask
patients to express the amount of pain they are feeling on a
scale of 1 to 10. A score of 7 means more pain than a score of
5, and that is more than a score of 3. But the difference
between the 7 and the 5 may not be the same as that between 5
and 3. The values simply express an order.
Cont.
• Interval data is like ordinal except we can say the intervals
between each value are equally split. The most common
example is temperature in degrees Fahrenheit. The difference
between 29 and 30 degrees is the same magnitude as the
difference between 78 and 79 (although I know I prefer the
latter).
• Ratio data is interval data with a natural zero point. For
example, time is ratio since 0 time is meaningful. Degrees
Kelvin has a 0 point (absolute 0) and the steps in both these
scales have the same degree of magnitude.
Table 1: Level of Measurement

• Nominal scale
– Characteristics: measures in terms of names or designations of discrete units or categories
– Statistical implications: enables one to determine the mode and percentage values

• Ordinal scale
– Characteristics: measures in terms of such values as “more” or “less,” “larger” or “smaller,” but without specifying the size of the intervals
– Statistical implications: enables one also to determine the median, percentile rank, and rank correlation

• Interval scale
– Characteristics: measures in terms of equal intervals or degrees of difference, but whose zero point, or point of beginning, is arbitrarily established
– Statistical implications: enables one also to determine the mean, standard deviation, and product-moment correlation; allows one to conduct most inferential statistical analyses

• Ratio scale
– Characteristics: measures in terms of equal intervals and an absolute zero point of origin
– Statistical implications: enables one also to determine the geometric mean and the percentage variation; allows one to conduct virtually any inferential statistical analysis
Class Exercise on Level of Measurement

For each variable/data item below, identify its level of measurement:
1. Protestant
2. Number of cases
3. Very happy
4. Catholic
5. From 200-800
6. Income in dollars
Measure of variables
• The important statistical measures that are used to summaries
the survey/research data are:
– measures of central tendency or statistical averages;
– measures of dispersion;
– measures of asymmetry (skewness);
– measures of relationship; and
– other measures.
Measures of central tendency or
statistical averages
Mean
• The first measure that we examine is the mean.
• The mean is the arithmetic average for a set of data.
• Recall that to find the arithmetic average for a set of values we
add up all the values and then divide that total by how many
values there are. The sample mean is given by
x̄ = Σx / n
• A manager of a local restaurant is interested in the number of
people who eat there on Fridays. Here are the totals for nine
randomly selected Fridays:
712 626 600 596 655 682 642 532 526

• Find the mean for this set of data.
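As a quick check of the arithmetic, here is a minimal Python sketch using the nine Friday counts listed above:

# Sample mean of the nine randomly selected Fridays.
counts = [712, 626, 600, 596, 655, 682, 642, 532, 526]
x_bar = sum(counts) / len(counts)   # x-bar = (sum of x) / n
print(x_bar)                        # 619.0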


Median
• The median of a set of data is a value that divides the set of
data into two equal groups, after the values have been put in
order from lowest to highest.
• Where do we find the median of a road? In the center of the
road. Two scenarios could occur when finding a median.
• If there are an odd number of values in the set, there will be
one value in the center of the data, and this value is the
median.
• However, if there are an even number of values in the set of
data, there is not a single value in the center. In this case, we
take the mean of the two values in the center.
Mode

• The mode is the value in a data set that occurs the most often. If no
such value exists, we say that the data set has no mode.
• If two such values exist, we say the data set is bimodal. If
three such values exist, we say the data set is tri-modal.
• Find the mode for the following data.

25 46 34 45 37 36 40 30 29 37 44 56 50 47 23
40 30 27 38 47 58 22 29 56 40 46 38 19 49 50
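A short Python sketch (using the 30 values listed above) that tallies frequencies and reports the most common value:

# Mode of the data set above, found by counting frequencies.
from collections import Counter

data = [25, 46, 34, 45, 37, 36, 40, 30, 29, 37, 44, 56, 50, 47, 23,
        40, 30, 27, 38, 47, 58, 22, 29, 56, 40, 46, 38, 19, 49, 50]
freq = Counter(data)
highest = max(freq.values())
modes = [value for value, count in freq.items() if count == highest]
print(modes, highest)   # 40 occurs three times, so the mode is 40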
Measures of dispersion
• In addition to measures of central tendency, it is desirable to
have numerical values to describe the spread or dispersion of a
data set.
• Measures that describe the spread of a data set are called
measures of dispersion.
• Range, variance, and standard deviation are the major methods
of measuring dispersion.
• The range for a data set is equal to the maximum value in the
data set minus the minimum value in the data set.
• It is clear that the range is reflective of the spread in the data
set since the difference between the largest and the smallest
value is directly related to the spread in the data.
Cont.
• The variance and the standard deviation of a data set measures
the spread of the data about the mean of the data set. The
variance of a sample of size n is represented by s2 and is given
by
s² = Σ(x − x̄)² / (n − 1)
• Compute the variance for the following data.
s.n x
1 5
2 10
3 15
4 7
5 3
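A minimal Python sketch that applies the sample-variance formula above to the five observations in the exercise:

# Sample variance of x = 5, 10, 15, 7, 3.
x = [5, 10, 15, 7, 3]
n = len(x)
x_bar = sum(x) / n                                     # sample mean = 8.0
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)      # divide by n - 1
print(s2)                                              # 22.0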
Cont.
• The square root of the variance is called the standard deviation
and the standard deviation is measured in the same units as the
variable.

s s 2
Measures of asymmetry (skewness)
• SKEWNESS describes the degree to which the data deviates
from symmetry.
• When the distribution of the data is not symmetrical, it is said
to be asymmetrical or skewed
• There are three types of skewness (normal/symmetrical, negative and
positive skewness)
– Symmetrical/Normal Distribution
• Bell shaped distribution
• The mean, median and mode are all located at one
point.
Cont.
– Positively Skewed Distribution
• Observations are mostly concentrated towards the
smaller values and there are some extremely high
values.
• Also called skewed to the right distribution
Cont.
• Negatively Skewed Distribution
– Observations are mostly concentrated towards the larger
values and there are some extremely low values.
– Also called skewed to the left distribution.
Skewness and Central Tendency
• We have already discussed how each measure is affected by
outliers or skewed distribution.
• Let’s consider this information further. In a negatively skewed
distribution the outliers will be pulling the mean down the scale a
great deal.
• The median might be slightly lower due to the outlier, but the mode
will be unaffected. Thus, with a negatively skewed distribution the
mean is numerically lower than the median or mode.
• The opposite is true for positively skewed distributions. Here the
outliers are on the high end of the scale and will pull the mean in
that direction a great deal. The median might be slightly affected as well, but not the mode. Thus, with a
positively skewed distribution the mean is numerically higher than
the median or the mode.
Figure 1 : Skewness and Central Tendency
Measures of relationship
• The commonly known measures of relationship are:
– Covariance
– Correlation
• Covariance is a measure of the degree to which two variable
are linearly related. The covariance can be either positive or
negative implying a direct or inverse relationship respectively.
• Correlation also shows how two random variables are related
to each other. More specifically, it shows the strength of linear
relationship. If the correlation is +1, two random variables have
perfect positive linear relationship. If it is –1, the two random
variables have perfect negative linear relationship. As it
becomes close to zero, the linearity in the relationship weakens.
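A small numerical sketch (NumPy; the x and y values are made up) showing how covariance and correlation summarize the direction and strength of a linear relationship:

# Covariance and correlation between two hypothetical variables.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly y = 2x, a strong positive relation

cov_xy = np.cov(x, y, ddof=1)[0, 1]       # sample covariance (positive here)
corr_xy = np.corrcoef(x, y)[0, 1]         # correlation, bounded between -1 and +1
print(cov_xy, corr_xy)                    # correlation is close to +1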
Chapter Two: Introduction to the
Simple OLS Regression
By Teshome A.(PhD)
Unity University
What is a regression model?
• Regression analysis is almost certainly the most important
tool at the econometrician’s disposal. But what is regression
analysis? In very general terms, regression is concerned
with describing and evaluating the relationship between a
given variable and one or more other variables.
• More specifically, regression is an attempt to explain
movements in a variable by reference to movements in one
or more other variables. To make this more concrete, denote
the variable whose movements the regression seeks to
explain by y and the variables which are used to explain
those variations by x1, x2, . . . , xk .
• Hence, in this relatively simple setup, it would be said that
variations in k variables (the xs) cause changes in some
other variable, y.
Cont.
• In other words, regression is a relation between
variables where changes in some variables may
“explain” or possibly “cause” changes in other
variables.
• Explanatory variables are termed the independent
variables and the variables to be explained are termed
the dependent variables.
• Regression model estimates the nature of the
relationship between the independent and dependent
variables.
– Change in dependent variables that results from changes in
independent variables, ie. size of the relationship.
– Strength of the relationship.
– Statistical significance of the relationship.
Cont.
• Any straight line can be represented by an equation of the
form Y = bX + a, where a and b are constants.
• The value of b is called the slope constant and determines the
direction and degree to which the line is tilted.
• The value of a is called the Y-intercept and determines the
point where the line crosses the Y-axis.
• The ability of the regression equation to accurately predict the
Y values is measured by first computing the proportion of the
Y-score variability that is predicted by the regression equation
and the proportion that is not predicted.

Simple Linear Regression

[Figure: scatter of the dependent variable (y) against the independent variable (x), with the fitted line y′ = b0 + b1X ± є, where b1 = slope = ∆y/∆x and b0 is the y-intercept.]

The output of a regression is a function that predicts the dependent
variable based upon values of the independent variables. Simple
regression fits a straight line to the data.
Linear Regression Model

• 1. The relationship between the variables is a linear function:

Yi = β0 + β1Xi + εi

where Yi is the dependent (response) variable, Xi is the independent
(explanatory) variable, β0 is the population intercept, β1 is the
population slope, and εi is the random error.
The Differences between β and b

About β0 and β1:
• They are parameters
• They do not have means, variances or p.d.f.’s
• Their values are unknown
• There is no “formula” for β0 and β1
• They are estimated using b0 and b1 and a sample of data on X and Y

About b0 and b1:
• They are estimators
• We use the values of b0 and b1 to draw inferences about β0 and β1
• These estimators are formulas that explain how to combine the data
points in a sample of data to get the best fitting line
• They are functions of the data. Because the data constitute a
random sample, b0 and b1 are random variables (will vary from
sample to sample)
Interpretation of Coefficients

• 1. Slope (β̂1)
– Estimated Y changes by β̂1 for each 1 unit increase in X
• If β̂1 = 2, then Y is expected to increase by 2 for each 1
unit increase in X
• 2. Y-Intercept (β̂0)
– Average value of Y when X = 0
• If β̂0 = 4, then average Y is expected to be 4
when X is 0
Examples
• Dependent variable is retail price of gasoline in Regina –
independent variable is the price of crude oil.
• Dependent variable is employment income – independent
variables might be hours of work, education, occupation, sex,
age, region, years of experience, unionization status, etc.
• Price of a product and quantity produced or sold:
– Quantity sold affected by price. Dependent variable is
quantity of product sold – independent variable is price.
– Price affected by quantity offered for sale. Dependent
variable is price – independent variable is quantity sold.
Uses of regression
• The following are the major uses of regression.
– Amount of change in a dependent variable that results from
changes in the independent variable(s) – can be used to
estimate elasticities, returns on investment in human
capital, etc.
– Attempt to determine causes of phenomena.
– Prediction and forecasting of sales, economic growth, etc.
– Support or negate theoretical model.
– Modify and improve theoretical models and explanations
of phenomena.
Correlation and regression
• All readers will be aware of the notion and definition of correlation.
The correlation between two variables measures the degree of
linear association between them.
• If it is stated that y and x are correlated, it means that y and x are
being treated in a completely symmetrical way. Thus, it is not
implied that changes in x cause changes in y, or indeed that changes
in y cause changes in x. Rather, it is simply stated that there is
evidence for a linear relationship between the two variables, and
that movements in the two are on average related to an extent
given by the correlation coefficient.
• In regression, the dependent variable (y) and the independent
variable(s) (xs) are treated very differently. The y variable is
assumed to be random or ‘stochastic’ in some way, i.e. to have a
probability distribution. The x variables are, however, assumed to
have fixed (‘non-stochastic’) values in repeated samples.
• Regression as a tool is more flexible and more powerful than
correlation.
Regression Modeling Steps
• The following are the major steps in modeling a
regression
– Hypothesize Deterministic Component
• Estimate Unknown Parameters
– Specify Probability Distribution of Random
Error Term
• Estimate Standard Deviation of Error
– Evaluate the fitted Model
– Use Model for Prediction & Estimation
Derivation of the Ordinary Least Squares
Estimator
• As briefly discussed above the most commonly used estimation
procedure is the minimization of the sum of squared deviations.
• This procedure is known as the ordinary least squares (OLS)
estimator. In this sub section, this estimator is derived for the
simple linear case. The simple linear case means only one x
variable is associated with each y value.
• The simple linear regression problem is given by the following
equation:
yi = a + bxi + ui,   i = 1, 2, …, n
• Where yi and xi represent paired observations, a and b are
unknown parameters, ui is the error term associated with
observation i, and n is the total number of observations
Cont.
• To minimize the sum of squared residuals, it is necessary to
obtain an estimated residual for each observation. Estimated residuals are
calculated using the estimated values for a and b. First, using
the estimated equation ŷi = â + b̂xi (recall the hat denotes estimated
values), an estimated value for y is obtained for each
observation. Unfortunately, at this point we do not have
values for â and b̂. As noted earlier, residuals are calculated as
ûi = yi − ŷi. At this point, we have values for all yi’s.
• The sum of squared residuals can be written mathematically as

û1² + û2² + û3² + … + ûn² = Σ(i=1 to n) ûi²
Cont.
• where n is the total number of observations and ∑ is the
summation operator. The above equation is known as the sum
of squared residuals (sum of squared errors) and denoted SSE.
Using the definitions of ŷi and ûi, the SSE becomes

SSE = Σûi² = Σ(yi − ŷi)² = Σ[yi − (â + b̂xi)]² = Σ(yi − â − b̂xi)²

• An equation of the objective function for the OLS estimation
procedure is:

min SSE = Σ(yi − â − b̂xi)²   with respect to (w.r.t.) â and b̂
Least Squares Graphically

[Figure: scatter of Y against X with the fitted line Ŷi = β̂0 + β̂1Xi; for an observation such as Y2 = β0 + β1X2 + ε2, the vertical distances û1, û2, û3, û4 are the residuals. LS minimizes Σ(i=1 to n) ûi² = û1² + û2² + û3² + û4².]
Cont.
• Differentiating SSE with respect to â and b̂ and setting the derivatives to zero gives:

∂SSE/∂â = Σ[2(yi − â − b̂xi)(−1)] = −2Σ(yi − â − b̂xi) = 0, and
∂SSE/∂b̂ = Σ[2(yi − â − b̂xi)(−xi)] = −2Σxi(yi − â − b̂xi) = 0.

• Solving these two equations gives the OLS estimators:

â = ȳ − b̂x̄

b̂ = [nΣxiyi − (Σxi)(Σyi)] / [nΣxi² − (Σxi)²]
Assumptions of the OLS Estimator
• In this sub section, five assumptions that necessary to derive
and use the OLS estimator are presented.
• The next section will summarize the need for each assumption
in the derivation and use of the OLS estimator. You will need
to know and understand these five assumptions and their use.
Assumption one - Linear in Parameters
• This assumption has been discussed in both the simple linear
and multiple regression derivations and presented above as a
trait. Specifically, the assumption is
– the dependent variable y can be calculated as a linear function of a
specific set of independent variables plus an error term.
Cont.
• The regression model:
– A) is linear
• It can be written as
Yi = β0 + β1X1i + εi
• This doesn’t mean that the theory must be linear
• For example… suppose we believe that CEO salary is related to
the firm’s sales
• We might believe the model is:
log(salaryi) = β0 + β1 log(salesi) + εi
Cont.
• Assumption two - Random Sample of n Observations
• This assumption is composed of three related sub-assumptions.
– Assumption A. The sample consists of n-paired observations that are
drawn randomly from the population.
– Assumption B. The number of observations is greater than the number of
parameters to be estimated, usually written n > k. As discussed earlier, if
n = k, the number of observations (equations) will equal the number of
unknowns. In this case, OLS is not necessary, algebraic procedures can be
used to derive the estimates. If n < k, the number of observations is less
than the number of unknowns. In this case, neither algebra nor OLS
provide unique estimates.
– Assumption C. The independent variables (x’s) are nonstochastic, whose
values are fixed. This assumption means there is a unilateral causal
relationship between dependent variable, y, and the independent variables,
x’s. Variations in the x’s cause variations (changes) in the y’s; the x’s
cause y.
Cont.
• Assumption three– Zero Conditional Mean
• The mean of the error terms has an expected value of zero
given values for the independent variables.
• In mathematical notation, this assumption is correctly written
as E(U | X) = 0. A shorthand notation is often employed and will
be used in this class: E(U) = 0. Here, E is the
expectation operator, U the matrix of error terms, and X the
matrix of independent variables.
Cont.
• Assumption Four – No Perfect Collinearity
• The assumption of no perfect collinearity states that there is
no exact linear relationship among the independent variables.
This assumption implies two aspects of the data on the
independent variables. First, none of the independent
variables, other than the variable associated with the intercept
term (recall x1=1 regardless of the observation), can be a
constant. Variation in the x’s is necessary. In general, the
more variation in the independent variables the better the OLS
estimates well be in terms of identifying the impacts of the
different independent variables on the dependent variable.
Cont.
• Important!!
• All explanatory variables are uncorrelated with the error term
• E(εi|X1i,X2i,…, XKi,)=0
• Explanatory variables are determined outside of the model (They
are exogenous)
• What happens if this assumption is violated?
• Suppose we have the model,
• Yi =β0+ β1Xi+εi
• Suppose Xi and εi are positively correlated
• When Xi is large, εi tends to be large as well.
Cont.

Assumption five: No Serial Correlation


• Serial Correlation: The error terms across
observations are correlated with each other
• i.e. ε1 is correlated with ε2, etc.
• This is most important in time series
• If errors are serially correlated, an increase in the
error term in one time period affects the error term in
the next.
• The assumption that there is no serial correlation can
be unrealistic in time series
cont
• Assumption Six – Homoskedasticity
• The error terms all have the same variance and are not
correlated with each other. In statistical jargon, the error
terms are independent and identically distributed (iid). This
assumption means the error terms associated with different
observations are not related to each other. Mathematically,
this assumption is written as:
var(ui | X) = σ² and cov(ui, uj | X) = 0 for i ≠ j
• This assumption is more commonly written:
var(ui) = σ² and cov(ui, uj) = 0 for i ≠ j.
[Figures: Homoskedasticity – the error has constant variance. Heteroskedasticity – the spread of the error depends on X, which can take more than one form.]
Properties of OLS estimators
• The outcome of least squares method is OLS
parameter estimators a and b.
– OLS estimators are linear
– OLS estimators are unbiased (precise)
– OLS estimators are efficient (small variance)
• Gauss-Markov Theorem: Among linear
unbiased estimators, least square estimator
(OLS estimator) has minimum variance.
BLUE (best linear unbiased estimator)
Gauss-Markov Theorem
• With Assumptions 1-6 OLS is:
– 1. Unbiased: E(β̂) = β
– 2. Minimum Variance – the sampling distribution is as
small as possible
– 3. Consistent – as n → ∞, the estimators converge to the
true parameters
• As n increases, variance gets smaller, so each estimate approaches
the true value of β.
– 4. Normally Distributed. You can apply statistical tests to
them.
Goodness-of-fit
• Goodness-of-fit measures evaluates how well a
regression model fits the data
• The smaller SSE, the better fit the model
• F test examines if all parameters are zero. (large F
and small p-value indicate good fit)
• R2 (Coefficient of Determination) is SSM/SST that
measures how much a model explains the overall
variance of Y.
• R2=SSM/SST=522.2/649.5=.80
• Large R square means the model fits the data
Myth and Misunderstanding in R2
• R square is Karl Pearson correlation coefficient squared.
r² = .8967² = .80
• If a regression model includes many regressors, R2 is less
useful, if not useless.
• Addition of any regressor always increases R2 regardless of
the relevance of the regressor
• Adjusted R2 give penalty for adding regressors, Adj. R2=1-
[(N-1)/(N-K)](1-R2)
• R2 is not a panacea although its interpretation is intuitive; if
the intercept is omitted, R2 is incorrect.
• Check specification, F, SSE, and individual parameter
estimators to evaluate your model; A model with smaller R 2
can be better in some cases.
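A short numerical sketch of the adjusted R² penalty described above, using hypothetical values (N = 50 observations, K = 4 estimated parameters, R² = 0.80):

# Adjusted R-squared: Adj. R2 = 1 - [(N - 1) / (N - K)] * (1 - R2)
N, K = 50, 4          # hypothetical sample size and number of parameters
R2 = 0.80             # hypothetical coefficient of determination
adj_R2 = 1 - ((N - 1) / (N - K)) * (1 - R2)
print(round(adj_R2, 3))   # 0.787, slightly below R2: the penalty for extra regressors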
Chapter Three: Introduction to
the Multiple OLS Regression
By Teshome A.
Multiple Regression Analysis (MRA)
• In Chapter two, we learned how to use simple regression analysis to
explain a dependent variable, y, as a function of a single
independent variable, x. The primary drawback in using simple
regression analysis for empirical work is that it is very difficult to
draw ceteris paribus conclusions about how x affects y: the key
assumption, SLR—that all other factors affecting y are uncorrelated
with x—is often unrealistic.
• Multiple regression analysis is more amenable to ceteris paribus
analysis because it allows us to explicitly control for many other
factors which simultaneously affect the dependent variable. This is
important both for testing financial economic theories and for
evaluating policy effects when we must rely on non experimental
data. Because multiple regression models can accommodate many
explanatory variables that may be correlated, we can hope to infer
causality in cases where simple regression analysis would be
misleading.
Cont.
• Multiple regression analysis is an extension of simple regression
analysis to cover cases in which the dependent variable is
hypothesized to depend on more than one explanatory variable.
Much of the analysis will be a straightforward extension of the
simple regression model, but we will encounter two new problems.
• First, when evaluating the influence of a given explanatory variable
on the dependent variable, we now have to face the problem of
discriminating between its effects and the effects of the other
explanatory variables.
• Second, we shall have to tackle the problem of model specification.
Frequently a number of variables might be thought to influence the
behavior of the dependent variable; however, they might be
irrelevant. We shall have to decide which should be included in the
regression equation and which should be excluded.
Cont.
• Multiple regression also may be useful
– in determining whether or not a particular effect is present;
– in measuring the magnitude of a particular effect; and
– in forecasting what a particular effect would be, but for an intervening
event.
• The purpose of multiple regression
– Incorporating more than one independent variable into the
explanation of a dependent variable
– Measuring the cumulative impact of independent variables on a
dependent variable
– Determining the relative importance of independent variables
Cont.

Idea: Examine the linear relationship between


1 dependent (Y) & 2 or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1 … βk are the population slopes, and εi is the random error.
• The coefficients of the multiple regression model
are estimated using sample data with k
independent variables
• The estimated (or predicted) value of Y is

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where b0 is the estimated intercept and b1, …, bk are the estimated slope coefficients.
• Interpretation of the Slopes: (referred to as a Net
Regression Coefficient)
– b1=The change in the mean of Y per unit change in X1,
taking into account the effect of X2 (or net of X2)
– b0 = Y intercept. It is the same as in simple regression.
Assumptions of multiple regression
• The following are the major assumption of multiple regression
– Independence: the scores of any particular subject are
independent of the scores of all other subjects
– Normality: in the population, the scores on the dependent
variable are normally distributed for each of the possible
combinations of the level of the X variables; each of the
variables is normally distributed
– Homoscedasticity: in the population, the variances of the
dependent variable for each of the possible combinations of
the levels of the X variables are equal.
– Linearity: In the population, the relation between the
dependent variable and the independent variable is linear
when all the other independent variables are held constant.
Simple vs. Multiple Regression

Simple regression:
• One dependent variable Y predicted from one independent variable X
• One regression coefficient
• r²: proportion of variation in dependent variable Y predictable from X

Multiple regression:
• One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk)
• One regression coefficient for each independent variable
• R²: proportion of variation in dependent variable Y predictable by the set of independent variables (X’s)
Simple and Multiple Regression
Compared
• Coefficients in a simple regression pick up the
impact of that variable (plus the impacts of
other variables that are correlated with it) and
the dependent variable.
• Coefficients in a multiple regression account
for the impacts of the other variables in the
equation.
The simple linear regression model allows for one independent variable, “x”:
y = β0 + β1x + ε
The multiple linear regression model allows for more than one independent variable:
Y = b0 + b1x1 + b2x2 + e
[Figure: note how the straight fitted line over X becomes a plane over X1 and X2.]
Multiple Regression Models

[Diagram: a taxonomy of multiple regression models. Models are divided into linear and non-linear. The linear branch includes linear, dummy-variable, interaction and polynomial specifications; the non-linear branch includes square root, log, reciprocal and exponential specifications.]
Interpreting the Regression Coefficients
• The regression coefficients are interpreted essentially the same
in multiple regression as they are in simple regression, with
one caveat.
• The slope of an independent variable in multiple regression is
usually interpreted as the marginal (or isolated) effect of a unit
change in the variable upon the mean value of Y when “the
values of all of the other independent variables are held
constant”.
• Let us assume that the sale of particular product affected by
advertising and bonus payments. As you can see in the next
table the change in the bonus and advertising spending affect
the total sale of the particular product.
Dependent variable: Sales
----------------------------------------------------------------
Parameter      Estimate      Standard Error      T Statistic
----------------------------------------------------------------
CONSTANT       -515.073      190.759             -2.70013
Ad             2.47216       0.275644            8.96869
Bonus          1.85284       0.717485            2.5824
----------------------------------------------------------------

b0 estimates the expected annual sales for a territory if $0.00 is spent on
advertising and bonuses. Because these values are outside the range of
values for Ad and Bonus observed, and upon which the estimated
regression equation is based, the value of b0 has no practical
interpretation. Put more concisely, an interpretation of b0 is not supported
by the data. This will often, but not always, be the case in multiple
regression.
b1: Expected (mean) sales increases by about $2,472 for every $100
increase in the amount spent on advertising, holding the amount of
bonuses paid constant.
b2: Sales increases by $1,853, on average, for every $100 increase in
bonuses, for a given amount spent on advertising.
Major Types Of Multiple Regression
• There are a number of different types of multiple regression
analyses that you can use, depending on the nature of the
question you wish to address.
• The three main types of multiple regression analyses are:
– standard or simultaneous
– hierarchical or sequential
– stepwise.
Cont.
• Standard multiple regression: In standard multiple
regression, all the independent (or predictor) variables are
entered into the equation simultaneously. Each independent
variable is evaluated in terms of its predictive power, over and
above that offered by all the other independent variables.
• This is the most commonly used multiple regression analysis.
You would use this approach if you had a set of variables (e.g.
various personality scales) and wanted to know how much
variance in a dependent variable they were able to explain as a
group or block.
• This approach would also tell you how much unique variance
in the dependent variable each of the independent variables
explained.
Cont.
• Hierarchical multiple regression
• In hierarchical regression (also called sequential regression), the
independent variables are entered into the equation in the order
specified by the researcher based on theoretical grounds.
Variables or sets of variables are entered in steps (or blocks),
with each independent variable being assessed in terms of what it
adds to the prediction of the dependent variable after the previous
variables have been controlled for.
• For example, if you wanted to know how well business profit is
affected by the market interest rate, then after the effect of interest rate
is controlled for, you would enter the prices of goods and services.
• Once all sets of variables are entered, the overall model is
assessed in terms of its ability to predict the dependent measure.
Cont.
• Stepwise multiple regression
• In stepwise regression, the researcher provides a list of
independent variables and then allows the program to select
which variables it will enter and in which order they go into
the equation, based on a set of statistical criteria.
• There are three different versions of this approach: forward
selection, backward deletion and stepwise regression. There
are a number of problems with these approaches and some
controversy in the literature concerning their use (and abuse).
• It is important that you understand what is involved, how to
choose the appropriate variables and how to interpret the
output that you receive.
Derivation Multiple Regression
Coefficients
• The intention of this discussion is to press
home two basic points.
– First, the principles behind the derivation of the
regression coefficients are the same for multiple
regression as for simple regression.
– Second, the expressions, however, are different,
and so you should not try to use expressions
derived for simple regression in a multiple
regression context.
Cont.
• The definition and interpretation of the sums of squares in
multiple regression is similar to that in simple regression.
• Total Sum of Squares, SST = Σ(yi − ȳ)², is a measure of the total
variation of Y
• Regression Sum of Squares, SSR = Σ(ŷi − ȳ)², measures the
variation of Y explained by the model
• Error Sum of Squares, SSE = Σ(yi − ŷi)², measures the variation of
Y left unexplained by the model
• We find that the equality SST = SSR + SSE always holds.
Thus, the total variation in Y can be “decomposed” into the
explained variation plus the unexplained variation for the
model.
Errors Terms in a Multiple Regression Equation

Errors: ei = Yi − β1 − β2X1,i − β3X2,i
E(ei) = 0 ;  ei ~ N(0, σ²)

Error sum of squares as a function S(β1, β2, β3):
S = Σei² = Σ(Yi − β1 − β2X1,i − β3X2,i)²

In case of K explanatory variables:
S = Σei² = Σ(Yi − β1 − β2X1,i − … − βkXk,i)²

Problem: what are the values of β1, β2, …, βk that minimise the
sum of squared errors?
Derivation of Normal Equations in a Multiple
Regression Analysis

∂S/∂β1 = −2Σ(Yi − β1 − β2X1,i − β3X2,i)(1) = 0, and
∂S/∂β2 = −2Σ(Yi − β1 − β2X1,i − β3X2,i)(X1,i) = 0
∂S/∂β3 = −2Σ(Yi − β1 − β2X1,i − β3X2,i)(X2,i) = 0

Thus the normal equations are:
ΣYi = Nβ1 + β2ΣX1,i + β3ΣX2,i                                   (2)
ΣX1,iYi = β1ΣX1,i + β2ΣX1,i² + β3ΣX1,iX2,i                      (3)
ΣX2,iYi = β1ΣX2,i + β2ΣX1,iX2,i + β3ΣX2,i²                      (4)

Solving the normal equations (with lower-case letters denoting deviations from sample means) gives:

β̂1 = Ȳ − β̂2X̄1 − β̂3X̄2                                          (4)
β̂2 = [Σx2,i² Σx1,iyi − Σx1,ix2,i Σx2,iyi] / [Σx1,i² Σx2,i² − (Σx1,ix2,i)²]   (5)
β̂3 = [Σx1,i² Σx2,iyi − Σx1,ix2,i Σx1,iyi] / [Σx1,i² Σx2,i² − (Σx1,ix2,i)²]   (6)
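A short Python sketch (made-up data) that applies the deviation-form formulas for β̂1, β̂2 and β̂3 above and checks them against a standard least-squares solver:

# Two-regressor OLS coefficients from the deviation-form formulas above.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])     # hypothetical observations

d1, d2, dy = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()   # deviations from means
den = np.sum(d1**2) * np.sum(d2**2) - np.sum(d1 * d2)**2
b2 = (np.sum(d2**2) * np.sum(d1 * dy) - np.sum(d1 * d2) * np.sum(d2 * dy)) / den
b3 = (np.sum(d1**2) * np.sum(d2 * dy) - np.sum(d1 * d2) * np.sum(d1 * dy)) / den
b1 = y.mean() - b2 * x1.mean() - b3 * x2.mean()
print(b1, b2, b3)

# Cross-check with NumPy's least-squares solver on [1, x1, x2].
X = np.column_stack([np.ones_like(x1), x1, x2])
print(np.linalg.lstsq(X, y, rcond=None)[0])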
Coefficient of
Multiple Determination
• Proportion of total variation in Y explained by
all X Variables taken together.

SSR Explained Variation


2
rY .12... k  
SST Total Variation

• Never decreases when a new X variable is


added to model.
– Disadvantage when comparing models.
Cont.
• The R-square statistic measures the regression model’s
usefulness in predicting outcomes
• It indicating how much of the dependent variable’s
variation is due to its relationship with the independent
variable(s).
• An R-square of 1 means that the independent variable
explains 100% of the dependent variable’s variation—it
entirely determines its values.
• Conversely, an R-square of 0 means that the
independent variable explains none of the variation in
the dependent variable—it has no explanatory power
whatsoever.
Coefficient of Partial Determination
• Measures proportion of the variation in Y that is
explained by X2, out of the variation not explained by X1
• Square of the partial correlation between Y and X2,
controlling for X1.

R r
2 2
YX 1
r 2
YX 2  X 1  0  rYX2 2  X 1  1
1 r 2
YX 1

• where R2 is the coefficient of determination for model with both X1 and


X2: R2 = SSR(X1,X2) / TSS
Explaining Variation: How much?

[Figure: the total variation in Y is split into the variation predictable from the combination of independent variables and the unpredictable variation.]

Proportion of Predictable and Unpredictable Variation

[Figure: Venn diagram of Y, X1 and X2; R² is the predictable (explained) variation in Y and (1 − R²) is the unpredictable (unexplained) variation in Y.]
Standard Error of Estimate
• Measures the standard deviation of the
residuals about the regression plane, and thus
specifies the amount of error incurred when
the least squares regression equation is used to
predict values of the dependent variable.
• The standard error of estimate is computed by
using the following equation:
se = √[SSE / (n − k − 1)]
Testing significance of multiple regression
• In this section we show how to conduct significance tests for
multiple regression models.
• In multiple regression, the t test and the F test are the common
test of the test of the significance of multiple regression. But
both t and F test have different purposes.
• The F test is used to determine whether there exists a
significant relationship between the dependent variable and
the entire set of independent variables in the model; thus the F
test is a test of the overall significance of the regression.
• If the F test shows that the regression has overall significance,
the t test is then used to determine whether each of the
individual independent variables is significant. A separate t
test is used for each of the independent variables; thus the t
test is a test for individual significance.
Various Significance Tests
• The motivation behind the F distribution is where we have
independent samples of 2 variables each drawn from
normal distributions
• Testing R2
– Test R2 through an F test
– Test of competing models (difference between R2)
through an F test of difference of R2s
• Testing b
– Test of each partial regression coefficient (b) by t-tests
– Comparison of partial regression coefficients with each
other - t-test of difference between standardized
partial regression coefficients
Cont.
• The procedure for estimation is as follows:
1. Estimate the unrestricted version of the model
2. Estimate the restricted version of the model
3. Collect Σê² for the unrestricted model and Σê*² for the restricted model
4. Compute the F-test:

F = [(Σê*² − Σê²) / q] / [Σê² / (n − k)]

where q is the number of restrictions (in this
case q = 1) and (n − k) is the degrees of freedom for
the unrestricted model
Computing F by using R square
• We can also use R² to calculate the F-
statistic by first dividing through by the
total sum of squares
• Using our definition of R² we can write:

F = [(R² − R*²) / q] / [(1 − R²) / (n − k)]

where R² is from the unrestricted model and R*² from the restricted model
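A small numerical sketch of the R²-based F statistic above, with hypothetical values (unrestricted R² = 0.40, restricted R² = 0.35, q = 2 restrictions, n = 100 observations, k = 5 parameters in the unrestricted model):

# F-test for q restrictions, computed from the unrestricted and restricted R2.
R2_u, R2_r = 0.40, 0.35      # hypothetical unrestricted and restricted R-squared
q, n, k = 2, 100, 5          # restrictions, observations, parameters (unrestricted)
F = ((R2_u - R2_r) / q) / ((1 - R2_u) / (n - k))
print(round(F, 2))           # 3.96; compare with the critical F(q, n - k) value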
Relationship between R² and F

• When R2 = 0 there is no relationship between the Y and X


variables
– This can be written as Y = a
– In this instance, we cannot reject the null and F = 0
• When R2 = 1, all variation in Y is explained by the X variables
– The F statistic approaches infinity as the denominator
would equal zero
– In this instance, we always reject the null
Multiple regression with dummy variable
• In previous section , the dependent and independent
variables in our multiple regression models have had
quantitative meaning. Just a few examples include hourly
wage rate, years of education, college grade point average,
amount of air pollution, level of firm sales, and number of
arrests.
• In each case, the magnitude of the variable conveys useful
information. In empirical work, we must also incorporate
qualitative factors into regression models.
• The gender or race of an individual, the industry of a firm
(manufacturing, retail, etc.), and the region in the United
States where a city is located (south, north, west, etc.) are all
considered to be qualitative factors.
Cont.
• A Dummy variable or Indicator Variable is an artificial
variable created to represent an attribute with two or more
distinct categories/levels.
• Dummy variables are simply variables that have been coded
either 0 or 1 to indicate that an observation falls into a certain
category.
• They are also sometimes called indicator variables. We use
dummy variables in order to include nominal level variables in
a regression analysis.
• Assume the financial system in the given country affected by
the gender, location, race, political stability in the economy
and other categorical variables.
Cont.
• The Nature of Qualitative Information
– Sometimes we can not obtain a set of numerical
values for all the variables we want to use in a
model.
– This is because some variables can not be
quantified easily.
Examples:
– Gender may play a role in determining salary levels
– Different ethnic groups may follow different consumption patterns
– Educational levels can affect earnings from employment
Cont.
• Consider the following cross-sectional model:
Yi=β1+β2X2i+ ui
• The constant term in this equation measures
the mean value of Yi when X2i is equal to zero.
• This model assumes that the constant will be
the same for all the observations in our data
set.
• But what if we have two different subgroups
(male, female for example)?
Cont.
• The question here is how to quantify the
information that comes from the difference in
the two groups.
• One solution is to create a dummy variable as
follows:
D = 1 for female, 0 for male
• Note that the choice of which of the two
different outcomes is to be assigned the value
of 1 does not alter the results.
Cont.
• In this example we create dummy variables for Gender, and
Educ Lev.
• Then we can run a regression analysis with Salary as the
response variable, using any combination of numerical and
dummy explanatory variables.
• We must follow two rules:
– We shouldn’t use any of the original categorical variables
that the dummies are based on.
– We should use one less dummy than the number of
categories for any categorical variable.
Cont.
• The corresponding equation is
Predicted Salary = 35.492 + 0.998YrsExper + 0.131YrsPrior - 8.080Female
• We interpret the coefficient -8.080 of the Female dummy
variable as the average salary disadvantage for females
relative to males after controlling for job experience. But there
is still more story to tell.
• The interpretation indicates that being female reduces the
predicted salary by 8.08, holding the other variables constant. That
indicates a negative relationship between the female dummy and salary.
• The main conclusion we can draw from the output is that there
is still a plausible case to be made for discrimination against
females, even after including information on all the variables
in the database in the regression equation.
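A minimal sketch of how such a dummy-variable regression could be run (assuming pandas and statsmodels are available; the variable names YrsExper, YrsPrior and Female mirror the example above, and the data are made up):

# Salary regression with a 0/1 Female dummy (hypothetical data).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Salary":   [40.2, 38.5, 47.1, 33.0, 36.8, 52.4],
    "YrsExper": [5, 4, 12, 2, 6, 15],
    "YrsPrior": [1, 2, 3, 0, 1, 4],
    "Female":   [0, 1, 0, 1, 1, 0],      # 1 = female, 0 = male
})

model = smf.ols("Salary ~ YrsExper + YrsPrior + Female", data=df).fit()
print(model.params)   # the Female coefficient is the estimated salary gap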
Multiple regression with Interaction
variables
• In general, two predictors interact if the effect on the response
variable of one predictor depends on the value of the other.
• An interaction variable algebraically is the product of two
variables. Its effect is to allow the effect of one of the
variables on Y to depend on the value of the other variable.
• A slope parameter can no longer be interpreted as the change
in the mean response for each unit increase in the predictor,
while the other predictors are held constant.
• We can multiply any 2 explanatory variables together. The computer
does not care where X3 came from. Mathematically, it doesn’t
matter if X3 = X1 · X2.
Cont.
• Suppose, in our earnings equation example, that we believe
the effect of a year of education matters more for
inexperienced workers than for experienced workers.
• We might speculate that for a worker in her first year on the
job, education could substitute for experience. However,
workers with 10 years of experience have had more time for
on-the-job learning, so education may be relatively
unimportant.
• Alternatively, education might proxy for the type of job a
worker gets. Workers with only a few years of education
might tend to be custodial workers, with relatively flat
earnings profiles over time.
Cont.
• To test these hypotheses, we need to create a new variable,
(experience)i·(education)i
• Let Edi = years of education
• Expi = years of work experience
• Ed_Expi = Edi·Expi
log(earningsi) = β0 + β1Edi + β2Expi + β3Ed_Expi + εi
• The coefficient on Ed_Exp, the product of education and experience,
captures the joint effect of the two variables.
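A brief sketch of fitting the interaction model above (hypothetical data; the Ed_Exp interaction is created by hand, though statsmodels' formula interface could also write it as Ed:Exp):

# Earnings equation with an education-experience interaction (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "earnings": [18.0, 22.5, 30.1, 35.7, 41.2, 28.4],
    "Ed":       [10, 12, 14, 16, 18, 12],
    "Exp":      [2, 5, 8, 10, 15, 20],
})
df["log_earnings"] = np.log(df["earnings"])
df["Ed_Exp"] = df["Ed"] * df["Exp"]          # interaction term: Ed * Exp

model = smf.ols("log_earnings ~ Ed + Exp + Ed_Exp", data=df).fit()
print(model.params)   # the Ed_Exp coefficient shows how the return to education varies with experience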
Multiple regression with lagged variable
• In a dynamic model the variable of interest may be influenced
by the value of the same variable in the previous time period.
• In these cases the explanatory variable determines the
dependent variable with a lag.
• If the regression model includes not only the current but also
the lagged (past) values of the explanatory variables (the X’s)
it is called a distributed-lag model.
• If the model includes one or more lagged values of the
dependent variable among its explanatory variables, it is called
an autoregressive model.
• Definition of lag:
– Lapse of time
– It takes time for Y to respond for change in X
Reasons for lags
Cont.
• Reasons to lag the variable: This mainly relates to financial
economics models
– Financial economic events such as price, interest rate,
consumer expenditure, production, or investment
– For instance: consumer expenditure this year may be
related to consumer expenditure last year
• In a general distributed lag model:
Yt = a + g1Yt-1 + … + gpYt-p + b0Xt + b1Xt-1 + … + bqXt-q + et
– where p and q = lag lengths
– coefficients can be eliminated by using a t-test (or a joint test
using F).
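A small sketch (hypothetical series; pandas' shift() builds the lags) showing how lagged regressors could be constructed before estimating such a model:

# Building lagged variables for a distributed-lag regression (hypothetical data).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Y": [10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.6, 13.1, 13.5, 14.0],
    "X": [2.0, 2.1, 2.3, 2.2, 2.6, 2.7, 2.9, 3.0, 3.2, 3.3],
})
df["Y_lag1"] = df["Y"].shift(1)     # Y(t-1)
df["X_lag1"] = df["X"].shift(1)     # X(t-1)
df = df.dropna()                    # first observation is lost to the lag

model = smf.ols("Y ~ Y_lag1 + X + X_lag1", data=df).fit()
print(model.params)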
Chapter Four: Issues of OLS
Regression Analysis
By Teshome A.(PhD)
Unity University
Introduction
• In the pervious chapter we considered at length the classical
normal linear regression model and showed how it can be used
to handle the twin problems of statistical inference, namely,
estimation and hypothesis testing, as well as the problem of
prediction.
• But recall that this model is based on several simplifying
assumptions, which are
• the regression model is linear in the parameters, the
values of the regressors,
• the X’s, are fixed, or X values are independent of the
error term,
• for given X’s, the mean value of disturbance ui is zero,
• for given X’s, the variance of ui is constant or
homoscedastic,
Cont.
• for given X’s, there is no autocorrelation, or serial
correlation, between the disturbances,
• the number of observations n must be greater than the
number of parameters to be estimated.
• There is no exact collinearity between the X variables.
• The model is correctly specified, so there is no specification
bias.
• The stochastic (disturbance) term ui is normally distributed.
• In this chapter we focus on the three major issues
which violate the above assumptions of OLS
estimation: heteroscedasticity, autocorrelation and
multicollinearity.
Heteroscedasticity
• Heteroscedasticity occurs when the error term has a non-constant variance.
• In this case, we can think of the disturbance for each
observation as being drawn from a different distribution with a
different variance.
• Stated equivalently, the variance of the observed value of the
dependent variable around the regression line is non-constant.
• We can think of each observed value of the dependent variable
as being drawn from a different conditional probability
distribution with a different conditional variance.
• A general linear regression model with the assumption of heteroscedasticity can be expressed as follows
Yt = β1 + β2Xt2 + … + βkXtk + εt
Var(εt) = E(εt²) = σt²   for t = 1, 2, …, n
Reasons for heteroscedasticity
• There are several reasons why the variances of the error terms (ui)
may be variable, some of which are as follows.
– Following the error-learning models, as people learn, their
errors of behavior become smaller over time or the number
of errors becomes more consistent.
– As data collecting techniques improve, the variance is
likely to decrease
– Heteroscedasticity can also arise as a result of the presence
of outliers.
– Another source of heteroscedasticity is skewness in the
distribution of one or more regressors included in the
model.
Consequences of heteroscedasticity
• If the error term has non-constant variance, but all other assumptions of the classical linear regression model are satisfied, then the consequences of using the OLS estimator to obtain estimates of the population parameters are:
– The OLS estimator is still unbiased.
– The OLS estimator is inefficient; that is, it is not
BLUE.
– The estimated variances and covariances of the OLS
estimates are biased and inconsistent.
– Hypothesis tests are not valid.
Detection of heteroscedasticity
• There are several ways to use the sample data to detect the
existence of heteroscedasticity
• The following are the major methods for detecting the existence of heteroscedasticity:
– Breusch-Pagan Test
– Harvey-Godfrey Test
– White’s Test
• All the methods of detecting heteroscedasticity rely on their own hypothesised structure for the non-constant variance of the error terms.
Breusch-Pagan and Harvey-Godfrey
Test
• There are a set of heteroscedasticity tests that require an assumption
about the structure of the heteroscedasticity, if it exists.
• That is, to use these tests you must choose a specific functional form
for the relationship between the error variance and the variables that
you believe determine the error variance.
• The major difference between these tests is the functional form that
each test assumes.
• Two of these tests are the Breusch-Pagan test and the Harvey-
Godfrey Test.
• The Breusch-Pagan test assumes the error variance is a linear
function of one or more variables.
• The Harvey-Godfrey Test assumes the error variance is an
exponential function of one or more variables. The variables are
usually assumed to be one or more of the explanatory variables in
the regression equation.
Cont.
• Suppose that the regression model is given by
Yt = β1 + β2Xt + εt   for t = 1, 2, …, n
• Suppose that we assume that the error variance is related to the
explanatory variable Xt. The Breusch-Pagan test assumes that the
error variance is a linear function of Xt. We can write this as
follows.
σt² = α1 + α2Xt   for t = 1, 2, …, n
• The Harvey-Godfrey test assumes that the error variance is an exponential function of Xt. This can be written as follows
σt² = exp(α1 + α2Xt)
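• A sketch of both tests in Python with statsmodels, assuming y and a regressor matrix X (with a constant column) are already defined; the Harvey-Godfrey version is coded by hand as an auxiliary regression of the log of the squared residuals:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

def test_heteroscedasticity(y, X):
    res = sm.OLS(y, X).fit()

    # Breusch-Pagan: squared residuals regressed on X (linear variance function)
    bp_lm, bp_pval, _, _ = het_breuschpagan(res.resid, X)

    # Harvey-Godfrey: log squared residuals regressed on X (exponential variance
    # function); the LM statistic is n * R-squared of this auxiliary regression
    aux = sm.OLS(np.log(res.resid ** 2), X).fit()
    hg_lm = aux.nobs * aux.rsquared

    return {"BP_LM": bp_lm, "BP_pvalue": bp_pval, "HG_LM": hg_lm}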
Cont.
• These two heteroscedasticity tests have two major
shortcomings:
– You must specify a model of what you believe is the
structure of the heteroscedasticity, if it exists. For
example, the Breusch-Pagan test assumes that the error
variance is a linear function of one or more of the
explanatory variables, if heteroscedasticity exists.
Thus, if heteroscedasticity exists, but the error
variance is a non-linear function of one or more
explanatory variables, then this test will not be valid.
– If the errors are not normally distributed, then these
tests may not be valid.
White’s Test
• The White test is a general test for heteroscedasticity. It has the following
advantages:
– It does not require you to specify a model of the structure of the
heteroscedasticity, if it exists.
– It does not depend on the assumption that the errors are normally
distributed.
– It specifically tests if the presence of heteroscedasticity causes the OLS
formula for the variances and the covariances of the estimates to be
incorrect.
• Suppose that the regression model is given by
Yt = β1 + β2Xt2 + β3Xt3 + εt   for t = 1, 2, …, n
• It is postulated that all of the assumptions of the classical linear regression model are satisfied, except for the assumption of constant error variance. For the White test, assume the error variance has the following general structure.
σt² = α1 + α2Xt2 + α3Xt3 + α4X²t2 + α5X²t3 + α6Xt2Xt3   for t = 1, 2, …, n
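• A sketch of the White test with statsmodels, assuming y and a regressor matrix X (including the constant) are already defined; het_white builds the auxiliary regression with levels, squares and cross products itself:

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

res = sm.OLS(y, X).fit()                              # original regression
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"White LM statistic = {lm_stat:.2f}, p-value = {lm_pval:.3f}")
# A small p-value rejects the null hypothesis of homoscedasticity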
Cont.
• The following points should be noted about the White Test.
– If one or more of the X’s are dummy variables, then you must be
careful when specifying the auxiliary regression. For example,
suppose that X3 is a dummy variable. In this case, the variable X²3 is the same as the variable X3. If you include both of these in the auxiliary regression, then you will have perfect multicollinearity. Therefore, you should exclude X²3 from the auxiliary regression.
– If you have a large number of explanatory variables in the
model, the number of explanatory variables in the auxiliary
regression could exceed the number of observations. In this
case, you must exclude some variables from the auxiliary
regression. You could exclude the linear terms, and/or the cross-
product terms; however, you should always keep the squared
terms in the auxiliary regression.
Remedies for heteroscedasticity
• Suppose that we find evidence of heteroscedasticity. If we use
the OLS estimator, we will get unbiased but inefficient
estimates of the parameters of the model.
• Also, the estimates of the variances and covariances of the
parameter estimates will be biased and inconsistent, and as a
result hypothesis tests will not be valid. When there is
evidence of heteroscedasticity, econometricians do one of two
things.
– Use the OLS estimator to estimate the parameters of the
model. Correct the estimates of the variances and
covariances of the OLS estimates so that they are
consistent.
– Use an estimator other than the OLS estimator to estimate
the parameters of the model.
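• A sketch of the first remedy (keep the OLS coefficients, correct the variance estimates) using heteroscedasticity-consistent standard errors in statsmodels; y and X are assumed to be already defined:

import statsmodels.api as sm

ols_res = sm.OLS(y, X).fit()                    # conventional (possibly invalid) standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # same coefficients, White/HC1 robust standard errors
print(ols_res.bse)       # compare the two sets of standard errors
print(robust_res.bse)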
Multicollinearity
• Multicollinearity exists among the X variables in the regression
equation when two (or more) of your X variables are LINEARLY
related with one another.
• Recall that one of the assumptions of the OLS method is that the X
variables in a regression equation are NOT linearly correlated with
one another, so Multicollinearity is a violation of one of the
assumptions of the OLS method.
• When this assumption is violated, serious consequences can occur
for regression analysis.
• Now, in theory, multicollinearity should never arise, because we are
supposed to choose X variables for our model that are not linearly
related with one another.
• Some degree of multicollinearity is always present in observational
data. However, when the predictor variables are strongly
correlated serious problems are likely to be encountered.
Sources of multicollinearity
• There are several sources of multicollinearity. Multicollinearity may be due to the following factors:
– The data collection method employed. For example, sampling over a
limited range of the values taken by the regressors in the population.
– Constraints on the model or in the population being sampled. For
example, in the regression of electricity consumption on income (X2) and
house size (X3) there is a physical constraint in the population in that
families with higher incomes generally have larger homes than families
with lower incomes.
– Model specification. For example, adding polynomial terms to a
regression model, especially when the range of the X variable is small.
– An overdetermined model. This happens when the model has more
explanatory variables than the number of observations. This could happen
in medical research where there may be a small number of patients about
whom information is collected on a large number of variables.
Cont.
• An additional reason for multicollinearity, especially in time
series data, may be that the regressors included in the model
share a common trend, that is, they all increase or decrease
over time.
• Thus, in the regression of consumption expenditure on
income, wealth, and population, the regressors income, wealth,
and population may all be growing over time at more or less
the same rate, leading to collinearity among these variables.
Informal diagnostics of multicollinearity
• The following observations should alert you to severe
multicollinearity.
– Large changes in the estimated regression coefficients as
variables are added or deleted.
– Non-significant results in tests on variables known to be
important.
– Wrong sign associated with a regression coefficient.
– Large pair wise correlation among predictors.
– Even though we know that the regression is significant,
none or most of the variables may be insignificant.
The formal device for detecting multicollinearity
• The formal method for detecting multicollinearity is the variance inflation factor (VIF).
• To calculate the VIF for the jth explanatory variable, regress Xj on the other explanatory variables, obtain R²j from this auxiliary regression, and use the relationship
VIFj = 1 / (1 − R²j)
     = 1 / (RSSj / TSSj)
     = TSSj / RSSj
     = var(variable j) / var(residuals)
• The VIF equals 1 when R²j is zero (no linear dependency). Large values, conventionally VIF > 10, indicate a serious multicollinearity problem.
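• A sketch of a VIF table in Python, assuming X_df is a DataFrame holding only the explanatory variables (the function name is a placeholder):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X_df):
    X = sm.add_constant(X_df)                      # each auxiliary regression needs an intercept
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    return pd.Series(vifs, name="VIF")             # values above roughly 10 signal a problem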
Remedies of Multicollinearity
• There may be various factors behind the existence of multicollinearity. The following are the major remedies for multicollinearity:
– Model respecification
– Drop one of the collinear variables
– Transform the highly correlated variables into a ratio
– Add more data, that is, go out and collect additional observations
Autocorrelation
• Idea: If there is some pattern in how the values of your time
series change from observation to observation, you could use
it to your advantage.
• The correlation between the original time series values and
the corresponding k-lagged values is called autocorrelation of
order k.
• The Autocorrelation Function (ACF) reports the serial correlation coefficients for consecutive lags.
• Correlograms display graphically the ACF.
• The ACF can be misleading for a series with unstable
variance, so it might first be necessary to transform for a
constant variance before using the ACF.
Cont.
• Another useful method to examine serial dependencies is to
examine the Partial Autocorrelation Function (PACF), an
extension of autocorrelation where the dependence on the
intermediate elements (those within the lag) is removed.
• For time series data, ACF and PACF measure the degree of
relationship between observations k time periods, or lags,
apart. These plots provide valuable information to help you
identify an appropriate ARIMA model.
• In a sense, the partial autocorrelation provides a “cleaner”
picture of serial dependencies for individual lags.
Cont.
• An example of an autocorrelated error:
et = b·et−1 + vt
• Here we have b = 0.8. It means that 80% of the
error in period t-1 is still felt in period t. The
error in period t is comprised of 80% of last
period’s error plus an error that is unique to period
t. This is sometimes called an AR(1) model for
“autoregressive of the first order”
• The autocorrelation coefficient must lie between –
1 and 1:
-1 < b < 1
Anything outside this range is unstable and very
unlikely for economic models
Cont.
• Autocorrelation can be positive or negative:
– if b > 0, we say that the error has positive autocorrelation; a graph of the errors shows a tracking pattern.
– if b < 0, we say that the error has negative autocorrelation; a graph of the errors shows an oscillating pattern.
• In general b measures the strength of the correlation
between the errors at time t and their values lagged
one period.
• There can be higher orders such as a second order
AR(2) model:
et = b1·et−1 + b2·et−2 + vt
How to Test for Autocorrelation
• We test for autocorrelation similar to how we test for a
heteroskedastic error: estimate the model using least
squares and examine the residuals for a pattern.
1) Visual Inspection: Plot residuals against time. Do they
have a systematic pattern that indicates a tracking pattern
(for positive autocorrelation) or an oscillating pattern (for
negative autocorrelation)?
2) Durbin-Watson Test: This test is based on the residuals from the least squares regression. The d statistic can be simplified into an expression involving the sample correlation ρ̂ between the residuals at t and t−1: d ≈ 2(1 − ρ̂).
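• A sketch of the Durbin-Watson check in Python, assuming the residuals come from a least squares fit of y on X (arrays assumed given):

import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

res = sm.OLS(y, X).fit()
d = durbin_watson(res.resid)        # d is roughly 2*(1 - rho_hat)
print(f"Durbin-Watson d = {d:.2f}")
# d near 2      -> little evidence of first-order autocorrelation
# d well below 2 -> positive autocorrelation; well above 2 -> negative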
How to Correct for Autocorrelation
• The following are two methods that can be applied to correct for autocorrelation.
1) It is quite possible that the error in a regression equation
appears to be autocorrelated due to an omitted variable.
Recall that omitted variables “end up” in the error term. If
the omitted variable is correlated over time (which is true
of many economic time-series), then the residuals will
appear to track  Correct the problem by reformulating
the model (include the omitted variable)
2) Generalized Least Squares
Similar to the problem of a heteroskedastic error, we will
take our model that has an autocorrelated error and
transform it into a model that has a well-behaved (serially
uncorrelated) error.
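• A sketch of the GLS-type correction using statsmodels' GLSAR, which iterates between estimating the AR(1) coefficient from the residuals and re-estimating the transformed model (a Cochrane-Orcutt style procedure); y and X are assumed to be already defined:

import statsmodels.api as sm

glsar_model = sm.GLSAR(y, X, rho=1)               # rho=1 requests an AR(1) error structure
glsar_res = glsar_model.iterative_fit(maxiter=10) # alternate: estimate rho, transform, re-estimate
print("estimated rho:", glsar_model.rho)
print(glsar_res.params)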
Chapter Five: Panel Data
Regression Analysis
By Dr. Teshome A. (PhD)
Oromia State University
• Typically we have three types of data sets which we
use in economics:
– Time series – This is the most common form of data that
we use and they are quite easily accessible.
– Cross Section – This is data usually observed over
geographic or demographic groups. For example, we can
observe data on the unemployment rate for each of the 11 Ethiopian regional states.
– Panel Data – This type combines the first two types. Here
we have a cross section, but we observe the cross section
over time. If the same people or states or counties,
sampled in the cross section, are then re-sampled at a
different time we call this a longitudinal data set, which is
a very valuable type of panel data set.
• What are Panel Data?
• A panel dataset contains observations on multiple entities (individuals), where
each entity is observed at two or more points in time.
• A double subscript distinguishes entities (states) and time periods (years) (Xit)
• Panel data with k regressors:
(X1it, X2it,…,Xkit, Yit), i = 1,…,n, t = 1,…,T
n = number of entities (states)
T = number of time periods (years)
• Panel data are a form of longitudinal data, that is, data collected at different points in time, involving regularly repeated observations on the same individuals.
– Individuals may be people, households, firms, areas, etc
– Repeat observations may be different time periods or units within clusters (e.g. workers
within firms; siblings within twin pairs)
• Examples:
– Annual unemployment rates of each state over several
years
– Quarterly sales of individual stores over
several quarters
– Wages for the same worker, working at several different
jobs
Benefits from using panel data
• There are two major benefits from using panel data.
– 1) Panel data allows you to get more reliable estimates of the
parameters of a model. There are several possible reasons for
this.
a) Panel data allows you to control for unobservable factors that
vary across units but not over time, and unobservable factors
that vary over time but not across units. This can substantially
reduce estimation bias.
b) There is usually more variation in panel data than in cross-
section or time-series data. The greater the variation in the
explanatory variables, the more precise the estimates.
c) There is usually less multicollinearity among explanatory
variables when using panel data than time-series or cross-
section data alone. This also results in more precise parameter
estimates.
– 2) Panel data allows you to identify and measure effects that cannot be
identified and measured using cross-sectional data or time-series data.
• For example, suppose that your objective is to estimate a
production function to obtain separate estimates of economies of
scale and technological change for a particular industry. If you
have cross-section data, you can obtain an estimate of economies
of scale, but you can’t obtain an estimate of technological change.
If you have time-series data you cannot separate economies of
scale from technological change.
• To attempt to separate economies of scale from technological
change, past time-series studies have assumed constant returns to
scale; however, this is a very dubious procedure. If you have panel
data, you can identify and measure both economies of scale and
technological change.
Why use panel data?
• Repeated observations on individuals allow for
possibility of isolating effects of unobserved
differences between individuals
• We can study dynamics
• The ability to make causal inference is enhanced by
temporal ordering
• Some phenomena are inherently longitudinal (e.g.
poverty persistence; unstable employment)
• Net versus gross change: gross change visible only
from longitudinal data, e.g. decomposition of change in
unemployment rate over time into contributions from
inflows and outflows
Criticisms of panel data
• Variation between people usually far exceeds variation
over time for an individual
– ⇒ a panel with T waves doesn’t give T times the information
of a cross-section
• Variation over time may not exist for some important
variables or may be inflated by measurement error
• Panel data imposes a fixed timing structure; continuous
time survival analysis may be more informative
• We still need very strong assumptions to draw clear
inferences from panels: sequencing in time does not
necessarily reflect causation
Organizing Panel Data
• The best way to explain panel data organization is to use a simple
example. Suppose that we have a country with five regions, listed as
A, B, C, D, and E.
• We can think of these five regions as the states of a country. Now, suppose that we travel to each state and collect data on
two variables – X and Y. We suspect that Y is determined by X and
therefore we assert there is a stable structure which we can capture
using the regression
Y = β0 + β1X + ε
• The next year, we again sample each state to get data on X and Y.
• We therefore have two years of data on X and Y for
each of the states A, B, C, D, and E.
• This means we have 10 observations on Y and X (i.e., 5
cross sectional units * 2 time periods).
• If we put all the data together and do not make any distinction
between cross section and time series, we can of course run a
regression over all the data using ordinary least squares. This
is called a pooled OLS regression.
• This type of regression is the easiest to run, but is also subject
to many types of errors. We could simply write the data in
observation form like the following (data is hypothetically
created).
• Pooled OLS is often used as a rough and ready means of
analyzing the data. It is a simple and quick benchmark to
which more sophisticated regressions can be compared.
• If we make distinctions between time and cross sectional parts
of the data, which is more sophisticated and more informative,
there are two ways we can organize this data into tables.
Pooled panel data (hypothetical)
Obs  X   Y
1 7 1
2 7 1
3 12 2
4 15 3
5 21 3
6 10 2
7 14 3
8 18 4
9 22 4
10 25 5
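• A sketch of the pooled OLS benchmark on the ten hypothetical observations above, ignoring which state or year each row comes from:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "x": [7, 7, 12, 15, 21, 10, 14, 18, 22, 25],
    "y": [1, 1, 2, 3, 3, 2, 3, 4, 4, 5],
})
pooled = smf.ols("y ~ x", data=df).fit()
print(pooled.params)   # one common intercept and slope for all states and years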
Panel Regression
• Panel data allows the researcher to consider more general
models than the simple pooled OLS model we discussed earlier.
• In particular, we can now assume that the constant term for
each state (A, B, C, D, and E) differs. We can write this as
Yit = β0 + β1Xit + ai + εit
• for i = A, B, C, D, and E and for t = 1 and 2. Each ai is a
separate constant associated with a different state.
• This means that the actual constant term for each state is equal to β0 + ai. In fact, we can estimate these constant terms, but they are
seldom of much importance and their estimated values are
difficult to judge because there is so little data being used to
estimate them (two observations for each using the above data
set).
• Instead, we are typically more interested in the slope coefficient, β1.
Fixed-effects regression model
• Consider an economic relationship that involves a dependent
variable, Y, two observable explanatory variables, X1 and X2,
and one or more unobservable confounding variables.
• You have panel data for Y, X1, and X2. The panel data consists
of N-units and T-time periods, and therefore you have N times
T observations.
• The classical linear regression model without an intercept is
given by
Yit = β1Xit1 + β2Xit2 + μit for i = 1, 2, …, N and t = 1, 2,
…, T
where Yit is the value of Y for the ith unit for the tth time period; Xit1 is the
value of X1 for the ith unit for the tth time period, Xit2 is the value of X2 for
the ith unit for the tth time period, and μit is the error for the ith unit for the
tth time period.
Cont.
• The fixed effects regression model, which is an extension of
the classical linear regression model, is given by
Yit = β1Xit1 + β2Xit2 + νi + εit
• Where μit = νi + εit. The error term for the classical linear
regression model is decomposed into two components. The
component νi represents all unobserved factors that vary
across units but are constant over time. The component εit
represents all unobserved factors that vary across units and
time.
• It is assumed that the net effect on Y of unobservable factors
for the ith unit that are constant over time is a fixed
parameter, designated αi. Therefore, the fixed effects model
can be rewritten as
Yit = β1Xit1 + β2Xit2 + α1D1i + α2D2i + … + αNDNi + εit, where Dji = 1 if j = i and 0 otherwise.
Cont.
• The unobserved error component νi has been replaced with a set of fixed parameters, α1, α2, …, αN, one parameter for each of the N units in the sample.
• These parameters are called unobserved effects and represent
unobserved heterogeneity.
• For example, α1 represents the net effect on Y of unobservable
factors that are constant over time for unit one, α2 for unit two, …, αN for unit N.
• Therefore, in the fixed-effects model each unit in the sample
has its own intercept. These N intercepts control for the net
effects of all unobservable factors that differ across units but
are constant over time.
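• A sketch of the fixed-effects (least squares dummy variable) estimator, assuming a long-format DataFrame df with columns y, x1, x2 and a unit identifier column 'unit' (the names are placeholders):

import statsmodels.formula.api as smf

# C(unit) creates one dummy per unit; "- 1" drops the common intercept so each
# unit gets its own alpha_i, matching the fixed-effects model above
fe_lsdv = smf.ols("y ~ x1 + x2 + C(unit) - 1", data=df).fit()
print(fe_lsdv.params)   # slopes beta1, beta2 plus one intercept alpha_i per unit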
Random effects model
• Consider an economic relationship that involves a dependent variable, Y,
and two observable explanatory variables, X 1 and X2. You have panel
data for Y, X1, and X2.
• The panel data consists of N-units and T-time periods, and therefore you
have N times T observations. The random effects model can be written as
Yit = β1Xit1 + β2Xit2 + νi + εit for i = 1, 2, …, N and t = 1, 2, …, T
• where the classical error term is decomposed into two components.
• The component νi represents all unobserved factors that vary across units
but are constant over time.
• The component εit represents all unobserved factors that vary across units
and time. It is assumed that νi is given by
νi = α0 + ωi   for i = 1, 2, …, N
• where νi is decomposed into two components:
– 1) a deterministic component α0,
– 2) a random component ωi. Once again, each of the N units has its own intercept.
Cont.
• However, in this model the N intercepts are not fixed
parameters; rather they are random variables.
• The deterministic component 0 is interpreted as the
population mean intercept.
• The disturbance ωi is the difference between the population
mean intercept and the intercept for the ith unit.
• It is assumed that the ωi for each unit is drawn from an
independent probability distribution with mean zero and
constant variance; that is,
E(ωi) = 0,   Var(ωi) = σ²ω,   Cov(ωi, ωs) = 0 for i ≠ s
• The N random variables νi are called random effects.
Cont.
• The random effects model can be rewritten equivalently as
Yit = α0 + β1Xit1 + β2Xit2 + μit
• Where μit = ωi + εit. An important assumption underlying the
random effects model is that the error term μ it is not correlated
with any of the explanatory variables.
• Because the error component ωi is in the error term μit for each
unit for each time period, the error term μit has autocorrelation.
The correlation coefficient for the error term for the ith unit
for any two time periods t and s is given by
Corr(μit, μis) = σ²ω / (σ²ω + σ²ε)
• where σ2ω is the variance of ωi, and σ2ε is the variance of εit.
Since this correlation coefficient must be positive the
autocorrelation is positive.
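• A sketch of random-effects estimation, assuming the third-party linearmodels package is available and that df is a long-format DataFrame with columns y, x1, x2 plus 'unit' and 'year' identifiers (all names are placeholders):

import pandas as pd
from linearmodels.panel import RandomEffects

panel = df.set_index(["unit", "year"])           # MultiIndex (entity, time) expected by linearmodels
re_res = RandomEffects.from_formula("y ~ 1 + x1 + x2", data=panel).fit()
print(re_res.params)                             # GLS estimates of the common intercept and slopes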
Chapter Six: Binary and
Instrumental Variable Methods
By Dr. Teshome A.
Oromia State University
Introduction
• In the last five chapters we studied unlimited (continuous) dependent variables and single-equation regression analysis.
• In econometric analysis we may have limited dependent variables and more than one equation.
• In such circumstances we need to develop or study the right model formulation, known as binary and structural analysis.
• Such analysis solves practical problems in econometric work.
• This chapter mainly explains binary regression and instrumental variable methods.
Binary analysis
• Binary analysis can in many ways be seen to be similar to
ordinary regression analysis.
• It models the relationship between a dependent and one or more
independent variables, and allows us to look at the fit of the
model as well as at the significance of the relationships (between
dependent and independent variables) that we are modeling.
• However, the underlying principle of binary analysis/ regression,
and its statistical calculation, are quite different to ordinary linear
regression.
• While ordinary regression uses ordinary least squares to find a
best fitting line, and comes up with coefficients that predict the
change in the dependent variable for one unit change in the
independent variable, binary regression estimates the
probability of an event occurring.
cont
• There are numerous examples of instances where this
may arise, for example where we want to model:
– Why firms choose to list their shares on the Ethiopian rather than the Kenyan exchange
– Why some stocks pay dividends while others do not
– What factors affect whether countries default on their
sovereign debt
– Why some firms choose to issue new stock to finance
an expansion while others issue bonds
– Why some firms choose to engage in stock splits
while others do not.
Cont.
• In binary analysis we have both dependent and independent
variables.
– Dependent variable: one dependent variable
• Is the mortgage denied or accepted?
• Is there success or failure
• To invest or not to invest
– Independent variables:
• income, wealth, employment status
• other loan, property characteristics
• race of applicant
• There are common types of binary analysis, linear probability
model, probit and logit analysis.
Linear Probability Model (LPM)
• The term “probability” has several definitions: a quantitative
description of the likely occurrence of a particular event.
• Probability is often expressed on a scale from 0 to 100 percent,
but researchers often use a scale of 0 to 1; a rare event has a
probability close to 0 while a very common event has a
probability close to 1.
• Probabilities are used as a tool to support conclusions
regarding both controlled and “natural” experiments.
• A linear probability model is a special case of a binomial
regression model. Here the dependent variable for each
observation takes values which are either 0 or 1. The
probability of observing a 0 or 1 in any one case is treated as
depending on one or more explanatory variables.
Cont.
• The linear probability model (LPM) is by far the simplest way of
dealing with binary dependent variables, and it is based on an
assumption that the probability of an event occurring, Pi , is linearly
related to a set of explanatory variables x2i , x3i , . . . , xki
• The Linear Probability Model: A natural starting point is the
linear regression model with a single regressor:
Yi = β0 + β1Xi + ui
• But:
– What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
– What does the line β0 + β1X mean when Y is binary?
– What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.26 mean?
cont.
• Advantages:
– simple to estimate and to interpret
– inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)
• Disadvantages:
– Does it make sense that the probability should be linear in
X?
– Predicted probabilities can be <0 or >1!
• These disadvantages can be solved by using a nonlinear
probability model: probit and logit regression
Logistic Regression
• In logistic regression the outcome variable is binary, and the
purpose of the analysis is to assess the effects of multiple
explanatory variables, which can be numeric and/or
categorical, on the outcome variable.
• The following are the requirements for Logistic Regression
 An outcome variable with two possible categorical
outcomes (1=success; 0=failure).
 A way to estimate the probability P of the outcome
variable.
 A way of linking the outcome variable to the explanatory
variables.
 A way of estimating the coefficients of the regression
equation, as well as their confidence intervals.
 A way to test the goodness of fit of the regression model.
Measuring the Probability of Outcome
• The probability of the outcome is measured by the odds of
occurrence of an event.
• If P is the probability of an event, then (1-P) is the probability
of it not occurring.
Odds of success = P / (1 − P)
• The model is then expressed as the odds ratio, which is simply
the probability of an event occurring relative to the probability
that it will not occur. Then by taking the natural log of the
odds ratio we produce the Logit (Li), as follows:
Li = ln( pi / (1 − pi) ) = zi = β0 + β1xi
Cont.
• The above relationship shows that L is linear in x, while the probabilities (p) themselves are not linear in x.
• In general the Logit model is estimated using the Maximum
Likelihood approach.
 MLE is a statistical method for estimating the coefficients of
a model.
 The likelihood function (L) measures the probability of
observing the particular set of dependent variable values (p1, p2, …, pn) that occur in the sample:
L = Prob(p1 · p2 · … · pn)
 The higher the L, the higher the probability of observing the ps
in the sample.
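• A sketch of a logit model fitted by maximum likelihood with statsmodels, assuming a DataFrame df with a 0/1 outcome column y and a regressor x (both names are placeholders):

import numpy as np
import statsmodels.formula.api as smf

logit_res = smf.logit("y ~ x", data=df).fit()   # estimated by maximum likelihood
print(logit_res.params)                         # coefficients on the log-odds (logit) scale
print(np.exp(logit_res.params))                 # exponentiated coefficients: odds ratios
print(logit_res.llf)                            # maximised log-likelihood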
A strategy for a simple logistic regression analysis
• The following are the major strategies for logistic regression
– Identify the model – Have you the appropriate variables for the
analysis, if it is an observational study decide which is the
predictor variable and which is the criterion.
– Review logistic regression assumptions – The validity of the
conclusions from the analysis depends upon the appropriateness
of the model, appropriate data types, and study design ensuring
Independent measures (to maintain the independence of errors
assumption). Whereas in linear regression we assumed a linear
relationship between the predictor and outcome variables now it
is between the logit of the outcome variable and the predictor. In
others words statistical validity needs to be considered.
– Obtain the Logistic regression equation and also graph of
observed values against the logistic curve.
Cont.
• Evaluate the logistic regression equation
– Make possible model fitness test
• Logistic Regression diagnostics to Identify
possible values that need further investigation
or removing from analysis (data cleaning).
• Make the right or proper interpretation of the
estimated coefficient
Instrumental variable(IV) method
• What are instrumental variables (IV) methods? Most
widely known as a solution to endogenous regressors:
explanatory variables correlated with the regression
error term, IV methods provide a way to nonetheless
obtain consistent parameter estimates.
• An Instrumental Variable is a variable that is
correlated with X but uncorrelated with e.
• If Zi is an instrumental variable:
1. E( Zi Xi ) ≠ 0
2. E( Zi ei ) = 0
Cont.
• When we have just enough instruments for consistent
estimation, we say the regression equation is exactly
identified.
• When we have more than enough instruments, the regression
equation is over identified. When we do not have enough
instruments, the equation is under identified (and
inconsistent).
• When the regression is under identified, then we do not have
a consistent estimator. When the regression is exactly
identified, then we simply use Instrumental Variables Least
Squares.
• When the regression is over identified, we have more
instruments than we need.
Cont.
• We could then use the newly constructed instrumental
variable to perform IVLS.
• In practice, however, econometricians use a slightly
simpler procedure.
• They use the new instruments to replace the explanators
in OLS.
• This strategy requires a two-stage process, called Two-
Stage Least Squares (2SLS or TSLS).
• In stage one, we construct a new instrument that is a
linear combination of the original instruments. In stage
two, we replace the troublesome variables with their fitted
values from the first stage.
Two-Stage Least Squares
• The 2SLS estimator is a special type of IV estimator. It
involves two successive applications of the OLS estimator,
and is given by the following two stage procedure.
• Regress each right-hand side endogenous variable in the
equation to be estimated on all exogenous variables in the
simultaneous equation model using the OLS estimator.
Calculate the fitted values for each of these endogenous
variables.
• In the equation to be estimated, replace each endogenous
right-hand side variable by its fitted value variable. Estimate
the equation using the OLS estimator.
• The 2SLS estimator is the most popular single equation
estimator, and one of the most often used estimators in
financial economics.
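• A sketch of the two-stage procedure just described, done by hand with statsmodels; y, the endogenous regressor x and the instrument z are assumed to be one-dimensional arrays. Note that the naive second-stage standard errors are not the correct 2SLS standard errors, so a dedicated IV routine should be used for inference:

import statsmodels.api as sm

# Stage 1: regress the endogenous regressor on the exogenous instrument(s)
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
x_hat = stage1.fittedvalues

# Stage 2: replace x by its fitted values and estimate by OLS
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(stage2.params)    # 2SLS point estimates of the intercept and slope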
Features of 2SLS
• It can be directly applied to an equation in a system,
without needing to take into account any other
equations.
• It can be used for both exactly identified and
overidentified equations.
• Easy to use, the only information required is what the
exogenous variables are.
• Given that the S.E can be calculated, t-statistics can
also be used.
• It is a large-sample technique.
Example:
• Policy makers are greatly interested in the effects of tax rates
on labor force participation (and other taxpayer behaviors).
• They would like to run regressions with an individual’s tax
rate as an explanator.
• However, an individual has some choice over his/her tax rate
• Taxpayers who are close to the income threshold for a new
tax bracket can choose to limit their taxable income.
• For example, they might take more of their pay in the form of
untaxed benefits or deferred 401(k) compensation rather than
pay higher taxes on the extra compensation.
• The ability and desire to adjust taxable income may well be
correlated with e.
Cont.
• When the government changes the
tax rates, the individual’s new tax rate is determined by
two elements:
1. The change in tax rates (which is uncorrelated with
anything else about the individual), and
2. The individual’s decisions about how to respond to the
tax change (which could well be correlated with e).
• Public finance economists construct an instrumental
variable that captures only the change in tax rates, not
the change in behavior.
• They use the new tax tables to look up the tax rate
individuals would face IF they did NOT change their
behavior from before the tax change.
Thank You