All Econometrics Notes 1-66
Introduction
By Tefera M.
Advanced Econometrics, Oromia
State University
Introduction
• What is econometrics?
– Simply stated, econometrics means economic
measurement.
– Econometrics may be defined as the social science in
which the tools of economic theory, mathematics, and
statistical inference are applied to the analysis of
economic phenomena.
– Econometrics, the result of a certain outlook on the
role of economics, consists of the application of
mathematical statistics to economic data to lend
empirical support to the models constructed by
mathematical economics and to obtain numerical
results.
Cont.
• Econometrics has to be distinguished from
mathematical economics and from statistical
economics.
• It is, however, closely related to both, and utilizes
results achieved in these fields.
• Econometrics is based upon the development of
statistical methods for estimating economic
relationships, testing economic theories, and
evaluating and implementing government and
business policy
Cont.
• Econometrics is the application of mathematics, statistical
methods, and, more recently, computer science, to economic
data and is described as the branch of economics that aims to
give empirical content to economic relations.
[Figure: flowchart of the econometric methodology — collection of data, then model estimation, followed by a yes/no check of the estimated model.]
• Continuous data can take on any value and are not confined to take
specific numbers.
• Their values are limited only by precision.
– For example, the rental yield on a property could be 6.2%,
6.24%, or 6.238%.
• On the other hand, discrete data can only take on certain values,
which are usually integers
– For instance, the number of people in a particular underground
carriage or the number of shares traded during a day.
• They do not necessarily have to be integers (whole numbers)
though, and are often defined to be count numbers.
– For example, until recently when they became ‘decimalised’,
many financial asset prices were quoted to the nearest 1/16 or
1/32 of a dollar.
Review of elementary statistics
• Statistics is the science of planning studies and experiments,
obtaining data, and then organizing, summarizing, presenting,
analyzing, interpreting, and drawing conclusions based on the data
• In other words, statistics has two meanings: actual numbers and
methods of analysis.
– Actual numbers: numerical measurements determined by a set
of data
– Methods of analysis: a collection of methods for planning
experiments, obtaining data, and then analyzing, interpreting,
and drawing conclusions based on the data
Statistics, Parameter and Statistic
• Statistics is a way to get information from data
– Data: Facts, especially numerical facts, collected together
for reference or information.
– Information: Knowledge communicated concerning some
particular fact.
Parameter
• a numerical measurement describing some characteristic of a
population.
Statistic
• a numerical measurement describing some characteristic of a
sample.
Statistical Description of Data
• Statistics describes a numeric set of data by its
– Center
– Variability
– Shape
– Relation or correlation
• Statistics describes a categorical set of data
by frequency, percentage or proportion of
each category
Descriptive and Inferential Statistics
• Descriptive statistics are methods for organizing and
summarizing data.
• For example, tables or graphs are used to organize data, and
descriptive values such as the average score are used to
summarize data.
• A descriptive value for a population is called a parameter, and
a descriptive value for a sample is called a statistic.
• Inferential statistics are methods for using sample data to make
general conclusions (inferences) about populations.
• Because a sample is typically only a part of the whole
population, sample data provide only limited information about
the population. As a result, sample statistics are generally
imperfect representatives of the corresponding population
parameters.
Levels of Data Measurement
Measure of variables
• The important statistical measures that are used to summarize
the survey/research data are:
– measures of central tendency or statistical averages;
– measures of dispersion;
– measures of asymmetry (skewness);
– measures of relationship; and
– other measures.
Measures of central tendency or
statistical averages
Mean
• The first measure that we examine is the mean.
• The mean is the arithmetic average for a set of data.
• Recall that to find the arithmetic average for a set of values we
add up all the values and then divide that total by how many
values there are. The sample mean is

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• A manager of a local restaurant is interested in the number of
people who eat there on Fridays. Here are the totals for nine
randomly selected Fridays:
712 626 600 596 655 682 642 532 526
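As a quick check, here is a minimal Python sketch that computes the sample mean of these nine observations (the answer, 619, is the sum 5,571 divided by n = 9):

```python
# Sample mean: add up all the values, then divide by how many there are
counts = [712, 626, 600, 596, 655, 682, 642, 532, 526]

mean = sum(counts) / len(counts)
print(mean)  # 619.0
```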
• The mode is the value in a data set that occurs most often. If no
such value exists, we say that the data set has no mode.
• If two such values exist, we say the data set is bimodal. If
three such values exist, we say the data set is tri-modal.
• Find the mode for the following data.
25 46 34 45 37 36 40 30 29 37 44 56 50 47 23
40 30 27 38 47 58 22 29 56 40 46 38 19 49 50
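A minimal Python sketch for finding the mode(s) of this data set; it reports every value tied for the highest frequency, so it also handles the bimodal and tri-modal cases mentioned above:

```python
from collections import Counter

data = [25, 46, 34, 45, 37, 36, 40, 30, 29, 37, 44, 56, 50, 47, 23,
        40, 30, 27, 38, 47, 58, 22, 29, 56, 40, 46, 38, 19, 49, 50]

freq = Counter(data)                              # value -> frequency
top = max(freq.values())
modes = [v for v, c in freq.items() if c == top]
print(modes)  # [40] -- 40 occurs three times, more often than any other value
```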
Measures of dispersion
• In addition to measures of central tendency, it is desirable to
have numerical values to describe the spread or dispersion of a
data set.
• Measures that describe the spread of a data set are called
measures of dispersion.
• Range, variance, and standard deviation are the major methods
of measuring dispersion.
• The range for a data set is equal to the maximum value in the
data set minus the minimum value in the data set.
• It is clear that the range is reflective of the spread in the data
set since the difference between the largest and the smallest
value is directly related to the spread in the data.
Cont.
• The variance and the standard deviation of a data set measures
the spread of the data about the mean of the data set. The
variance of a sample of size n is represented by $s^2$ and is given
by

$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
• Compute the variance for the following data.
s.n x
1 5
2 10
3 15
4 7
5 3
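A minimal Python sketch of this computation, using the five x values in the table (the sample mean is 8, the sum of squared deviations is 88, and s² = 88/4 = 22):

```python
x = [5, 10, 15, 7, 3]
n = len(x)

x_bar = sum(x) / n                                  # sample mean = 8.0
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # divide by n - 1, not n
print(s2)               # 22.0  (variance)
print(s2 ** 0.5)        # ~4.69 (standard deviation, same units as x)
print(max(x) - min(x))  # 12    (range = max - min)
```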
Cont.
• The square root of the variance is called the standard deviation
and the standard deviation is measured in the same units as the
variable.
$s = \sqrt{s^2}$
Measures of asymmetry (skewness)
• SKEWNESS describes the degree to which the data deviates
from symmetry.
• When the distribution of the data is not symmetrical, it is said
to be asymmetrical or skewed
• There are three types of skewness (normal, negative, and
positive skewness)
– Symmetrical/Normal Distribution
• Bell shaped distribution
• The mean, median and mode are all located at one
point.
Cont.
– Positively Skewed Distribution
• Observations are mostly concentrated towards the
smaller values and there are some extremely high
values.
• Also called skewed to the right distribution
Cont.
• Negatively Skewed Distribution
– Observations are mostly concentrated towards the larger
values and there are some extremely low values.
– Also called skewed to the left distribution.
Skewness and Central Tendency
• We have already discussed how each measure is affected by
outliers or skewed distribution.
• Let’s consider this information further. In a negatively skewed
distribution the outliers will be pulling the mean down the scale a
great deal.
• The median might be slightly lower due to the outliers, but the mode
will be unaffected. Thus, with a negatively skewed distribution the
mean is numerically lower than the median or mode.
• The opposite is true for positively skewed distributions. Here the
outliers are on the high end of the scale and will pull the mean in
that direction a great deal. The median might be slightly affected
as well, but not the mode. Thus, with a positively skewed
distribution the mean is numerically higher than the median or
the mode.
Figure 1 : Skewness and Central Tendency
Measures of relationship
• The commonly known measures of relationship are:
– Covariance
– Correlation
• Covariance is a measure of the degree to which two variables
are linearly related. The covariance can be either positive or
negative implying a direct or inverse relationship respectively.
• Correlation also shows how two random variables are related
to each other. More specifically, it shows the strength of linear
relationship. If the correlation is +1, two random variables have
perfect positive linear relationship. If it is –1, the two random
variables have perfect negative linear relationship. As it
becomes close to zero, the linearity in the relationship weakens.
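A minimal numpy sketch of both measures; the two series below are invented for illustration and are not from the notes:

```python
import numpy as np

# Hypothetical paired observations on two variables
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.cov(x, y)[0, 1]        # sample covariance: sign gives direction
corr_xy = np.corrcoef(x, y)[0, 1]  # correlation: strength, between -1 and +1
print(cov_xy, corr_xy)             # positive covariance, correlation near +1
```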
Chapter Two: Introduction to the
Simple OLS Regression
By Teshome A.(PhD)
Unity University
What is a regression model?
• Regression analysis is almost certainly the most important
tool at the econometrician’s disposal. But what is regression
analysis? In very general terms, regression is concerned
with describing and evaluating the relationship between a
given variable and one or more other variables.
• More specifically, regression is an attempt to explain
movements in a variable by reference to movements in one
or more other variables. To make this more concrete, denote
the variable whose movements the regression seeks to
explain by y and the variables which are used to explain
those variations by x1, x2, . . . , xk .
• Hence, in this relatively simple setup, it would be said that
variations in k variables (the xs) cause changes in some
other variable, y.
Cont.
• In other words, regression is a relation between variables
where changes in some variables may “explain” or possibly
“cause” changes in other variables.
• Explanatory variables are termed the independent
variables and the variables to be explained are termed
the dependent variables.
• Regression model estimates the nature of the
relationship between the independent and dependent
variables.
– Change in dependent variables that results from changes in
independent variables, i.e. the size of the relationship.
– Strength of the relationship.
– Statistical significance of the relationship.
Cont.
• Any straight line can be represented by an equation of the
form Y = bX + a, where a and b are constants.
• The value of b is called the slope constant and determines the
direction and degree to which the line is tilted.
• The value of a is called the Y-intercept and determines the
point where the line crosses the Y-axis.
• The ability of the regression equation to accurately predict the
Y values is measured by first computing the proportion of the
Y-score variability that is predicted by the regression equation
and the proportion that is not predicted.
Simple Linear Regression

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where $\beta_1$ is the slope ($= \Delta y / \Delta x$) and $\beta_0$ is the y-intercept; $Y_i$ is the dependent (response) variable (e.g., CD4+ count) and $X_i$ is the independent (explanatory) variable (e.g., years since seroconversion).
The Differences between β and b

$\beta_0$ and $\beta_1$ are the unknown population parameters; $b_0$ and $b_1$ are their sample estimates. OLS chooses the estimates that minimize the sum of squared residuals

$\sum_{i=1}^{n} \hat{u}_i^2 = \hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \cdots + \hat{u}_n^2$
Cont.
• where n is the total number of observations and ∑ is the
summation operator. The above expression is known as the sum
of squared residuals (sum of squared errors) and denoted SSE.
Using the definitions of $\hat{u}_i$ and $\hat{y}_i$, the SSE becomes

$SSE = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[y_i - (\hat{a} + \hat{b}x_i)\right]^2$
[Figure: scatter of observations around the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, with the residuals $\hat{u}_1, \hat{u}_2, \hat{u}_3, \hat{u}_4$ drawn as vertical deviations of each point from the line.]
Cont.
• Setting the partial derivatives of the SSE with respect to $\hat{a}$ and $\hat{b}$ to zero gives

$\frac{\partial SSE}{\partial \hat{a}} = \sum_{i=1}^{n} 2(y_i - \hat{a} - \hat{b}x_i)(-1) = -2\sum_{i=1}^{n}(y_i - \hat{a} - \hat{b}x_i) = 0,$ and

$\frac{\partial SSE}{\partial \hat{b}} = \sum_{i=1}^{n} 2(y_i - \hat{a} - \hat{b}x_i)(-x_i) = -2\sum_{i=1}^{n} x_i(y_i - \hat{a} - \hat{b}x_i) = 0.$

• Solving these two normal equations yields

$\hat{a} = \bar{y} - \hat{b}\bar{x}$

$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
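A minimal numpy sketch of these closed-form estimators; the (x, y) data below are invented placeholders:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
n = len(x)

# b-hat from the normal equations, then a-hat = y-bar - b-hat * x-bar
b_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
        (n * np.sum(x ** 2) - np.sum(x) ** 2)
a_hat = np.mean(y) - b_hat * np.mean(x)
print(a_hat, b_hat)  # slope close to 2 for these made-up data
```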
Assumptions of the OLS Estimator
• In this sub-section, five assumptions that are necessary to derive
and use the OLS estimator are presented.
• The next section will summarize the need for each assumption
in the derivation and use of the OLS estimator. You will need
to know and understand these five assumptions and their use.
Assumption one - Linear in Parameters
• This assumption has been discussed in both the simple linear
and multiple regression derivations and presented above as a
trait. Specifically, the assumption is
– the dependent variable y can be calculated as a linear function of a
specific set of independent variables plus an error term.
Cont.
• The regression model:
– A) is linear
• It can be written as
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
• This doesn’t mean that the theory must be linear
• For example… suppose we believe that CEO salary is related to
the firm’s sales
• We might believe the model is:
$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + \varepsilon_i$
Cont.
• Assumption two - Random Sample of n Observations
• This assumption is composed of three related sub-assumptions.
– Assumption A. The sample consists of n-paired observations that are
drawn randomly from the population.
– Assumption B. The number of observations is greater than the number of
parameters to be estimated, usually written n > k. As discussed earlier, if
n = k, the number of observations (equations) will equal the number of
unknowns. In this case, OLS is not necessary, algebraic procedures can be
used to derive the estimates. If n < k, the number of observations is less
than the number of unknowns. In this case, neither algebra nor OLS
provide unique estimates.
– Assumption C. The independent variables (x’s) are nonstochastic, whose
values are fixed. This assumption means there is a unilateral causal
relationship between dependent variable, y, and the independent variables,
x’s. Variations in the x’s cause variations (changes) in the y’s; the x’s
cause y.
Cont.
• Assumption three– Zero Conditional Mean
• The mean of the error terms has an expected value of zero
given values for the independent variables.
• In mathematical notation, this assumption is correctly written
as $E(U \mid X) = 0$. A shorthand notation, $E(U) = 0$, is often
employed and will be used in this class. Here, E is the
expectation operator, U the matrix of error terms, and X the
matrix of independent variables.
Cont.
• Assumption Four – No Perfect Collinearity
• The assumption of no perfect collinearity states that there is
no exact linear relationship among the independent variables.
This assumption implies two aspects of the data on the
independent variables. First, none of the independent
variables, other than the variable associated with the intercept
term (recall x1=1 regardless of the observation), can be a
constant. Variation in the x’s is necessary. In general, the
more variation in the independent variables the better the OLS
estimates will be in terms of identifying the impacts of the
different independent variables on the dependent variable.
Cont.
• Important!!
• All explanatory variables are uncorrelated with the error term
• $E(\varepsilon_i \mid X_{1i}, X_{2i}, \ldots, X_{Ki}) = 0$
• Explanatory variables are determined outside of the model (They
are exogenous)
• What happens if this assumption is violated?
• Suppose we have the model,
• Yi =β0+ β1Xi+εi
• Suppose Xi and εi are positively correlated
• When Xi is large, εi tends to be large as well.
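This bias is easy to see in a small simulation. The following sketch is illustrative only — the data-generating process is invented — but it shows OLS overstating the true slope of 2 when Cov(X, ε) > 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1 = 10_000, 1.0, 2.0

common = rng.normal(size=n)              # shock shared by X and the error
x = common + rng.normal(size=n)
eps = 0.8 * common + rng.normal(size=n)  # Cov(x, eps) > 0: X is endogenous
y = beta0 + beta1 * x + eps

b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # simple OLS slope
print(b1_hat)  # about 2.4 here -- biased upward from the true value 2.0
```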
Cont.
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
• The coefficients of the multiple regression model
are estimated using sample data with k
independent variables
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ the estimated intercept, and $b_1, \ldots, b_k$ the estimated slope coefficients.
• Interpretation of the Slopes: (referred to as a Net
Regression Coefficient)
– b1=The change in the mean of Y per unit change in X1,
taking into account the effect of X2 (or net of X2)
– b0: the Y intercept. It is the same as in simple regression.
Assumptions of multiple regression
• The following are the major assumptions of multiple regression:
– Independence: the scores of any particular subject are
independent of the scores of all other subjects
– Normality: in the population, the scores on the dependent
variable are normally distributed for each of the possible
combinations of the level of the X variables; each of the
variables is normally distributed
– Homoscedasticity: in the population, the variances of the
dependent variable for each of the possible combinations of
the levels of the X variables are equal.
– Linearity: In the population, the relation between the
dependent variable and the independent variable is linear
when all the other independent variables are held constant.
Simple vs. Multiple Regression

[Figure: in simple regression, y is plotted against a single explanatory variable X; in multiple regression, y depends on several explanatory variables X1, X2, ….]

Multiple Regression Models
– Linear: linear, dummy variable, interaction
– Non-linear: polynomial, square root, log, reciprocal, exponential
Interpreting the Regression Coefficients
• The regression coefficients are interpreted essentially the same
in multiple regression as they are in simple regression, with
one caveat.
• The slope of an independent variable in multiple regression is
usually interpreted as the marginal (or isolated) effect of a unit
change in the variable upon the mean value of Y when “the
values of all of the other independent variables are held
constant”.
• Let us assume that the sales of a particular product are affected
by advertising and bonus payments. As you can see in the next
table, changes in bonus and advertising spending affect the total
sales of the product.
Dependent variable: Sales
----------------------------------------------------------------
Parameter      Estimate      Standard Error      T Statistic
----------------------------------------------------------------
CONSTANT       -515.073      190.759             -2.70013
Ad             2.47216       0.275644            8.96869
Bonus          1.85284       0.717485            2.5824
----------------------------------------------------------------
b0 estimates the expected annual sales for a territory if $0.00 is spent on
advertising and bonuses. Because these values are outside the range of
values for Ad and Bonus observed, and upon which the estimated
regression equation is based, the value of b0 has no practical
interpretation. Put more concisely, an interpretation of b0 is not supported
by the data. This will often, but not always, be the case in multiple
regression.
b1: Expected (mean) sales increase by about $2,472 for every $100
increase in the amount spent on advertising, holding the amount of
bonuses paid constant.
b2: Sales increase by $1,853, on average, for every $100 increase in
bonuses, for a given amount spent on advertising.
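The territory-level data behind this table are not reproduced in the notes, but a minimal numpy sketch shows how such estimates and t statistics are computed for any Sales–Ad–Bonus data set (the six observations below are invented placeholders):

```python
import numpy as np

sales = np.array([963.0, 893.0, 1057.0, 1183.0, 1419.0, 1547.0])
ad    = np.array([374.0, 408.0, 414.0, 506.0, 526.0, 482.0])
bonus = np.array([230.0, 236.0, 271.0, 249.0, 252.0, 187.0])

X = np.column_stack([np.ones_like(ad), ad, bonus])  # intercept, Ad, Bonus
b, *_ = np.linalg.lstsq(X, sales, rcond=None)       # OLS estimates

resid = sales - X @ b
n, k = X.shape
s2 = resid @ resid / (n - k)                        # residual variance
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # standard errors
print(b)       # [b0, b1, b2]
print(b / se)  # t statistics, as in the last column of the table above
```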
Major Types Of Multiple Regression
• There are a number of different types of multiple regression
analyses that you can use, depending on the nature of the
question you wish to address.
• The three main types of multiple regression analyses are:
– standard or simultaneous
– hierarchical or sequential
– stepwise.
Cont.
• Standard multiple regression: In standard multiple
regression, all the independent (or predictor) variables are
entered into the equation simultaneously. Each independent
variable is evaluated in terms of its predictive power, over and
above that offered by all the other independent variables.
• This is the most commonly used multiple regression analysis.
You would use this approach if you had a set of variables (e.g.
various personality scales) and wanted to know how much
variance in a dependent variable they were able to explain as a
group or block.
• This approach would also tell you how much unique variance
in the dependent variable each of the independent variables
explained.
Cont.
• Hierarchical multiple regression
• In hierarchical regression (also called sequential regression), the
independent variables are entered into the equation in the order
specified by the researcher based on theoretical grounds.
Variables or sets of variables are entered in steps (or blocks),
with each independent variable being assessed in terms of what it
adds to the prediction of the dependent variable after the previous
variables have been controlled for.
• For example, if you wanted to know how well business profit is
explained by the prices of goods and services after the effect of
the market interest rate is controlled for, you would enter the
interest rate in the first block and the prices of goods and
services in the second.
• Once all sets of variables are entered, the overall model is
assessed in terms of its ability to predict the dependent measure.
Cont.
• Stepwise multiple regression
• In stepwise regression, the researcher provides a list of
independent variables and then allows the program to select
which variables it will enter and in which order they go into
the equation, based on a set of statistical criteria.
• There are three different versions of this approach: forward
selection, backward deletion and stepwise regression. There
are a number of problems with these approaches and some
controversy in the literature concerning their use (and abuse).
• It is important that you understand what is involved, how to
choose the appropriate variables and how to interpret the
output that you receive.
Derivation of Multiple Regression
Coefficients
• The intention of this discussion is to press
home two basic points.
– First, the principles behind the derivation of the
regression coefficients are the same for multiple
regression as for simple regression.
– Second, the expressions, however, are different,
and so you should not try to use expressions
derived for simple regression in a multiple
regression context.
Cont.
• The definition and interpretation of the sums of squares in
multiple regression is similar to that in simple regression.
• Total Sum of Squares, $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, is a measure of the total
variation of Y.
• Regression Sum of Squares, $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, measures the
variation of Y that is explained by the regression.
Derivation of Normal Equations in a Multiple
Regression Analysis
For the model $Y_i = \beta_1 + \beta_2 X_{1,i} + \beta_3 X_{2,i} + e_i$, minimizing the sum of squared residuals $S$ gives the first-order conditions

$\frac{\partial S}{\partial \hat{\beta}_1} = -2\sum_i \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{1,i} - \hat{\beta}_3 X_{2,i}\right) = 0,$

$\frac{\partial S}{\partial \hat{\beta}_2} = -2\sum_i \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{1,i} - \hat{\beta}_3 X_{2,i}\right) X_{1,i} = 0,$ and

$\frac{\partial S}{\partial \hat{\beta}_3} = -2\sum_i \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{1,i} - \hat{\beta}_3 X_{2,i}\right) X_{2,i} = 0.$

Thus the normal equations are

$\sum_i Y_i = N\hat{\beta}_1 + \hat{\beta}_2 \sum_i X_{1,i} + \hat{\beta}_3 \sum_i X_{2,i}$   (2)

$\sum_i X_{1,i} Y_i = \hat{\beta}_1 \sum_i X_{1,i} + \hat{\beta}_2 \sum_i X_{1,i}^2 + \hat{\beta}_3 \sum_i X_{1,i} X_{2,i}$   (3)

$\sum_i X_{2,i} Y_i = \hat{\beta}_1 \sum_i X_{2,i} + \hat{\beta}_2 \sum_i X_{1,i} X_{2,i} + \hat{\beta}_3 \sum_i X_{2,i}^2$   (4)

Solving these gives

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_1 - \hat{\beta}_3 \bar{X}_2$

and, writing lowercase letters for deviations from the sample means ($x_{1,i} = X_{1,i} - \bar{X}_1$, and so on),

$\hat{\beta}_2 = \frac{\sum_i x_{1,i} y_i \sum_i x_{2,i}^2 - \sum_i x_{2,i} y_i \sum_i x_{1,i} x_{2,i}}{\sum_i x_{1,i}^2 \sum_i x_{2,i}^2 - \left(\sum_i x_{1,i} x_{2,i}\right)^2}$

with the analogous expression for $\hat{\beta}_3$.
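In matrix form the three normal equations are $X'X\hat{\beta} = X'Y$, which a computer solves directly. A minimal numpy sketch (the data are invented for illustration):

```python
import numpy as np

Y  = np.array([10.0, 12.0, 15.0, 18.0, 21.0, 24.0])
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

X = np.column_stack([np.ones_like(X1), X1, X2])  # constant, X1, X2

# Solve (X'X) beta_hat = X'Y -- exactly the normal equations above
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # [beta1_hat, beta2_hat, beta3_hat]
```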
• The coefficient of determination can be decomposed into the simple and partial correlations:

$R^2 = r_{YX_1}^2 + r_{YX_2 \cdot X_1}^2 \left(1 - r_{YX_1}^2\right), \qquad 0 \le r_{YX_2 \cdot X_1}^2 \le 1$
Proportion of Predictable and
Unpredictable Variation

[Figure: Venn diagram of Y, X1 and X2 — $R^2$ is the predictable (explained) variation in Y, and $(1 - R^2)$ is the unpredictable (unexplained) variation in Y.]
Standard Error of Estimate
• Measures the standard deviation of the
residuals about the regression plane, and thus
specifies the amount of error incurred when
the least squares regression equation is used to
predict values of the dependent variable.
• The standard error of estimate is computed by
using the following equation:
$s_e = \sqrt{\frac{SSE}{n - k - 1}}$
Testing significance of multiple regression
• In this section we show how to conduct significance tests for
multiple regression models.
• In multiple regression, the t test and the F test are the common
tests of the significance of the regression. But the t and F tests
have different purposes.
• The F test is used to determine whether there exists a
significant relationship between the dependent variable and
the entire set of independent variables in the model; thus the F
test is a test of the overall significance of the regression.
• If the F test shows that the regression has overall significance,
the t test is then used to determine whether each of the
individual independent variables is significant. A separate t
test is used for each of the independent variables; thus the t
test is a test for individual significance.
Various Significance Tests
• The motivation behind the F distribution is the case where we
have independent samples of two variables, each drawn from a
normal distribution
• Testing R2
– Test R2 through an F test
– Test of competing models (difference between R2)
through an F test of difference of R2s
• Testing b
– Test of each partial regression coefficient (b) by t-tests
– Comparison of partial regression coefficients with each
other - t-test of difference between standardized
partial regression coefficients (β)
Cont.
• The procedure for estimation is as follows:
1. Estimate the unrestricted version of the model
2. Estimate the restricted version of the model
3. Collect the residual sum of squares $\sum \hat{e}^2$ for the unrestricted model and $\sum \hat{e}^{*2}$ for the restricted model
4. Compute the test statistic

$F = \frac{\left(\sum \hat{e}^{*2} - \sum \hat{e}^2\right)/q}{\sum \hat{e}^2/(n - k)}$

or, equivalently, in terms of the two $R^2$ values,

$F = \frac{(R^2 - R^{2*})/q}{(1 - R^2)/(n - k)}$

where q is the number of restrictions, n the number of observations, and k the number of parameters in the unrestricted model.
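A minimal sketch of steps 1–4; the two sums of squared residuals below are invented placeholders standing in for the fitted models:

```python
# Hypothetical output from the unrestricted and restricted fits
sse_unrestricted = 120.0  # sum of e-hat^2
sse_restricted   = 150.0  # sum of e-hat*^2 (q coefficients forced to zero)
q, n, k = 2, 60, 5        # restrictions, observations, parameters

F = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k))
print(F)  # 6.875 -- compare with the critical value of F(q, n - k)
```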
Relationship between R² and F

$Y_t = \beta_1 + \beta_2 X_{t2} + \cdots + \beta_k X_{tk} + \varepsilon_t$

$Var(\varepsilon_t) = E(\varepsilon_t^2) = \sigma_t^2 \quad \text{for } t = 1, 2, \ldots, n$

(the error variance is allowed to differ across observations, i.e. heteroscedasticity)
Reasons for heteroscedasticity
• There are several reasons why the variances of the error terms (ui)
may vary, some of which are as follows.
– Following the error-learning models, as people learn, their
errors of behavior become smaller over time or the number
of errors becomes more consistent.
– As data collecting techniques improve, the variance is
likely to decrease
– Heteroscedasticity can also arise as a result of the presence
of outliers.
– Another source of heteroscedasticity is skewness in the
distribution of one or more regressors included in the
model.
Detecting multicollinearity: the variance inflation factor (VIF)

$VIF_j = \frac{1}{1 - R_j^2} = \frac{1}{RSS_j/TSS_j} = \frac{TSS_j}{RSS_j} = \frac{var(\text{variable}_j)}{var(\text{residuals})}$

where $R_j^2$ comes from regressing variable j on the other regressors.
• The VIF is equal to 1 when $R_j^2$ is equal to zero (no linear
dependency). Large values signal a multicollinearity problem;
a common rule of thumb is VIF > 10.
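A minimal numpy sketch of the diagnostic: regress each variable on the others, turn the resulting R² into a VIF, and flag large values (the data are invented, with x2 built to be nearly collinear with x1):

```python
import numpy as np

def vif(X, j):
    # VIF of column j: 1 / (1 - R_j^2) from regressing column j
    # on the remaining columns plus a constant
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # huge for x1, x2; ~1 for x3
```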
Remedies of Multicollinearity
• There may be various factors behind the existence
of multicollinearity. The following are the major
remedies for multicollinearity:
– Model Respecification
– Drop one of the collinear variables
– Transform the highly correlated variables into a
ratio
– Add more data; that is, go out and collect
more data.
Autocorrelation
• Idea: If there is some pattern in how the values of your time
series change from observation to observation, you could use
it to your advantage.
• The correlation between the original time series values and
the corresponding k-lagged values is called autocorrelation of
order k.
• The Autocorrelation Function (ACF) provides the correlation
between the serial correlation coefficients for consecutive
lags.
• Correlograms display graphically the ACF.
• The ACF can be misleading for a series with unstable
variance, so it might first be necessary to transform for a
constant variance before using the ACF.
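A minimal numpy sketch of the order-k sample autocorrelation; a correlogram simply repeats this over k = 1, 2, 3, … (the series below is invented):

```python
import numpy as np

def acf(x, k):
    # Correlation between the series and its own values k periods earlier
    xd = np.asarray(x, dtype=float) - np.mean(x)
    return (xd[k:] @ xd[:-k]) / (xd @ xd)

series = np.array([1.0, 1.2, 0.9, 1.4, 1.1, 1.6, 1.3, 1.8, 1.5, 2.0])
print([round(acf(series, k), 2) for k in (1, 2, 3)])
```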
Cont.
• Another useful method to examine serial dependencies is to
examine the Partial Autocorrelation Function (PACF), an
extension of autocorrelation where the dependence on the
intermediate elements (those within the lag) is removed.
• For time series data, ACF and PACF measure the degree of
relationship between observations k time periods, or lags,
apart. These plots provide valuable information to help you
identify an appropriate ARIMA model.
• In a sense, the partial autocorrelation provides a “cleaner”
picture of serial dependencies for individual lags.
Con.
• An example of an autocorrelated error:
$e_t = b e_{t-1} + v_t$
• Here we have b = 0.8. It means that 80% of the
error in period t-1 is still felt in period t. The
error in period t is comprised of 80% of last
period’s error plus an error that is unique to period
t. This is sometimes called an AR(1) model for
“autoregressive of the first order”
• The autocorrelation coefficient must lie between –
1 and 1:
-1 < b < 1
Anything outside this range is unstable and very
unlikely for economic models
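A minimal simulation sketch of this AR(1) error process with b = 0.8 (everything else invented); the first-order sample autocorrelation of the simulated errors comes out close to b:

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 500, 0.8
v = rng.normal(size=n)   # v_t: the error unique to period t
e = np.zeros(n)
for t in range(1, n):
    e[t] = b * e[t - 1] + v[t]   # e_t = b * e_{t-1} + v_t

ed = e - e.mean()
print((ed[1:] @ ed[:-1]) / (ed @ ed))  # roughly 0.8
```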
Cont.