

Regression analysis: Further details

(We Focus on Model with Two Explanatory Variables)

12/28/2022 Prepared by: Hulunayen Y.

3.1 Multivariate Case of CLRM
• In simple regression we study the relationship between a dependent variable and a single explanatory (independent) variable; that is, we assume that the dependent variable is influenced by only one explanatory variable.
• However, many economic variables are influenced by several factors or variables.
• For instance:
• In investment decision studies, we study the relationship between the quantity invested (or the decision whether or not to invest) and the interest rate, share prices, the exchange rate, etc.
• The demand for a commodity depends on the price of the commodity itself, the prices of competing or complementary goods, the income of the consumer, etc.
• Hence the two-variable model is often inadequate in practice.
• Therefore, we need to discuss multiple regression models.
• Multiple linear regression is concerned with the relationship between a dependent variable (Y) and two or more explanatory variables (X1, X2, …, Xk).

Why do we need multiple regression?

1. One of the motivations for multiple regression is the omitted variable bias of simple regression analysis.

• Omitted variable bias is the primary drawback of simple regression; multiple regression allows us to explicitly control for many other factors that simultaneously affect the dependent variable.
Example: wages vs. education
• Imagine we want to measure the (causal) effect of an additional year of education on a person’s wage.
• If we use the model wage = β0 + β1educ + u and interpret β1 as the ceteris paribus effect of educ on wage, we have to assume that educ and u are uncorrelated.
• Consider a different model now: wage = β0 + β1educ + β2exper + u, where exper is a person’s working experience (in years).
• Since the equation contains experience explicitly, we are able to measure the effect of education on wage, holding experience fixed.
2. Multiple regression analysis is also useful for generalizing functional relationships between variables.
Simple Regression vs. Multiple Regression

• Most of the properties of the simple regression model extend directly to the multiple regression case.
• We derived many of the formulas for the simple regression model; with multiple variables, however, the formulas become difficult when there are more than two explanatory variables.
• As far as the interpretation of the model is concerned, there is an important new fact: the coefficient βj captures the effect of the jth explanatory variable, holding all the remaining explanatory variables fixed.


• Multiple regression analysis is an extension of simple regression analysis to cases in which the dependent variable is hypothesized to depend on more than one explanatory variable.
• Much of the analysis is a straightforward extension of the simple regression model.
• In multiple linear regression, we have one dependent variable Y and k explanatory variables.
• The relationship between the dependent variable and the two or more independent variables is linear in the parameters, but it need not be linear in the variables.

• What changes as we move from simple to multiple regression?
• Potentially more explanatory power with more variables;
• The ability to control for other variables (and the need to account for the interaction among the explanatory variables, e.g., their correlations);
• It is harder to visualize: instead of drawing a line through points in two dimensions, we fit a plane (or hyperplane) through points in three- or higher-dimensional space.
• The R2 is no longer simply the square of the correlation coefficient between Y and X.


• Slope (βj): ceteris paribus, Y changes by βj units for every 1-unit change in Xj, on average.
• Y-intercept (β0): the average value of Y when all the Xs are zero (this may not always be meaningful).
• Since the model need only be linear in the parameters, the definition of MLR includes polynomial regression, for example:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}^2 + \beta_4 X_{1i} X_{2i} + u_i$


3.1.1 Assumptions of the Multiple Linear Regression
• In order to specify our multiple linear regression model and proceed with the analysis, some assumptions are required.
• These assumptions are the same as in the single-explanatory-variable model developed earlier, except for the additional assumption of no perfect multicollinearity. These assumptions are:


We cannot list all the assumptions exhaustively, but the above are some of the basic assumptions that enable us to proceed with our analysis.


3.1.2 Model With Two Explanatory Variables
• In order to understand the nature of the multiple regression model easily, we start our analysis with the case of two explanatory variables and then extend it to the case of k explanatory variables.

• Estimation of the parameters of the two-explanatory-variable model


• Since the population regression equation is unknown to any investigator, it has to be estimated from sample data.
• Let us suppose that the sample data have been used to estimate the population regression equation.
• We leave the method of estimation unspecified for the present and merely assume that the equation has been estimated by the sample regression equation, which we write as:


• Given sample observations on Y, X1 and X2, we estimate the model using the method of ordinary least squares (OLS).
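As a sketch of what OLS computes in the two-regressor case, the Python function below applies the standard deviation-form formulas, where lower-case letters denote deviations from sample means (e.g. x1i = X1i − X̄1): β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²), β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (same denominator), and β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2. The function name `ols_two_regressors` and the data are hypothetical; the data are chosen so that Y = 2 + 3X1 − X2 holds exactly.

```python
def ols_two_regressors(y, x1, x2):
    """OLS estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + u,
    computed with the deviation-form formulas (pure Python)."""
    n = len(y)
    y_bar, x1_bar, x2_bar = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Deviations from the sample means
    dy = [v - y_bar for v in y]
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    s11 = sum(a * a for a in d1)                 # sum of x1^2
    s22 = sum(b * b for b in d2)                 # sum of x2^2
    s12 = sum(a * b for a, b in zip(d1, d2))     # sum of x1*x2
    s1y = sum(a * c for a, c in zip(d1, dy))     # sum of x1*y
    s2y = sum(b * c for b, c in zip(d2, dy))     # sum of x2*y
    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        # Exactly zero only under perfect multicollinearity
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]   # hypothetical, noise-free data
print(ols_two_regressors(y, x1, x2))          # approximately (2.0, 3.0, -1.0)
```

The denominator is zero only when X1 and X2 are perfectly collinear, which is the case ruled out by the no-perfect-multicollinearity assumption.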

3.2 Global hypothesis test (F and r2)
• We used the t test to test single hypotheses, i.e., hypotheses involving only one coefficient.
• But what if we want to test hypotheses involving more than one coefficient?
• We do so using the F-test.
• The F-test is used to test the overall significance of a regression.


• Coefficient of determination (r2): a measure of the proportion of the variation in the dependent variable that is explained by the variation in all the explanatory variables included in the model.
• In the MLRM the same measure is relevant, and the same formulas are valid.
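A minimal sketch of the R² computation, assuming the fitted values come from an OLS regression with an intercept (only then does 1 − RSS/TSS coincide with ESS/TSS). The function name `r_squared` and the numbers below are illustrative only.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS for observed values y and OLS fitted values y_hat."""
    y_bar = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # residual sum of squares
    tss = sum((a - y_bar) ** 2 for a in y)              # total sum of squares
    return 1 - rss / tss


y = [1, 2, 3, 4]               # hypothetical observed values
y_hat = [1.1, 1.9, 3.2, 3.8]   # hypothetical fitted values
print(r_squared(y, y_hat))     # RSS = 0.10 and TSS = 5, so about 0.98
```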

Test of Overall Significance (F-test)
• The F-test (joint test) is used to test the relevance of all the included explanatory variables jointly.
• Now consider the following:

Decision Rule:
If Fcal > F α(k−1, n−k) (i.e., Fcal > Ftab), reject H0; otherwise, do not reject H0,
where F α(k−1, n−k) is the critical F value at the α level of significance with (k − 1) numerator df and (n − k) denominator df, and k is the number of estimated parameters, including the intercept.
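The overall F statistic can also be written in terms of R², F = [R²/(k − 1)] / [(1 − R²)/(n − k)], which for OLS with an intercept is equivalent to [ESS/(k − 1)] / [RSS/(n − k)]. The sketch below assumes hypothetical values R² = 0.9, n = 20 and k = 3; the critical value 3.59 for F0.05(2, 17) is read from a standard F table rather than computed, since the Python standard library has no F distribution. The function names are illustrative.

```python
def overall_f(r2, n, k):
    """Overall F statistic from R^2; k = number of parameters incl. the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


def decide(f_cal, f_tab):
    """Decision rule: reject H0 when the computed F exceeds the tabulated F."""
    return "reject H0" if f_cal > f_tab else "do not reject H0"


f_cal = overall_f(0.9, n=20, k=3)   # hypothetical inputs
f_tab = 3.59                        # F_0.05(2, 17) looked up in an F table
print(round(f_cal, 2), decide(f_cal, f_tab))
```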


3.3 Selection of models
• One of the assumptions of the classical linear regression model (CLRM) is that the regression model used in the analysis is “correctly” specified.
• If the model is not “correctly” specified, we encounter the problem of model specification error or model specification bias.
Basic questions related to model selection
• What are the criteria for choosing a model for empirical analysis?
• What types of model specification errors is one likely to encounter in practice?
• What are the consequences of specification errors?
• How does one detect specification errors? In other words, what diagnostic tools can one use?
• Having detected specification errors, what remedies can one adopt?
• Model Selection Criteria
• A model chosen for empirical analysis should satisfy the following criteria:
• Be data admissible; that is, predictions made from the model must be logically possible.
• Be consistent with theory; that is, it must make good economic sense.
• Exhibit parameter constancy; that is, the values of the parameters should be stable. Otherwise, forecasting will be difficult.
• Exhibit data coherency; that is, the residuals estimated from the model must be purely random (technically, white noise).
• Be encompassing; that is, the model should encompass or include all the rival models in the sense that it is capable of explaining their results.

• In short, other models cannot be an improvement over the chosen model.


3.4 Types of Specification Errors

• In developing an empirical model, one is likely to commit one or more of the following specification errors:
i. Omission of a relevant variable(s)
ii. Inclusion of an unnecessary variable(s)
iii. Adopting the wrong functional form
iv. Errors of measurement


A. Omission of relevant variables
• If the left-out (omitted) variable is correlated with the included variable, i.e., the correlation coefficient between the two variables is nonzero, the estimators are biased as well as inconsistent.
• Even if the two variables are not correlated, the intercept parameter is biased, although the slope parameter is now unbiased.
• The disturbance variance is incorrectly estimated.
• In consequence, the usual confidence interval and hypothesis-testing procedures are likely to give misleading conclusions about the statistical significance of the estimated parameters.
• There is an asymmetry between the two types of specification bias (omitting a relevant variable versus including an irrelevant one).
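A small numerical sketch of omitted variable bias, using hypothetical noise-free data in which Y = 2 + 3X1 − X2 exactly and X2 is correlated with X1. In any sample, the OLS slope from regressing Y on X1 alone equals β̂1 + β̂2·δ̂, where β̂1 and β̂2 come from the regression on both variables and δ̂ is the slope from regressing X2 on X1; the term β̂2·δ̂ is the bias from leaving X2 out. All function names are illustrative.

```python
def mean(v):
    return sum(v) / len(v)


def sxy(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


def simple_slope(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    return sxy(x, y) / sxy(x, x)


def two_regressor_slopes(y, x1, x2):
    """OLS slopes (b1, b2) from regressing y on x1 and x2 (with an intercept)."""
    s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
    s1y, s2y = sxy(x1, y), sxy(x2, y)
    d = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / d, (s2y * s11 - s1y * s12) / d


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]                         # correlated with x1
y = [2 + 3 * a - b for a, b in zip(x1, x2)]  # hypothetical: true slopes 3 and -1
b1, b2 = two_regressor_slopes(y, x1, x2)
delta = simple_slope(x2, x1)                 # slope of X2 on X1
short = simple_slope(y, x1)                  # regression with X2 omitted
print(short, b1 + b2 * delta)                # the two numbers coincide
```

Here the short regression reports a slope of about 2 instead of 3, because β̂2 = −1 and δ̂ = 1.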


B. Inclusion of irrelevant variables
• If we include an irrelevant variable in the model, the model still gives us unbiased and consistent estimates of the coefficients in the true model, the error variance is correctly estimated, and the conventional hypothesis-testing methods are still valid.

• The only penalty we pay for the inclusion of the superfluous variable is that the estimated variances of the coefficients are larger, and as a result our probability inferences about the parameters are less precise.
3.5 Functional Forms of Regression Models
• Below are commonly used regression models that may be nonlinear in the variables but are linear in the parameters, or that can be made so by suitable transformations of the variables.
1. Linear model: Y = β1 + β2X
2. Log (double-log) model: lnY = β1 + β2 lnX
3. Semi-log models (lin-log and log-lin): Y = β1 + β2 lnX and lnY = β1 + β2X
4. Reciprocal model: Y = β1 + β2(1/X)
Double-log model: ln 𝐸𝑋𝐷𝑈𝑅𝑡 = −7.5417 + 1.6266 ln 𝑃𝐶𝐸𝑋
𝑠𝑒 = (0.716) (0.080)
𝑡 = (−10.53) (20.3) 𝑟2 = 0.9695
• Interpretation: if total personal expenditure goes up by 1 percent, then, on average, the expenditure on durable goods goes up by about 1.63 percent.
Lin-log model
𝐹𝑜𝑜𝑑𝐸𝑥𝑝𝑖 = −1283.912 + 257.2700 ln 𝑇𝑜𝑡𝑎𝑙𝐸𝑥𝑝𝑖
𝑠𝑒 = ( ?? ) ( ?? )
𝑡 = (−4.3848) (5.6625) 𝑟2 = 0.3769
• Interpretation: a 1 percent increase in total expenditure leads, on average, to an increase of about 2.57 birr (= 257.27/100) in the expenditure on food.


Log-lin model
ln 𝐸𝑋𝑆𝑡 = 8.3226 + 0.00705𝑡
𝑠𝑒 = (0.0016) (0.00018) 𝑟2 = 0.9919
𝑡 = (5201.625) (39.1667)
where EXS is expenditure on services and t is time, measured in quarters.

• Interpretation: expenditure on services increased at the (quarterly) rate of 0.705 percent.
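The interpretations above are simple arithmetic on the reported slope coefficients. The helper functions below are a hypothetical sketch: in the double-log model the slope is an elasticity; in the lin-log model a 1 percent change in X changes Y by about β2/100 units; in the log-lin model 100·β2 is the instantaneous growth rate per period, and 100·(e^β2 − 1) is the corresponding compound rate.

```python
import math


def double_log_elasticity(b2):
    # ln Y = b1 + b2 ln X: percent change in Y for a 1 percent change in X
    return b2


def lin_log_effect(b2):
    # Y = b1 + b2 ln X: approximate absolute change in Y for a 1 percent change in X
    return b2 / 100


def log_lin_growth(b2):
    # ln Y = b1 + b2 t: (instantaneous, compound) growth rates in percent per period
    return 100 * b2, 100 * (math.exp(b2) - 1)


print(double_log_elasticity(1.6266))  # durable goods vs. total personal expenditure
print(lin_log_effect(257.27))         # birr of food expenditure per 1% of total expenditure
print(log_lin_growth(0.00705))        # quarterly growth of expenditure on services
```

For a small slope such as 0.00705, the instantaneous and compound rates are almost identical (about 0.705 and 0.707 percent per quarter).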


3.6 Relaxing the CLRM basic assumptions
• Multicollinearity

A. Multicollinearity problem
• One assumption of the classical linear regression model (CLRM) is that there is no (perfect) multicollinearity among the regressors included in the regression model.

• Multicollinearity means the existence of a “perfect” (exact) or an inexact (near) linear relationship among some or all explanatory variables of a regression model.


B. Sources of multicollinearity
• The data collection method employed
• Model specification
• An overdetermined model (more explanatory variables than observations)
C. Consequences of Multicollinearity
• The OLS estimators are still BLUE (under near, not perfect, multicollinearity).
• However, the OLS estimators have large variances and covariances.
• Because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider, leading more readily to the acceptance of the “zero null hypothesis” (i.e., that the true population coefficient is zero).
• The computed t-ratios tend to be small, so one or more of the coefficients tend to be statistically insignificant when tested individually.
• Even so, R-squared, the overall measure of goodness of fit, can be very high.
D. Remedial measures of multicollinearity
• Combining cross-sectional and time series data
• Dropping a variable(s), at the risk of specification bias
• Transformation of variables
E. Tests to check the existence of multicollinearity
• Variance inflation factor (VIF)
• Correlation matrix
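In general VIFj = 1/(1 − Rj²), where Rj² is the R² from regressing Xj on the other explanatory variables. With only two regressors, Rj² equals the squared simple correlation r12² (the off-diagonal entry of the correlation matrix), so both VIFs equal 1/(1 − r12²). A minimal sketch with hypothetical data and illustrative function names; a VIF above 10 is a commonly quoted rule of thumb for serious multicollinearity, not a formal test.

```python
import math


def correlation(a, b):
    """Simple (Pearson) correlation coefficient between two lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - r12^2); with two regressors it is the same for X1 and X2."""
    r12 = correlation(x1, x2)
    return 1 / (1 - r12 ** 2)


x1 = [1, 2, 3, 4, 5]   # hypothetical data
x2 = [2, 1, 4, 3, 6]
print(correlation(x1, x2), vif_two_regressors(x1, x2))
```

Under perfect collinearity r12² = 1 and the VIF is undefined (infinite), which mirrors the breakdown of OLS in that case.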
• Heteroscedasticity
A. Sources of Heteroscedasticity
• Model specification problems
• Data collection problems
• The presence of outliers

B. Consequences of Heteroscedasticity
• The usual OLS estimates of the variances of the estimators are biased (they under- or overestimate the true variances).
• The OLS estimators are still unbiased, but they are no longer BLUE (not efficient).
• Confidence intervals and t-ratios are also affected.
• Hypothesis testing is misleading.


C. Tests to check the existence of Heteroscedasticity

• Goldfeld–Quandt test
• Breusch–Pagan–Godfrey test
• White’s test
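A sketch of the Goldfeld–Quandt procedure for a two-variable model, assuming the error variance increases with X: order the observations by X, omit c central observations, fit OLS separately to the low-X and high-X groups, and form λ = (RSS2/df)/(RSS1/df). Under homoscedastic, normally distributed errors λ follows an F distribution with (df, df) degrees of freedom, where df = (n − c − 2k)/2 and k is the number of parameters per regression. The data are hypothetical, with errors whose size grows with X, and the function names are illustrative; λ would then be compared with the tabulated F value.

```python
def ols_rss(x, y):
    """Fit y = a + b*x by OLS and return the residual sum of squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y))
         / sum((p - x_bar) ** 2 for p in x))
    a = y_bar - b * x_bar
    return sum((q - a - b * p) ** 2 for p, q in zip(x, y))


def goldfeld_quandt(x, y, c):
    """Return (lambda, df) for the Goldfeld-Quandt test with c omitted central obs."""
    pairs = sorted(zip(x, y))           # order observations by X
    m = (len(pairs) - c) // 2           # size of each tail group
    low, high = pairs[:m], pairs[-m:]
    rss1 = ols_rss([p for p, _ in low], [q for _, q in low])
    rss2 = ols_rss([p for p, _ in high], [q for _, q in high])
    df = m - 2                          # two parameters estimated per group
    return (rss2 / df) / (rss1 / df), df


# Hypothetical data whose error size grows with X
x = list(range(1, 11))
y = [2 + 0.5 * v + (-1) ** v * 0.1 * v for v in x]
lam, df = goldfeld_quandt(x, y, c=2)
print(lam, df)   # compare lam with the tabulated F(df, df) critical value
```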


3.7 Dummy variables
• There are four types of variables that one generally encounters in empirical analysis: ratio scale, interval scale, ordinal scale, and nominal scale.
• Regression models may involve not only ratio scale variables but also nominal scale variables.
• Such nominal scale variables are also known as indicator variables, categorical variables, qualitative variables, or dummy variables.


The Nature of Dummy Variable
• In regression analysis, the dependent variable is frequently influenced not only by ratio scale variables (e.g., income, output, prices, costs, height, temperature) but also by variables that are essentially qualitative, or nominal scale, in nature, such as sex, race, colour, religion, nationality, geographical region and political party affiliation.

• One way we could “quantify” such attributes is by constructing artificial variables that take on values of 1 or 0, with 1 indicating the presence (or possession) of that attribute and 0 indicating its absence.


• Variables that assume such 0 and 1 values are called dummy variables.
• Dummy variables can be used in regression models just as easily as quantitative variables.
• A regression model may contain explanatory variables that are exclusively dummy, or qualitative, in nature.
• Given: Yi = α + βDi + Ui
where Yi = annual salary of a college professor
Di = 1 if male college professor, = 0 otherwise (i.e., female professor)


• The above regression model may enable us to find out whether sex makes any difference in a college professor’s salary, assuming, of course, that all other variables such as age, degree attained, and years of experience are held constant.

• Assuming that the disturbances satisfy the usual assumptions of the classical linear regression model, we obtain:
• Mean salary of female college professors: E(Y/Di = 0) = α
• Mean salary of male college professors: E(Y/Di = 1) = α + β

• The intercept term α gives the mean salary of female college professors, and the slope coefficient β tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart.
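To see why the intercept and slope have this interpretation, the sketch below runs OLS of salary on an intercept and a single 0/1 dummy using made-up salaries in birr (not real data): the estimated intercept equals the female sample mean, and the estimated slope equals the male-minus-female difference in sample means. The function name `ols_on_dummy` is illustrative.

```python
def ols_on_dummy(y, d):
    """OLS of y on an intercept and one 0/1 dummy d; returns (alpha, beta)."""
    n = len(y)
    y_bar, d_bar = sum(y) / n, sum(d) / n
    beta = (sum((a - d_bar) * (b - y_bar) for a, b in zip(d, y))
            / sum((a - d_bar) ** 2 for a in d))
    alpha = y_bar - beta * d_bar
    return alpha, beta


# Hypothetical salaries in birr (made up for illustration)
salary = [17000, 18000, 19000, 20280, 21280, 22280]
male = [0, 0, 0, 1, 1, 1]          # D = 1 for male, 0 for female
alpha, beta = ols_on_dummy(salary, male)
female_mean = sum(s for s, d in zip(salary, male) if d == 0) / male.count(0)
male_mean = sum(s for s, d in zip(salary, male) if d == 1) / male.count(1)
print(alpha, female_mean)          # intercept vs. female mean
print(alpha + beta, male_mean)     # intercept + slope vs. male mean
```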
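• How do we test whether there is sex discrimination? We test the null hypothesis H0: β = 0 (no difference in mean salaries) using the usual t test on the estimated β.

• Example
Ŷi = 18,000 + 3,280Di
se = (0.32) (0.44)
t = (57.74) (7.439) R2 = 0.8737
• Based on the result, the estimated mean salary of female college professors is birr 18,000 and that of male professors is birr 21,280 (= 18,000 + 3,280).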


• Regression on one quantitative variable and one qualitative variable with two classes

Yi = 𝛼1 + α2Di + βXi + Ui
where Yi = annual salary of a college professor
Xi = years of experience
Di = 1 if male college professor, = 0 otherwise (i.e., female professor)
• Mean salary of female college professors:
E(Y/Xi, Di = 0) = 𝛼1 + βXi
• Mean salary of male college professors:
E(Y/Xi, Di = 1) = (𝛼1 + α2) + βXi


• The level of the male professors’ mean salary differs from that of the female professors’ mean salary (by α2), but the rate of change in the mean annual salary with years of experience is the same for both sexes.


• Regression on one quantitative variable and two qualitative variables
• Let: Yi = 𝛼0 + α1D1i + α2D2i + βXi + Ui
where Yi = annual salary of a college professor, Xi = years of experience
D1i = 1 if male college professor, = 0 otherwise (i.e., female professor); D2i = 1 if white, = 0 otherwise

• Exercises: From the above expression, obtain the following:

1. Mean salary for black female professors
2. Mean salary for black male professors
3. Mean salary for white female professors
4. Mean salary for white male professors
• Regression on one quantitative variable and one qualitative variable with more than two classes
• Suppose we consider three mutually exclusive levels of education: less than high school, high school, and college.
• If a qualitative variable has m categories, introduce only m − 1 dummy variables, following the rule that the number of dummies be one less than the number of categories of the variable.
Yi = 𝛼0 + α1D1i + α2D2i + βXi + Ui
where Yi = annual expenditure on health care, Xi = annual expenditure,
D1i = 1 if high school education, = 0 otherwise; D2i = 1 if college education, = 0 otherwise
(less than high school is the base, or benchmark, category)
• Compute the mean health care expenditure functions for the three levels of education.
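A minimal sketch of the m − 1 rule: the hypothetical helper `make_dummies` below builds one 0/1 column per category except a chosen base category. If all m dummies were included together with an intercept, the dummy columns would sum to the intercept column in every observation, i.e. perfect multicollinearity (the dummy variable trap).

```python
def make_dummies(values, categories, base):
    """Return {category: list of 0/1} for every category except `base`."""
    if base not in categories:
        raise ValueError("base must be one of the categories")
    return {c: [1 if v == c else 0 for v in values] for c in categories if c != base}


levels = ["less than high school", "high school", "college"]
education = ["high school", "college", "less than high school", "high school"]  # hypothetical sample
dummies = make_dummies(education, levels, base="less than high school")
print(dummies["high school"])   # D1: [1, 0, 0, 1]
print(dummies["college"])       # D2: [0, 1, 0, 0]
```

An observation with D1 = D2 = 0 belongs to the base category, so its mean function is given by the intercept and slope alone.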