CH-2 - PPT-Simple Linear Regression Analysis

Econometrics for Management
(MGMT3071)
Chapter Two:
Regression Analysis
Teklebirhan Alemnew (Assistant Professor)

tbalemnew@gmail.com
AAU, 2023
2.1. Introduction
• As you know, economic theories are mainly concerned with the relationships between variables.
• These relationships can be stated in mathematical terms that show the functional relationship between the variables.
• These functional relationships define the dependence of one variable on the other variable(s) in a specific functional form.
• The specific functional form may be linear, quadratic, logarithmic, exponential, or any other form.
By: Teklebirhan A. 2
Cont…
• The two types of regression analysis are:
  - Simple linear regression analysis, also known as two-variable regression, in which the dependent variable is linearly related to a single explanatory variable.
  - Multiple linear regression analysis, in which the regressand is related to two or more regressors.
2.2. The Concept of Regression Analysis
• The main goal of any econometric analysis is to establish an acceptable empirical causal relationship between variables.
• Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variable(s)).
• In other words, regression analysis is concerned with describing and evaluating the relationship between a given variable and one or more other variables.
• The objective of regression analysis is to estimate and/or predict the unknown (population) mean value of the dependent variable in terms of the known values of the explanatory variables.
Cont…
• For instance, an economist may be interested in studying the dependence of household monthly consumption expenditure on household monthly disposable income.
• That is, our concern might be with predicting average consumption expenditure given household monthly disposable income.
• Such an analysis is helpful in estimating the marginal propensity to consume (MPC), that is, the average change in consumption expenditure for, say, a unit change in disposable income.
Cont…
Regression Line
• The line that passes through the average level of consumption expenditure at each level of household income is known as the regression line. It shows how average consumption expenditure increases with household income.
Cont…
What is the difference between Regression and Correlation Analysis?
Cont…
• Regression analysis is closely related to correlation analysis, but conceptually the two are very different.
• Statistical relationships (regression analysis) cannot by themselves logically imply causation.
• To ascribe causality, one must appeal to a priori or theoretical considerations.
• The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables.
• In regression analysis, however, we try to predict the average value of the dependent variable on the basis of fixed values of the explanatory variables.
Cont…
Terminologies in Regression Theory
• The variables in a regression relation consist of dependent and explanatory variables.
• The dependent variable is the variable whose variation is being explained by the other variable(s).
• The explanatory variable is the variable whose variation is used to explain the variation in the dependent variable.
• The following is a representative list of the terms used in regression analysis:
Cont…
Dependent Variable     | Explanatory Variable
Explained variable     | Independent variable
Predictand             | Predictor
Regressand             | Regressor
Response               | Stimulus
Endogenous             | Exogenous
Outcome                | Covariate
Controlled variable    | Control variable
Cont…
• Note that regression analysis can be simple or multiple, depending on the number of explanatory variables included in the analysis.
• If we are studying the dependence of one variable on only a single explanatory variable, such as the dependence of consumption expenditure on the level of real income, the study is known as simple, or two-variable, regression analysis.
• If, however, we are studying the dependence of one variable on more than one explanatory variable, such as the dependence of crop yield on rainfall, labor, farm size, fertilizer, etc., it is known as multiple regression analysis.
Types of Regression Models
2.3. Correlation Analysis
• Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship, without being able to infer causal relationships.
• In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1.
• A value of ±1 indicates a perfect degree of association between the two variables.
• As the value of the correlation coefficient moves towards 0, the relationship between the two variables becomes weaker.
Cont…
• The direction of the relationship is indicated by the sign of the coefficient:
  - a positive (+) sign indicates a positive relationship, and
  - a negative (-) sign indicates a negative relationship.
• In general, correlation measures the direction and strength of the linear relationship between two quantitative variables.
  - It is represented by 'r'.
  - There is no assumption of causality.
  - It assumes a linear association between the two variables.

Scatter plot
• Linear relationships, which imply a straight-line association, are visualized with scatter plots.
• Consider the following cigarette data set (n = 11):
  X = per capita cigarette consumption
  Y = lung cancer mortality per 100,000 in 1950
Cont…
. scatter LUNGCA CIG
Assess:
  - Functional form
  - Direction of association
  - Outliers
  - Strength of relation
Cont…
• Form: linear
• Direction: positive association
• Outliers: no clear outliers
• Strength: difficult to determine by eye
Cont…
• The eye is not a good judge of strength.
• Identical data sets plotted on differently scaled axes:
  [Left panel: this relation appears to be weak. Right panel: this relation appears strong.]
• The different appearance in strength is an artifact of the axis scaling, which shows that the eye is not a good judge of strength.
Cont…
• The pattern of the data indicates the type of relationship between the two variables:
  - Positive relationship
  - Negative relationship
  - No relationship
Cont…
Positive Relationship
Cont…
Negative Relationship
[Figure: income and illiteracy rates; horizontal axis: income, vertical axis: rate of illiteracy (%)]
Cont…
No Relation
Cont…
• Usually, in statistics, we use four types of correlation measures:
a) Pearson correlation (simple correlation coefficient, r),
b) Spearman rank correlation,
c) Kendall rank correlation, and
d) the point-biserial correlation.
a) Pearson Correlation
• It is also called the simple correlation coefficient (r) or the product-moment correlation coefficient.
• Pearson r is the most widely used correlation statistic for measuring the degree of the relationship between linearly related variables.
• For example, in the fertilizer market, if we want to measure how two fertilizers are related to each other, Pearson r is used to measure the degree of relationship between the two.
• It measures the nature and strength of association between two quantitative variables.
Cont…
• The value of r ranges between -1 and +1.
• The sign of r denotes the nature of the association, while the magnitude of r denotes its strength.
• If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one is associated with a decrease in the other.
• If the sign is negative, the relationship is inverse (indirect): an increase in one variable is associated with a decrease in the other.
(Pictured: Karl Pearson, 1857-1936)
Cont…
Correlational Direction and Strength
Cont…
• The following formula is used to calculate the Pearson 'r' correlation:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} z_{X_i}\, z_{Y_i}, \qquad z_{X_i} = \frac{X_i - \bar{X}}{s_X}, \quad z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y} $$
• z-scores quantify the distance above or below the mean in standard deviation units.
• When z-scores track in the same direction, the products are positive.
• When z-scores track in opposite directions, the products are negative.
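The z-score view of Pearson's r can be sketched in a few lines of Python. This is only an illustration: it uses the six observations from the chapter's OLS worked table, assumes that X is monthly income and Y is consumption expenditure, and uses the sample standard deviation (divisor n - 1). The function names are hypothetical.

```python
import math

# Six observations from the chapter's worked table (assumed roles:
# X = monthly income, Y = consumption expenditure).
X = [5, 4, 8, 10, 13, 14]
Y = [4, 4, 7, 8, 9, 10]


def z_scores(values):
    """Return z-scores based on the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson r as the sum of z-score products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


r = pearson_r(X, Y)
print(round(r, 3))
```

Because every pair of z-scores here has the same sign or a zero, all the products are non-negative and r comes out strongly positive.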
Cont…
• Types of research questions a Pearson correlation can examine:
  - Is there a relationship between job satisfaction, as measured by the JSS, and income, measured in dollars?
  - Is there a statistically significant relationship between age, as measured in years, and height, measured in inches?
  - Is there a relationship between temperature, measured in degrees Fahrenheit, and ice cream sales, measured by income?
  - Is there a statistically significant relationship between fertilizer, as measured in kg, and crop productivity, measured in quintals?
Cont…
Assumptions
• For the Pearson r correlation, both variables should be normally distributed (normally distributed variables have a bell-shaped curve).
• Other assumptions include linearity and homoscedasticity. Linearity assumes a straight-line relationship between the two variables, and homoscedasticity assumes that the data are equally spread about the regression line.
Cont…
• Example: STATA Output – Correlation coefficient (Pearson)
. pwcorr LUNGCA CIG, obs sig star(1)

             |   LUNGCA      CIG
      LUNGCA |   1.0000
             |       11
             |
         CIG |   0.7373*  1.0000
             |   0.0096
             |       11       11

NB: A non-significant correlation does not imply no association.
• r = 0.74 indicates a strong, positive association that is significant at the 1% level.
b) Spearman Rank Correlation
• Spearman rank correlation is a non-parametric test used to measure the degree of association between two variables.
• The Spearman rank correlation test makes no assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.
• The following formula is used to calculate the Spearman rank correlation:
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
where
  ρ = Spearman rank correlation
  d_i = the difference between the ranks of corresponding variables
  n = number of observations
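As a minimal sketch of the rank-difference formula, the Python snippet below computes Spearman's ρ from two lists of ranks. The ranks are hypothetical (five workers ranked by education and by starting salary) and the shortcut formula assumes there are no tied ranks; the function name is illustrative.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks, assuming no tied ranks."""
    n = len(rank_x)
    # d_i is the difference between the ranks of the i-th observation.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five workers (illustrative only, not chapter data).
education_rank = [1, 2, 3, 4, 5]
salary_rank = [2, 1, 4, 3, 5]

rho = spearman_rho(education_rank, salary_rank)
print(rho)
```

Here the squared rank differences sum to 4, so ρ = 1 - 24/120 = 0.8. When ranks are tied, statistical packages typically compute ρ as the Pearson correlation of the ranks instead of using this shortcut.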
Cont…
• Types of research questions a Spearman correlation can examine:
  - Is there a statistically significant relationship between participants' level of education (high school, bachelor's, or graduate degree) and their starting salary?
  - Is there a statistically significant relationship between workers' productivity and workers' age?
Cont…
. spearman LUNGCA CIG, stats(rho obs p) star(0.01)

 Number of obs =      11
Spearman's rho =  0.8428

Test of Ho: LUNGCA and CIG are independent
    Prob > |t| =  0.0011

• The association is significant at the 1% level of significance.
2.4. Population Regression Function Versus Sample Regression Function
Population Regression Function (PRF)
• The economic theory of consumption (in its simplest form) can be written as a stochastic model of the following form:
$$ Y_i = \alpha + \beta X_i + U_i $$
• The econometric model given above is called the population regression model or, simply, the population model.
• This population regression model is called the true relationship because Y, X and U represent their respective population values, and α and β are called the true parameters.
Cont…
• Therefore, our major task now is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF).
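For reference, the two functions can be written side by side in the usual textbook notation. The hats on the estimated coefficients and the residual symbol e_i are notational assumptions made here for illustration.

```latex
\begin{aligned}
\text{PRF:}\quad & Y_i = \alpha + \beta X_i + U_i \\
\text{SRF:}\quad & Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i,
\qquad \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i
\end{aligned}
```

The residual e_i = Y_i - \hat{Y}_i is the sample counterpart of the unobservable disturbance U_i, just as \hat{\alpha} and \hat{\beta} are the sample counterparts of α and β.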
2.5. Methods of Estimation: The Classical Simple Linear
Regression Analysis
Cont…
• Specifying the model is the first stage of any econometric application. The next step is to estimate the numerical values of the parameters of the economic relationships.
• The parameters of the simple linear regression model can be estimated by the three most commonly used estimation methods:
1. Ordinary Least Squares (OLS) method
2. Method of Moments (MM)
3. Maximum Likelihood Method (MLM)
• Because its estimators have some desirable properties (linearity, unbiasedness, and minimum variance), OLS is the most popular method for estimating regression parameters.
Cont…
2.5.1. The Basic Assumptions of the Classical Linear Regression Analysis (OLS) to estimate SLRM & MLRM
• The method of OLS is attributed to Carl Friedrich Gauss, a German mathematician.
• OLS is an econometric method used to derive estimates of the parameters of economic relationships from statistical observations.
• However, it works under some restrictive assumptions.
• The most important of these assumptions are discussed below.
Cont…
• Linearity: a model is termed linear if it is linear in the parameters.
Cont…
• Homoscedasticity: this assumption implies that the values of Y corresponding to the various values of X have constant variance.
Cont…
• Normality of the error term u_i: this assumption is required mainly for hypothesis testing (inference).
9) No model specification error: the econometric model is correctly specified.
  - No omission of relevant variable(s),
  - No inclusion of unnecessary variable(s),
  - No adoption of a wrong functional form.
  - Otherwise, the OLS estimators may be biased and inconsistent.
10) Variability in the values of X
  - The X values in a given sample must not all be the same.
11) Absence of high multicollinearity among the explanatory variables (specific to multiple regression models, Chapter 3)
  - There is no perfect linear relationship among the explanatory variables; that is, they are not perfectly correlated with each other.
Cont…
NB:
• If these assumptions do not hold, results obtained by applying OLS can be misleading.
2.5.2. Estimation of SLRM by Ordinary Least
Square (OLS) Method
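Cont…
(Y = consumption expenditure, X = monthly income; lower-case letters denote deviations from the sample means.)
Obs.     Y     X     YX     X²    y = Y − Ȳ    x = X − X̄    xy    x²
1        4     5     20     25       -3           -4        12    16
2        4     4     16     16       -3           -5        15    25
3        7     8     56     64        0           -1         0     1
4        8    10     80    100        1            1         1     1
5        9    13    117    169        2            4         8    16
6       10    14    140    196        3            5        15    25
Sums    42    54    429    570        0            0        51    84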
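The sums in this table are what the deviation-form OLS formulas need: the slope is Σxy / Σx² and the intercept is Ȳ minus the slope times X̄. The Python sketch below recomputes them from the six observations; it assumes Y is the first data column (consumption expenditure) and X the second (monthly income), and the variable names are illustrative.

```python
# Data from the worked table (assumed roles: Y = consumption expenditure,
# X = monthly income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

n = len(Y)
y_bar = sum(Y) / n  # sample mean of Y
x_bar = sum(X) / n  # sample mean of X

# Sums of cross-products and squares in deviation form.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)

beta_hat = sum_xy / sum_xx            # OLS slope
alpha_hat = y_bar - beta_hat * x_bar  # OLS intercept

print(round(beta_hat, 3), round(alpha_hat, 3))
```

The slope of about 0.607 and the intercept of about 1.536 agree with the coefficients reported in the regression table at the end of the chapter.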
Cont…
2.6. Alternative Functional Forms and
Interpretation of OLS Estimates for SLRM
Cont…
Model        Equation               If X increases by    Then Y will change by (approximately)
Linear       Y = α + βX             1 unit               β units
Linear-Log   Y = α + β ln X         1%                   β/100 units
Log-Linear   ln Y = α + βX          1 unit               (100·β)%
Log-Log      ln Y = α + β ln X      1%                   β%
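The interpretation rules for the logarithmic forms are approximations that work well for small changes. The Python sketch below compares the exact change in Y with the rule of thumb for each logarithmic form; the coefficient values are hypothetical and the function names are illustrative.

```python
import math


def linear_log_change(beta, pct_increase_x=1.0):
    """Exact change in Y (in units) for Y = a + beta*ln(X) when X rises by a percentage."""
    return beta * math.log(1 + pct_increase_x / 100)


def log_linear_pct_change(beta, delta_x=1.0):
    """Exact percent change in Y for ln(Y) = a + beta*X when X rises by delta_x units."""
    return (math.exp(beta * delta_x) - 1) * 100


def log_log_pct_change(beta, pct_increase_x=1.0):
    """Exact percent change in Y for ln(Y) = a + beta*ln(X) when X rises by a percentage."""
    return ((1 + pct_increase_x / 100) ** beta - 1) * 100


# Hypothetical coefficients, chosen only to illustrate the rules of thumb.
print(linear_log_change(50.0))      # rule of thumb: beta / 100 = 0.5 units
print(log_linear_pct_change(0.05))  # rule of thumb: 100 * beta = 5 percent
print(log_log_pct_change(0.8))      # rule of thumb: beta = 0.8 percent
```

The gap between the exact change and the rule of thumb grows with the size of the coefficient and of the change in X, which is why the rules are usually stated for a 1% or one-unit change.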
2.7. Decomposition of the Variation of Y and
“Goodness of Fit” of an Estimated Model
Cont…
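As a sketch of the decomposition, the total variation of Y about its mean (TSS) splits into the part explained by the fitted line (ESS) and the residual part (RSS), and R² is the explained share, ESS / TSS. The Python code below checks this on the chapter's six observations, assuming Y is consumption expenditure and X is monthly income; the variable names are illustrative.

```python
# Six observations from the chapter's worked table (assumed roles:
# Y = consumption expenditure, X = monthly income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X) / n

# OLS estimates in deviation form.
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
            / sum((x - x_bar) ** 2 for x in X))
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)               # total sum of squares
ess = sum((f - y_bar) ** 2 for f in fitted)          # explained sum of squares
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # residual sum of squares

r_squared = ess / tss
print(round(tss, 3), round(ess, 3), round(rss, 3), round(r_squared, 3))
```

The resulting R² of about 0.968 matches the value reported in the regression table at the end of the chapter.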
2.8. Evaluation of an Estimated Model for SLRM & MLRM
• After a model has been estimated, the next stage is to evaluate it.
• Evaluating the model means examining the 'goodness' of the estimated model.
• There are three criteria for judging the 'goodness' of an estimated econometric model:
  - Economic criterion,
  - Statistical criterion (first-order tests), and
  - Econometric criterion (second-order tests).

2.8.1. Econometric Criterion: Statistical Desirable Properties of OLS Estimators and the Gauss-Markov Theorem
• There are traditional criteria by which the closeness of an estimate to the true population parameter can be judged.
• These are called the desirable properties of estimators (or estimates).
• The desirable properties of estimators fall into two categories:
1) Finite (small-sample) desirable properties of estimators and
2) Infinite (large-sample) or asymptotic properties of estimators.
Cont…
1. Finite (Small-Sample) Properties of Estimators
The desirable attributes of estimators in small samples are:
a) Unbiasedness
b) Minimum variance
c) Efficiency (= a + b)
d) Minimum mean square error (MMSE)
e) Linearity
f) Best linear unbiased estimator (BLUE): Gauss-Markov theorem
• An estimator is called BLUE if it is linear, unbiased, and has minimum variance among all linear unbiased estimators.
Cont…
Cont…
2) Large-Sample (Asymptotic) Properties of Estimators
• It often happens that an estimator does not satisfy one or more of the desirable statistical properties in small samples.
• But as the sample size increases indefinitely, the estimator possesses several desirable statistical properties.
• These properties are known as the large-sample, or asymptotic, properties.
Cont…
• The asymptotic (large-sample) desirable properties of estimators are:
  - Asymptotic unbiasedness
  - Consistency (both the bias and the variance tend to zero as n increases)
  - Asymptotic efficiency (consistent, with the smallest asymptotic variance)
2.8.2. Statistical Inference: Statistical Test of Significance of OLS Estimators (First Order Tests)
• In this section, we develop statistical criteria for the evaluation of an estimated model.
• Statistical criteria are based on statistical and probability theory.
• Applying statistical criteria to judge the goodness of a model is known as tests of statistical significance (TSS), or first-order tests of a model.
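A first-order test of an individual coefficient can be sketched as a t-test of the null hypothesis that the coefficient is zero. The Python code below applies the standard simple-regression formulas for the standard errors to the chapter's six observations (assumed roles: Y = consumption expenditure, X = monthly income). The critical value 2.776 is the two-tailed 5% t-table value for n - 2 = 4 degrees of freedom and is hard-coded as an assumption; the variable names are illustrative.

```python
import math

# Six observations from the chapter's worked table (assumed roles:
# Y = consumption expenditure, X = monthly income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X) / n
sum_xx = sum((x - x_bar) ** 2 for x in X)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

# Estimated error variance: residual sum of squares over n - 2 degrees of freedom.
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)

# Standard errors of the OLS slope and intercept.
se_beta = math.sqrt(sigma2_hat / sum_xx)
se_alpha = math.sqrt(sigma2_hat * sum(x ** 2 for x in X) / (n * sum_xx))

# t statistics for H0: coefficient = 0.
t_beta = beta_hat / se_beta
t_alpha = alpha_hat / se_alpha

T_CRITICAL = 2.776  # assumed t-table value: 5% two-tailed, 4 degrees of freedom
reject_beta = abs(t_beta) > T_CRITICAL
reject_alpha = abs(t_alpha) > T_CRITICAL
print(round(t_beta, 2), round(t_alpha, 2), reject_beta, reject_alpha)
```

The t statistics of about 10.94 for the slope and 2.84 for the constant agree with the values shown in parentheses in the regression table at the end of the chapter.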
Cont…
• Thus, with these critical values, the rejection and acceptance regions for the null hypothesis will be:
Cont…
• In statistics, the process of estimating an interval of values within which the true values of the population parameters are expected to lie, based on the sampling distribution of the sample estimates, is called interval estimation.
• Depending on the sample size, it can be done in two ways:
1) Confidence interval from the standard normal distribution (Z-distribution)
2) Confidence interval from the Student's t-distribution.
Cont…
• Confidence interval from the standard normal distribution (Z-distribution)
• For a 95% confidence level, the interval is $\hat{\beta} \pm 1.96\,se(\hat{\beta})$.
• The meaning of this confidence interval is that, in repeated sampling, about 95% of intervals constructed in this way would contain the true value of the unknown parameter β.
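As an illustration, the Python sketch below builds 95% confidence intervals for the slope of the chapter's consumption function from the six worked observations (assumed roles: Y = consumption expenditure, X = monthly income). Both critical values are table values hard-coded as assumptions: 1.96 for the Z-distribution and 2.776 for the t-distribution with 4 degrees of freedom. The helper name `confidence_interval` is hypothetical.

```python
import math

# Six observations from the chapter's worked table (assumed roles:
# Y = consumption expenditure, X = monthly income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X) / n
sum_xx = sum((x - x_bar) ** 2 for x in X)
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum_xx
alpha_hat = y_bar - beta_hat * x_bar
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(X, Y))
se_beta = math.sqrt(rss / (n - 2) / sum_xx)


def confidence_interval(estimate, std_error, critical_value):
    """Return (lower, upper) as estimate +/- critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin


z_interval = confidence_interval(beta_hat, se_beta, 1.96)   # large-sample version
t_interval = confidence_interval(beta_hat, se_beta, 2.776)  # small-sample version, 4 d.f.
print([round(v, 3) for v in z_interval], [round(v, 3) for v in t_interval])
```

With only six observations the t-based interval is the more appropriate of the two; it is wider than the Z-based one, and neither contains zero.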
2.9. Prediction using Simple Linear
Regression Model
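Once the coefficients are estimated, a point prediction of mean consumption at a chosen income level is obtained by substituting that income into the fitted line. The Python sketch below does this with the chapter's six observations (assumed roles: Y = consumption expenditure, X = monthly income); the income value 12 is a hypothetical input inside the sample range, and `predict` is an illustrative helper name.

```python
# Six observations from the chapter's worked table (assumed roles:
# Y = consumption expenditure, X = monthly income).
Y = [4, 4, 7, 8, 9, 10]
X = [5, 4, 8, 10, 13, 14]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X) / n
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
            / sum((x - x_bar) ** 2 for x in X))
alpha_hat = y_bar - beta_hat * x_bar


def predict(x0):
    """Point prediction of mean Y at X = x0 from the fitted line."""
    return alpha_hat + beta_hat * x0


y_hat_12 = predict(12)  # hypothetical income level within the sample range
print(round(y_hat_12, 3))
```

At the sample mean of X the fitted line returns the sample mean of Y, so `predict(9)` gives 7; predictions far outside the observed range of X should be treated with caution.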

Cont…
Reporting the Results of Regression Analysis
• The results of a regression analysis are reported in conventional formats.
• It is not sufficient merely to report the estimates of the β's.
• There are two conventional ways to report a regression result:
a) Equation form, i.e., by substituting the estimated coefficients into the regression model, and
b) Table form
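As a sketch of the equation form, the estimates reported in the chapter's regression table for the consumption function could be written as follows, with t statistics in parentheses under the coefficients. The variable labels Consumption and Income are shorthand chosen here for illustration.

```latex
\widehat{\text{Consumption}}_i
  = \underset{(2.84)}{1.536} + \underset{(10.94)}{0.607}\,\text{Income}_i,
\qquad R^2 = 0.968, \quad n = 6
```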

Cont…
b) Table Form
• In this case, the estimated coefficients, the corresponding t-statistics, and some other indicators are presented in tabular form.
• Example: the estimated regression result of our consumption function can be presented in a table as follows:
                               (1)
                               Consumption Expenditure (in ETB)
Monthly Income (in ETB)        0.607***
                               (10.94)
Constant                       1.536*
                               (2.84)
Observations                   6
R²                             0.968
t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

End of Chapter Two
