CH-2 - PPT-Simple Linear Regression Analysis


Econometrics for Management

(MGMT3071)

Chapter Two:

Regression Analysis

Teklebirhan Alemnew (Assistant Professor)


tbalemnew@gmail.com
AAU, 2023
2.1. Introduction
• As you know, economic theories are mainly concerned with the relationships between variables.
• These relationships can be stated in mathematical terms that show the functional relationship between the variables.
• The functional relationship defines the dependence of one variable upon the other variable(s) in a specific functional form.
• The specific functional form may be linear, quadratic, logarithmic, exponential, or any other form.
By: Teklebirhan A. 2
• The two types of regression analysis are:
  - Simple linear regression analysis, also known as two-variable regression, in which the dependent variable is linearly related to a single explanatory variable.
  - Multiple linear regression analysis, in which the regressand is related to two or more regressors.
2.2. The Concept of Regression Analysis
• The main goal of any econometric analysis is to establish an acceptable empirical causal relationship between variables.
• Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variable(s)).
• In other words, regression analysis is concerned with describing and evaluating the relationship between a given variable and one or more other variables.
• The objective of regression analysis is to estimate and/or predict the unknown (population) mean value of the dependent variable in terms of the known values of the explanatory variables.
• For instance, an economist may be interested in studying the dependence of household monthly consumption expenditure on household monthly disposable income.
• That is, our concern might be with predicting the average consumption expenditure given the household's monthly disposable income.
• Such an analysis is helpful in estimating the marginal propensity to consume (MPC), that is, the average change in consumption expenditure for, say, a unit change in disposable income.
Regression Line
• The line that passes through the average level of consumption expenditure for each level of household income is known as the regression line. It shows how average consumption expenditure increases with the household's income.
What is the difference between regression and correlation analysis?
• Regression analysis is closely related to correlation analysis, but conceptually the two are very different.
• Statistical relationships (such as those found by regression analysis) cannot by themselves logically imply causation.
• To ascribe causality, one must appeal to a priori or theoretical considerations.
• The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables.
• In regression analysis, by contrast, we try to predict the average value of the dependent variable on the basis of fixed values of the explanatory variables.
Terminologies in Regression Theory
• The variables in a regression relation consist of dependent and explanatory variables.
• The dependent variable is the variable whose variation is being explained by the other variable(s).
• The explanatory variable is the variable whose variation is used to explain the variation in the dependent variable.
• The following is a representative list of the terminology used in regression analysis:
Dependent variable  | Explanatory variable
Explained variable  | Independent variable
Predictand          | Predictor
Regressand          | Regressor
Response            | Stimulus
Endogenous          | Exogenous
Outcome             | Covariate
Controlled variable | Control variable
• Note that regression analysis can be simple or multiple, depending on the number of explanatory variables included in the analysis.
• If we are studying the dependence of one variable on a single explanatory variable, such as the dependence of consumption expenditure on the level of real income, the study is known as simple, or two-variable, regression analysis.
• If we are studying the dependence of one variable on more than one explanatory variable, such as the dependence of crop yield on rainfall, labor spent, farm size, fertilizer, etc., it is known as multiple regression analysis.
Types of Regression Models

2.3. Correlation Analysis
• Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship, without being able to infer causal relationships.
• In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1.
• A value of ±1 indicates a perfect degree of association between the two variables.
• As the value of the correlation coefficient moves towards 0, the relationship between the two variables becomes weaker.
• The direction of the relationship is indicated by the sign of the coefficient:
  - a + sign indicates a positive relationship, and
  - a - sign indicates a negative relationship.
• In general, correlation measures the direction and strength of the linear relationship between two quantitative variables. It:
  - is represented by 'r';
  - carries no assumption of causality;
  - assumes a linear association between the two variables.
Scatter plot
• Linear relationships, which imply a straight-line association, are visualized with scatter plots.
• Consider the following cigarette data set (n = 11):
  - X = per capita cigarette consumption
  - Y = lung cancer mortality per 100,000 in 1950
scatter LUNGCA CIG
Assess:
• Functional form
• Direction of association
• Outliers
• Strength of relation
• Form: linear
• Direction: positive association
• Outliers: no clear outliers
• Strength: difficult to determine by eye
• The eye is not a good judge of strength.
• Identical data sets plotted on differently scaled axes can look different: in one plot the relation appears weak, in the other it appears strong.
• The difference in apparent strength is an artifact of the axis scaling.
• The pattern of the data indicates the type of relationship between the two variables:
  - Positive relationship
  - Negative relationship
  - No relationship
[Figure: Positive relationship]
[Figure: Negative relationship. Income and illiteracy rates (%); x-axis: Income, y-axis: Rate of illiteracy (%)]
[Figure: No relation]
• In statistics, we usually use four types of correlation measures:
  a) Pearson correlation (simple correlation coefficient, r),
  b) Spearman rank correlation,
  c) Kendall rank correlation, and
  d) Point-biserial correlation.
a) Pearson Correlation
• It is also called the simple correlation coefficient (r) or the product-moment correlation coefficient.
• Pearson's r is the most widely used correlation statistic for measuring the degree of the relationship between linearly related variables.
• For example, in the fertilizer market, if we want to measure how two fertilizers are related to each other, Pearson's r is used to measure the degree of relationship between the two.
• It measures the nature and strength of association between two quantitative variables.
• The value of r ranges between -1 and +1.
• The sign of r denotes the nature of the association, while the magnitude of r denotes its strength.
• If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one variable is associated with a decrease in the other.
• If the sign is negative, the relation is inverse (indirect): an increase in one variable is associated with a decrease in the other.
[Portrait: Karl Pearson, 1857 - 1936]
[Figure: Correlational direction and strength]
• The following formula is used to calculate the Pearson 'r' correlation:
    $$r = \frac{1}{n-1}\sum_{i=1}^{n} z_{X_i}\, z_{Y_i}, \qquad z_{X_i} = \frac{X_i - \bar{X}}{s_X}, \quad z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y}$$
• The z-scores quantify distance above or below the mean in standard deviation units.
• When the z-scores track in the same direction, the products are positive.
• When the z-scores track in opposite directions, the products are negative.
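The z-score description above can be sketched in a few lines of Python. This is only an illustration, assuming the usual sample form of the coefficient (sum of z-score products divided by n - 1). The data are the six pairs from the chapter's later OLS worked example, with the labels X (income) and Y (consumption) inferred from that example's arithmetic, and the helper names are hypothetical.

```python
import math

# Illustrative data: the six pairs from the chapter's OLS worked example.
# Labelling them X (income) and Y (consumption) is an inference.
X = [5, 4, 8, 10, 13, 14]
Y = [4, 4, 7, 8, 9, 10]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    # Distance of each value from the mean, in standard deviation units.
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    # Average product of paired z-scores, using n - 1 in the denominator.
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

r = pearson_r(X, Y)
print(round(r, 4))
```

Pairs whose z-scores share a sign add to r and pairs with opposite signs subtract from it, which is why r is strongly positive for these data.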
• Types of research questions a Pearson correlation can examine:
  - Is there a relationship between job satisfaction, as measured by the JSS, and income, measured in dollars?
  - Is there a statistically significant relationship between age, as measured in years, and height, measured in inches?
  - Is there a relationship between temperature, measured in degrees Fahrenheit, and ice cream sales, measured by income?
  - Is there a statistically significant relationship between fertilizer, as measured in kg, and crop productivity, measured in quintals?
Assumptions
• For the Pearson r correlation, both variables should be normally distributed (normally distributed variables have a bell-shaped curve).
• Other assumptions include linearity and homoscedasticity. Linearity assumes a straight-line relationship between the two variables, and homoscedasticity assumes that the data are equally spread about the regression line.
Example
STATA output: correlation coefficient (Pearson)

. pwcorr LUNGCA CIG, obs sig star(1)

             |   LUNGCA      CIG
-------------+------------------
      LUNGCA |   1.0000
             |
             |       11
             |
         CIG |   0.7373*  1.0000
             |   0.0096
             |       11       11

• r = 0.74 indicates a strong, positive association that is statistically significant at the 1% level (p = 0.0096).
• NB: A non-significant correlation does not imply no association.
b) Spearman Rank Correlation
• Spearman rank correlation is a non-parametric test used to measure the degree of association between two variables.
• The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.
• The following formula is used to calculate the Spearman rank correlation:
    $$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
  where
  - ρ = Spearman rank correlation
  - d_i = the difference between the ranks of corresponding variables
  - n = number of observations
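As a rough sketch of how the rank-difference formula works in practice, the Python below ranks two variables and applies ρ = 1 - 6Σd²/(n(n² - 1)). The data are hypothetical, and the shortcut formula is exact only when there are no tied ranks. With ties, statistical packages usually compute Pearson's r on the ranks instead.

```python
def ranks(values):
    """Return average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Rank-difference formula; exact only when there are no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores for five individuals (not from the slides).
x = [10, 20, 30, 40, 50]
y = [3, 1, 8, 6, 9]
rho = spearman_rho(x, y)
print(rho)
```

Here the rank differences are -1, 1, -1, 1, 0, so Σd² = 4 and ρ = 1 - 24/120 = 0.8.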
• Types of research questions a Spearman correlation can examine:
  - Is there a statistically significant relationship between participants' level of education (high school, bachelor's, or graduate degree) and their starting salary?
  - Is there a statistically significant relationship between a worker's productivity and the worker's age?
. spearman LUNGCA CIG, stats(rho obs p) star(0.01)

 Number of obs  =      11
Spearman's rho  =  0.8428

Test of Ho: LUNGCA and CIG are independent
    Prob > |t|  =  0.0011

• The rank correlation is statistically significant at the 1% level of significance.
2.4. Population Regression Function Versus
Sample Regression Function
Population Regression Function (PRF)
• The economic theory of consumption (in its simplest form) can be written as a stochastic model of the following form:
    $$Y_i = \alpha + \beta X_i + U_i$$
• The econometric model given above is called the population regression model or, simply, the population model.
• This population regression model is called the true relationship because Y, X and U represent their respective population values, and α and β are called the true parameters.
• Therefore, our major task now is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF).
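The distinction between the PRF and the SRF can be illustrated with a small simulation. The sketch below assumes a hypothetical population with known parameters (ALPHA and BETA are invented for illustration), draws several samples from it, and fits a line to each with the least-squares formulas developed later in the chapter. Each sample yields a different SRF, all scattered around the single fixed PRF.

```python
import random

random.seed(0)

# Hypothetical "true" population parameters (assumed for illustration).
ALPHA, BETA = 2.0, 0.6

def draw_sample(n):
    """Draw n observations from the PRF: Y = ALPHA + BETA*X + U."""
    xs = [random.uniform(0, 20) for _ in range(n)]
    ys = [ALPHA + BETA * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def ols(xs, ys):
    """Least-squares intercept and slope for a two-variable model."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Three different samples give three different SRFs.
for sample_id in range(1, 4):
    a_hat, b_hat = ols(*draw_sample(30))
    print(f"sample {sample_id}: Y_hat = {a_hat:.3f} + {b_hat:.3f} X")
```

In applied work only one sample is observed and ALPHA and BETA are unknown, which is why the SRF has to stand in for the PRF.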
2.5. Methods of Estimation: The Classical Simple Linear
Regression Analysis

• Specifying the model is the first stage of any econometric application. The next step is to estimate the numerical values of the parameters of the economic relationship.
• The parameters of the simple linear regression model can be estimated by any of the three most commonly used estimation methods:
  1. Ordinary Least Squares (OLS)
  2. Method of Moments (MM)
  3. Maximum Likelihood Method (MLM)
• Because its estimators have desirable properties (linearity, unbiasedness, and minimum variance), OLS is the most popular method for estimating regression parameters.
2.5.1. The Basic Assumptions of the Classical Linear
Regression Analysis (OLS) to estimate SLRM & MLRM
• The method of OLS is attributed to Carl Friedrich Gauss, a German mathematician.
• OLS is an econometric method used to derive estimates of the parameters of economic relationships from statistical observations.
• However, it works under some restrictive assumptions.
• The most important of these assumptions are discussed below.
• Linearity: a model is termed linear if it is linear in the parameters.
• Homoscedasticity: this assumption implies that the values of Y corresponding to various values of X have constant variance.
• Normality of the error term: this assumption is required mainly for hypothesis testing (inference).
9) No model specification error: the econometric model is correctly specified
  - No omission of relevant variable(s),
  - No inclusion of unnecessary variable(s),
  - No adoption of a wrong functional form.
  - If not, OLS estimators will be biased and inconsistent.
10) Variability in the values of X
  - The X values in a given sample must not all be the same.
11) Absence of high multicollinearity among explanatory variables (specific to multiple regression models, Chapter 3)
  - There is no perfect linear relationship among the explanatory variables, i.e., they are not perfectly correlated with each other.
• NB: If these assumptions do not hold, applying OLS can give misleading results.
2.5.2. Estimation of SLRM by Ordinary Least
Square (OLS) Method

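Obs. | Y  | X  | XY  | X²  | y  | x  | xy | x²
1    | 4  | 5  | 20  | 25  | -3 | -4 | 12 | 16
2    | 4  | 4  | 16  | 16  | -3 | -5 | 15 | 25
3    | 7  | 8  | 56  | 64  | 0  | -1 | 0  | 1
4    | 8  | 10 | 80  | 100 | 1  | 1  | 1  | 1
5    | 9  | 13 | 117 | 169 | 2  | 4  | 8  | 16
6    | 10 | 14 | 140 | 196 | 3  | 5  | 15 | 25
Sums | 42 | 54 | 429 | 570 | 0  | 0  | 51 | 84

Y = consumption expenditure and X = monthly income; y = Y - Ȳ and x = X - X̄ are deviations from the sample means (Ȳ = 7, X̄ = 9).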
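The sums in the table are all that is needed for the OLS estimates. The sketch below recomputes them in Python, assuming the first data column is Y (consumption) and the second is X (income). That labelling is inferred from the table's arithmetic and from the regression results reported at the end of the chapter. Variable names are illustrative.

```python
import math

# Data from the worked-example table; column labels are inferred.
Y = [4, 4, 7, 8, 9, 10]     # consumption expenditure
X = [5, 4, 8, 10, 13, 14]   # monthly income
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [xi - x_bar for xi in X]                   # deviations of X from its mean
y = [yi - y_bar for yi in Y]                   # deviations of Y from its mean

sum_xy = sum(a * b for a, b in zip(x, y))      # 51
sum_x2 = sum(a ** 2 for a in x)                # 84
sum_y2 = sum(b ** 2 for b in y)                # 32

beta_hat = sum_xy / sum_x2                     # slope
alpha_hat = y_bar - beta_hat * x_bar           # intercept

# Residual sum of squares and goodness of fit.
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
r_squared = 1 - rss / sum_y2

# Standard errors and t-ratios with n - 2 degrees of freedom.
sigma2_hat = rss / (n - 2)
se_beta = math.sqrt(sigma2_hat / sum_x2)
se_alpha = math.sqrt(sigma2_hat * sum(xi ** 2 for xi in X) / (n * sum_x2))
t_beta, t_alpha = beta_hat / se_beta, alpha_hat / se_alpha

print(f"slope = {beta_hat:.3f}, intercept = {alpha_hat:.3f}")
print(f"R^2 = {r_squared:.3f}, t(slope) = {t_beta:.2f}, t(intercept) = {t_alpha:.2f}")
```

These figures agree with the slope (0.607), constant (1.536), R² (0.968) and t-statistics (10.94 and 2.84) in the results table at the end of the chapter.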
2.6. Alternative Functional Forms and
Interpretation of OLS Estimates for SLRM

Model                  | If X increases by | Then Y will change by (approximately)
Linear (Y on X)        | 1 unit            | β units
Linear-Log (Y on ln X) | 1%                | β/100 units
Log-Linear (ln Y on X) | 1 unit            | 100·β %
Log-Log (ln Y on ln X) | 1%                | β %
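The interpretations in the table follow from differentiating each functional form with respect to X. The summary below is a sketch in the α, β, U notation of the population model. It shows what the slope measures in each case, and the approximations hold for small changes.

```latex
\begin{align*}
\text{Linear:}     &\quad Y_i = \alpha + \beta X_i + U_i,
                   &\beta &= \frac{dY}{dX}
                   &&\text{(unit change in } Y \text{ per unit change in } X\text{)} \\
\text{Linear-Log:} &\quad Y_i = \alpha + \beta \ln X_i + U_i,
                   &\beta &= \frac{dY}{dX/X}
                   &&\text{(a 1\% rise in } X \text{ changes } Y \text{ by about } \beta/100 \text{ units)} \\
\text{Log-Linear:} &\quad \ln Y_i = \alpha + \beta X_i + U_i,
                   &\beta &= \frac{dY/Y}{dX}
                   &&\text{(a unit rise in } X \text{ changes } Y \text{ by about } 100\beta\,\%\text{)} \\
\text{Log-Log:}    &\quad \ln Y_i = \alpha + \beta \ln X_i + U_i,
                   &\beta &= \frac{dY/Y}{dX/X}
                   &&\text{(elasticity: a 1\% rise in } X \text{ changes } Y \text{ by about } \beta\,\%\text{)}
\end{align*}
```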
2.7. Decomposition of the Variation of Y and
“Goodness of Fit” of an Estimated Model

2.8. Evaluation of an Estimated Model for SLRM
& MLRM
• After a model has been estimated, the next stage is to evaluate it.
• Evaluating the model means examining the 'goodness' of the estimated model.
• There are three criteria for judging the 'goodness' of an estimated econometric model:
  - Economic criterion,
  - Statistical criterion (first-order tests), and
  - Econometric criterion (second-order tests).
2.8.1.Econometric Criterion: Statistical Desirable
Properties of OLS Estimators and the Gauss-Markov
Theorem
• There are traditional criteria by which the closeness of an estimate to the true population parameter can be judged.
• These are called the desirable properties of estimators (or estimates).
• Desirable properties of estimators fall into two categories:
  1) Finite (small-sample) desirable properties of estimators, and
  2) Infinite (large-sample), or asymptotic, properties of estimators.
1) Finite (Small-Sample) Properties of Estimators
The desirable attributes of estimators in small samples are:
a) Unbiasedness
b) Minimum variance
c) Efficiency (= a + b)
d) Minimum mean square error (MMSE)
e) Linearity
f) Best linear unbiased estimator (BLUE), the Gauss-Markov theorem
• An estimator is called BLUE if it is linear, unbiased, and has minimum variance among all linear unbiased estimators.
2) Large-Sample (Asymptotic) Properties of Estimators
• It often happens that an estimator does not satisfy one or more of the desirable statistical properties in small samples.
• But as the sample size increases indefinitely, the estimator comes to possess several desirable statistical properties.
• These properties are known as the large-sample, or asymptotic, properties.
• Asymptotic (large-sample) desirable properties of estimators are:
  - Asymptotic unbiasedness
  - Consistency (asymptotically unbiased, with variance tending to zero as n increases)
  - Asymptotic efficiency (consistent, with the smallest asymptotic variance)
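Consistency can be made concrete with a small Monte Carlo experiment. The sketch below assumes a hypothetical population (ALPHA, BETA and the error spread are invented), repeatedly estimates the OLS slope at several sample sizes, and reports the mean and variance of the estimates across replications.

```python
import random
import statistics

random.seed(1)

# Hypothetical true parameters (assumed for illustration only).
ALPHA, BETA = 1.0, 0.5

def ols_slope(n):
    """Simulate one sample of size n and return its OLS slope estimate."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [ALPHA + BETA * x + random.gauss(0, 2) for x in xs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

def sampling_summary(n, reps=500):
    """Mean and variance of the slope estimates over repeated samples."""
    slopes = [ols_slope(n) for _ in range(reps)]
    return statistics.mean(slopes), statistics.variance(slopes)

results = {n: sampling_summary(n) for n in (10, 100, 1000)}
for n, (avg, var) in results.items():
    print(f"n = {n:4d}: mean slope = {avg:.4f}, variance = {var:.6f}")
```

Under these assumptions the mean of the estimates stays near BETA at every sample size, while their variance shrinks as n grows.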
2.8.2. Statistical Inference: Statistical Test of
Significance of OLS Estimators (First Order Tests)
• In this section, we shall develop statistical criteria for the evaluation of an estimated model.
• Statistical criteria are developed on the basis of statistical and probability theory.
• The application of statistical criteria to judge the goodness of a model is known as tests of statistical significance (TSS), or first-order tests of a model.
• Thus, with these critical values, the rejection and acceptance regions for the null hypothesis will be:
• In statistics, interval estimation is the process of estimating an interval of values within which the true value of a population parameter is expected to lie, based on the sampling distribution of the sample estimate.
• Depending on the sample size, the interval is constructed from:
  1) the standard normal distribution (Z-distribution), or
  2) the Student's t-distribution.
• Confidence interval from the standard normal distribution (Z-distribution)
• The meaning of a 95% confidence interval is that, in repeated sampling, 95% of the intervals constructed in this way will contain the true value of the unknown parameter β.
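As a hedged illustration, the sketch below builds 95% intervals for the slope of the chapter's six-observation consumption example, using both distributions. The interval form (estimate ± critical value × standard error) is the standard one and is assumed here. The critical values 1.96 (Z) and 2.776 (t with n - 2 = 4 degrees of freedom) are standard table values, hard-coded rather than computed.

```python
import math

# Six-observation example from the chapter (column labels inferred).
Y = [4, 4, 7, 8, 9, 10]     # consumption expenditure
X = [5, 4, 8, 10, 13, 14]   # monthly income
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
sum_x2 = sum((xi - x_bar) ** 2 for xi in X)

beta_hat = sum_xy / sum_x2
alpha_hat = y_bar - beta_hat * x_bar
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
se_beta = math.sqrt(rss / (n - 2) / sum_x2)

Z_CRIT = 1.96    # two-sided 5% critical value, standard normal table
T_CRIT = 2.776   # two-sided 5% critical value, t table with 4 d.f.

def interval(estimate, std_error, critical_value):
    """Return (lower, upper) = estimate -/+ critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

ci_z = interval(beta_hat, se_beta, Z_CRIT)
ci_t = interval(beta_hat, se_beta, T_CRIT)
print("Z-based 95% CI:", tuple(round(v, 3) for v in ci_z))
print("t-based 95% CI:", tuple(round(v, 3) for v in ci_t))
```

With only six observations the t-based interval is the wider of the two, reflecting the extra uncertainty from estimating the error variance in a small sample.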
2.9. Prediction using Simple Linear
Regression Model

Reporting the Results of Regression Analysis
• The results of a regression analysis are reported in conventional formats.
• It is not sufficient merely to report the estimates of the β's.
• There are two conventional ways to report a regression result:
  a) Equation form, i.e., by substituting the estimated coefficients into the regression model, and
  b) Table form




b) Table Form
• In this case, the estimated coefficients, the corresponding t-statistics, and some other indicators are presented in tabular form.
• Example: the estimated regression result of our consumption function can be presented in a table as follows:

                        | (1) Consumption Expenditure (in ETB)
Monthly Income (in ETB) | 0.607*** (10.94)
Constant                | 1.536* (2.84)
Observations            | 6
R²                      | 0.968

t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001


End of Chapter Two
