Lecture 3


University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca

- Agribusiness MSc -

Lecture 3. Two-variable regression analysis: Basic Ideas

Course: Econometrics
Instructor: Diana Dumitras

Fall 2013
Types of regressions

• Simple (= two-variable) regression analysis
- the dependent variable is related to a single explanatory variable

• Multiple regression analysis
- the dependent variable is related to two or more explanatory variables
A Hypothetical Example: The data
X = income, $; Y = consumption expenditure, $ (one column per income group)

X        80   100   120   140   160   180   200   220   240   260

Y        55    65    79    80   102   110   120   135   137   150
         60    70    84    93   107   115   136   137   145   152
         65    74    90    95   110   120   140   140   155   175
         70    80    94   103   116   130   144   152   165   178
         75    85    98   108   118   135   145   157   175   180
          -    88     -   113   125   140     -   160   189   185
          -     -     -   115     -     -     -   162     -   191

Total   325   462   445   707   678   750   685  1043   966  1211

• Population: 60 families
• The families are divided into 10 income groups
Steps in analyzing the relationship

• Plot the data

• Calculate the mean values:
  - Conditional expected values
    For an income level of $80: E(Y|X = 80) = 325 / 5 = $65
  - Unconditional expected value
    For a family: E(Y) = 7272 / 60 = $121.2

• Plot the mean values (the conditional expected values)

• Draw the population regression line
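The mean-value step can be sketched in a few lines of Python. This is only an illustration: the dictionary name `population` and the other variable names are assumptions, while the numbers are the 60 hypothetical families from the data table.

```python
# Hypothetical population from the data table:
# weekly income X ($) -> weekly consumption expenditures Y ($) of the families in that group
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Conditional expected values: the mean of Y within each income group
conditional_means = {x: sum(ys) / len(ys) for x, ys in population.items()}

# Unconditional expected value: the mean of Y over all families
all_y = [y for ys in population.values() for y in ys]
unconditional_mean = sum(all_y) / len(all_y)

for x, mean_y in conditional_means.items():
    print(f"E(Y|X={x}) = {mean_y:.0f}")
print(f"E(Y) = {unconditional_mean:.1f}")
```

The conditional means depend on the income level, whereas the unconditional mean ignores income altogether.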


Steps in analyzing the relationship

• Plot the data
  - Plot the consumption expenditure against income

[Figure: scatter plot of weekly consumption expenditure ($) against weekly income ($)]
Steps in analyzing the relationship

• Calculate the mean values

(individual Y values as in the data table above)

Income X, $                           80   100   120   140   160   180   200   220   240   260
Total of Y, $                        325   462   445   707   678   750   685  1043   966  1211
Conditional expected value E(Y|X)     65    77    89   101   113   125   137   149   161   173

Unconditional expected value E(Y) = 121.2
Steps in analyzing the relationship

• Plot the mean values
  - Plot the conditional mean values of Y against the X values

[Figure: conditional means E(Y|X) marked on the scatter plot of weekly consumption expenditure ($) against weekly income ($)]
Steps in analyzing the relationship

• Draw the population regression line (PRL)
  - Connect the conditional mean values
  - On average, weekly consumption expenditure increases as income increases

[Figure: the PRL (regression of Y on X) drawn through the conditional means E(Y|X); weekly consumption expenditure ($) against weekly income ($)]
Steps in analyzing the relationship

The population regression line (PRL)
= the curve connecting the means of the subpopulations of Y corresponding to the given values of X

[Figure: the PRL (regression of Y on X) through the conditional means E(Y|X); each subpopulation of Y is distributed symmetrically around its mean; weekly consumption expenditure ($) against weekly income ($)]
The population regression function

E(Y|Xi) = f(Xi)
PRF = the conditional expectation function

• Ex.: consumption expenditure is linearly related to income

  E(Y|Xi) = β1 + β2Xi

  β1 = intercept coefficient
  β2 = slope coefficient

[Figure: the linear PRF; weekly consumption expenditure ($) against weekly income ($)]

• Linearity in the variables (Xi)

• Linearity in the parameters (β1, β2)
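As a quick check of the linear example, a straight line can be fitted through the ten conditional means from the data table. This is a sketch only: the least-squares formulas are used here as a convenient way to recover the line (estimation is the topic of the next class), and the variable names are illustrative.

```python
# Income levels X and conditional means E(Y|X), both taken from the data table
incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
cond_means = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

n = len(incomes)
x_bar = sum(incomes) / n
m_bar = sum(cond_means) / n

# Least-squares slope and intercept of the line through the (X, E(Y|X)) points
sxy = sum((x - x_bar) * (m - m_bar) for x, m in zip(incomes, cond_means))
sxx = sum((x - x_bar) ** 2 for x in incomes)
beta2 = sxy / sxx
beta1 = m_bar - beta2 * x_bar

print(f"E(Y|X) = {beta1:.1f} + {beta2:.1f} X")

# Largest vertical gap between a conditional mean and the fitted line
max_gap = max(abs(m - (beta1 + beta2 * x)) for x, m in zip(incomes, cond_means))
print(f"largest gap: {max_gap:.2e}")
```

For these hypothetical data the conditional means lie on the line E(Y|X) = 17 + 0.6X, so the PRF is linear in X as well as in β1 and β2.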


The population regression function

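• Other functions that are linear in the parameters (but not in X):

  Exponential: Y = e^(β1 + β2X)
  Quadratic:   Y = β1 + β2X + β3X²
  Cubic:       Y = β1 + β2X + β3X² + β4X³

[Figure: three panels sketching the exponential, quadratic and cubic curves of Y against X]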
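A short worked step may help show in what sense these curves count as "linear". The derivation below is a sketch; the auxiliary variables Z_2 and Z_3 are introduced here only for illustration.

```latex
% Exponential: taking natural logs gives a function that is linear in beta_1 and beta_2
\begin{align*}
Y &= e^{\beta_1 + \beta_2 X} \\
\ln Y &= \beta_1 + \beta_2 X
\end{align*}

% Quadratic and cubic: treat the powers of X as separate regressors,
% Z_2 = X^2 and Z_3 = X^3, so that each beta enters with power one
\begin{align*}
Y &= \beta_1 + \beta_2 X + \beta_3 Z_2 \\
Y &= \beta_1 + \beta_2 X + \beta_3 Z_2 + \beta_4 Z_3
\end{align*}
```

In each case the function is nonlinear in X, but every parameter appears only to the first power and is not multiplied or divided by another parameter.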
Stochastic specification of PRF

• What can we say about the relationship between an individual family's consumption expenditure and its given level of income?

• Given the income level Xi, an individual family's consumption expenditure is clustered around the average consumption of all families at that Xi

• The deviation of Yi from that average can be expressed as:

  ui = Yi - E(Y|Xi)

  ui = the stochastic disturbance
     = the stochastic error term
     - an unobservable random variable (positive or negative)

[Figure: the PRF with one disturbance ui shown as the vertical distance between an individual Yi and E(Y|Xi); weekly consumption expenditure ($) against weekly income ($)]
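The definition ui = Yi - E(Y|Xi) can be tried out on the hypothetical population. The sketch below assumes the data table is stored in a dictionary called `population` (an illustrative name). In practice ui is unobservable; it can be computed here only because the whole population is known.

```python
# Hypothetical population from the data table:
# weekly income X ($) -> weekly consumption expenditures Y ($)
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# u_i = Y_i - E(Y|X_i): deviation of each family from the mean of its income group
disturbances = {}
for x, ys in population.items():
    group_mean = sum(ys) / len(ys)
    disturbances[x] = [y - group_mean for y in ys]

for x, us in disturbances.items():
    print(f"X={x}: u = {us}, sum = {sum(us):.1f}")
```

Within each income group the disturbances take both signs and sum to zero, which is just another way of saying that E(Y|Xi) is the group mean.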
Stochastic specification of PRF

ui = Yi - E(Y|Xi), therefore Yi = E(Y|Xi) + ui = β1 + β2Xi + ui

• The consumption expenditure of a family can be expressed as the sum of two components:

1. a systematic / deterministic component: E(Y|Xi)
   - the mean consumption expenditure of all families with the same level of income

2. a nonsystematic / random component: ui
   - the disturbance / error term
   - captures the effect of the variables other than income that affect consumption expenditure
The stochastic disturbance term

Why not introduce all variables into the model explicitly?


• Vagueness of theory
- incomplete theory
• Unavailability of data
- incomplete information
• Some variables are difficult to quantify
- e.g. education, gender, religion
• Intrinsic randomness in human behavior
- there will always be some “intrinsic” randomness
• Poor proxy variables
- errors of measurement
• Principle of parsimony
- keep the regression as simple as possible
• Wrong functional form
Random samples from the population

[Table: the population data on income X ($) and consumption expenditure Y ($), repeated with the observations of the 1st random sample marked]


The sample regression function (SRF)

• Can we estimate the PRF from the sample data?

• The estimate will not be exact because of sampling fluctuations!

[Figure: SRF1 (regression based on the 1st sample); weekly consumption expenditure ($) against weekly income ($)]
Random samples from the population

[Table: the population data on income X ($) and consumption expenditure Y ($), repeated with the observations of the 1st and 2nd random samples marked]
The sample regression function (SRF)

• Which of the two lines represents the “true” PRL?

• We cannot be sure which one

[Figure: SRF1 (regression based on the 1st sample) and SRF2 (regression based on the 2nd sample); weekly consumption expenditure ($) against weekly income ($)]
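Sampling fluctuation can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the lecture: each sample contains one randomly chosen family per income group, the line is fitted by ordinary least squares (how to estimate is the topic of the next class), and the seeds, `draw_sample` and `fit_line` are illustrative names. The samples it draws are not the ones marked on the slides.

```python
import random

# Hypothetical population from the data table:
# weekly income X ($) -> weekly consumption expenditures Y ($)
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def draw_sample(rng):
    """Pick one family at random from each income group (assumed sampling scheme)."""
    xs = sorted(population)
    ys = [rng.choice(population[x]) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for Y = b1 + b2 * X; returns (b1, b2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Two arbitrary seeds stand in for the 1st and 2nd random samples
for seed in (1, 2):
    xs, ys = draw_sample(random.Random(seed))
    b1, b2 = fit_line(xs, ys)
    print(f"sample {seed}: Y = {ys}")
    print(f"          SRF: Y_hat = {b1:.2f} + {b2:.4f} X")
```

Each sample generally produces a different fitted line, and neither has to coincide with the PRL, which is why a single SRF is only an approximation of the PRF.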
The sample regression function (SRF)

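PRF: $E[Y \mid X_i] = \beta_1 + \beta_2 X_i$

SRF: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
where: $\hat{Y}_i$ = estimator of $E[Y \mid X_i]$
       $\hat{\beta}_1, \hat{\beta}_2$ = estimators of $\beta_1, \beta_2$

Stochastic form: $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$
where: $\hat{u}_i$ = residual term (estimator of $u_i$)

Objective: to estimate the PRF

  $Y_i = \beta_1 + \beta_2 X_i + u_i$

on the basis of the SRF

  $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$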
The sample regression function (SRF)

• The estimate is an approximation
- In terms of the SRF: Yi = Ŷi + ûi
- In terms of the PRF: Yi = E(Y|Xi) + ui

[Figure: the SRF and PRF lines with one observation Yi; the residual ûi is the vertical distance from Yi to Ŷi on the SRF, and the disturbance ui is the vertical distance from Yi to E[Y|Xi] on the PRF; weekly consumption expenditure ($) against weekly income ($)]
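The two decompositions can be compared numerically. In this sketch the sample is a hypothetical one chosen for illustration (one Y value per income level, each taken from the data table; it is not claimed to be the sample marked on the slides), the SRF is fitted by ordinary least squares as an assumed estimation method, and the conditional means are those computed earlier from the population.

```python
# Income levels and conditional means E(Y|X) from the data table
incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
cond_means = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

# Hypothetical sample: one Y per income level, each value present in the population
sample_y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

# Fit the SRF by ordinary least squares (assumed method)
n = len(incomes)
x_bar = sum(incomes) / n
y_bar = sum(sample_y) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(incomes, sample_y))
sxx = sum((x - x_bar) ** 2 for x in incomes)
b2_hat = sxy / sxx
b1_hat = y_bar - b2_hat * x_bar

fitted = [b1_hat + b2_hat * x for x in incomes]          # Y_hat_i
residuals = [y - f for y, f in zip(sample_y, fitted)]    # u_hat_i = Y_i - Y_hat_i
disturbances = [y - m for y, m in zip(sample_y, cond_means)]  # u_i = Y_i - E(Y|X_i)

print(f"SRF: Y_hat = {b1_hat:.4f} + {b2_hat:.4f} X")
for x, y, u_hat, u in zip(incomes, sample_y, residuals, disturbances):
    print(f"X={x:3d}  Y={y:3d}  residual={u_hat:7.2f}  disturbance={u:4d}")
```

The residuals are measured from the fitted SRF and the disturbances from the PRF, so the two columns differ even though both decompose the same Yi.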
Next class:

• The problem of estimation

• How to construct the SRF such that the estimator is as “close” as possible to the true value
