Econometrics: Department of Marketing Faculty of Business Studies University of Dhaka
Code : 412
Course Teacher : Haripada Bhattacharjee
Professor
Department of Marketing
Course Outline
Econometrics
Test of Hypothesis
Multicollinearity
Heteroscedasticity
Department of Marketing, University of Dhaka
Econometrics
Econometrics is the study of economic data using mathematical and statistical analysis.
Economics + Mathematics = Mathematical Economics
Mathematical Economics + Statistics = Econometrics
Methodology of Econometrics
The methodology of econometrics proceeds through the following steps:
1. Developing Economic Theory: A theory states which variables are expected to influence the outcome being studied, and this determines how accurate the economic analysis can be. Irrelevant variables must not be included in the economic theory.
2. Developing a Mathematical Model of the Economic Theory: The model may be linear, non-linear or log-linear. The simplest linear form is
𝑦 = 𝑎 + 𝑏𝑥
[When the highest power of the unknown variable is 1, the equation is linear. When the highest power is 2, it is a quadratic function. A model that is non-linear in its variables is usually transformed, for example by taking logarithms, so that it can be estimated as a linear equation.]
3. Developing the Econometric Model of the Theory: Variables that are not included in the mathematical model but still influence the dependent variable are represented by the symbol µ, the error term (also called the residual or stochastic term). In the econometric model, this error term is specified under certain assumptions.
Y = a + bx + µ (the error term is also written e, R or S)
4. Collection of Data: Economic data are of two types, primary and secondary. In most cases, econometrics depends on secondary data. Before any kind of data is used, its quality should be assessed in terms of its reliability and the purpose for which it was collected.
5. Estimation of the econometric model.
6. Hypothesis Testing:
a) Null hypothesis
b) Alternative hypothesis
In econometric modelling, researchers test the null hypothesis using a test statistic that follows a known probability distribution. There are four common techniques for testing the null hypothesis:
a) t-test (where the sample size is ≤ 30)
b) Z-test (where the sample size is > 30)
c) F-test (for comparing variances or testing several parameters jointly)
d) Chi-square test (difference between observed and expected values)
7. Forecasting or prediction from the econometric model for policy purposes.
Single Equation Regression Model
The two sides of a regression equation are known by several names:
     Dependent variable       Explanatory variable
1.   Explained variable       Independent variable
2.   Predictand               Predictor
3.   Regressand               Regressor
4.   Response variable        Stimulus variable
5.   Endogenous variable      Exogenous variable
6.   Outcome variable         Covariate
7.   Controlled variable      Control variable
Test of Hypothesis:
Procedure for Testing the Hypothesis.
The null hypothesis (H0) is a statement in which no change, difference or effect is expected.
A one-tailed test is a test of H0 in which the alternative hypothesis (H1) is expressed directionally (using < or >).
A two-tailed test, on the other hand, is a test of H0 in which H1 is not expressed directionally: H0 uses the symbol = and H1 uses ≠.
Rejecting H0 when it is in fact true is called a Type I error. Accepting H0 when it is in fact false is called a Type II error.
The level of significance is the probability of rejecting H0 when it is true. It is usually set at 0.05, which corresponds to a 95% confidence level.
The critical value is the value obtained from the relevant table (for example, the Z-table of the standard normal distribution) at the chosen level of significance.
It should not be confused with the p-value (probability value), which is the probability, under H0, of obtaining a test statistic at least as extreme as the one calculated.
Step 7 : Identification of Degrees of Freedom
Degrees of freedom are the number of observations that remain free to vary after the parameters have been estimated, that is, n − k for n observations and k estimated parameters. The value to use depends on the test statistic which will be applied.
There are four types of test statistic which are frequently used in testing H0:
t-test
Z-test
F-test
Chi-square test
The Chi-square test is used when the data are non-metric; the remaining three tests are used for metric data.
Step 9 :
Comparison of the calculated value with the table value. If the calculated value is greater than the table value, we reject H0.
Non-negativity: The decision variables are permitted to take only values greater than or equal to zero.
Additivity: The total output is the algebraic sum of the individual outputs.
Y = a + bx + e, where x is the independent variable and e is the error term (disturbance or stochastic term).
A parameter is the coefficient of an independent variable, and the highest power of each parameter is 1. If the equation is non-linear, it has to be converted into a linear equation by taking logarithms on both sides of the equation.
2. The residual variable, denoted by 'e', has a mean value of zero. That is, the expected value of 'e' must always be equal to 0.
3. The variance of the error term is the same regardless of the value of 'x'.
Standard deviation vs Variance
Example series: 5, 7, 9, 11, 31 (the value 31 is an outlier)
Two Variable Regression ( Single Equation )
Here X = advertisement (Ads), Y = sales, x = X − X̄, y = Y − Ȳ, Ŷ = 4 + 3.6X and e = Y − Ŷ.

X     Y     x     x²    y      y²     xy    Ŷ       e       e²       (Ŷ − Ȳ)²
10    45    0     0     5      25     0     40.0    5.0     25.00    0.00
9     40    -1    1     0      0      0     36.4    3.6     12.96    12.96
11    42    1     1     2      4      2     43.6    -1.6    2.56     12.96
12    45    2     4     5      25     10    47.2    -2.2    4.84     51.84
8     28    -2    4     -12    144    24    32.8    -4.8    23.04    51.84

Totals: ∑X = 50, ∑Y = 200, ∑x² = 10, ∑y² = 198, ∑xy = 36, ∑e² = 68.40
Means: X̄ = 10, Ȳ = 40
Question:
Answers:
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 40 - 3.6 \times 10 = 4$$
So, the estimated regression equation is,
$$\hat{Y} = 4 + 3.6X$$
If no advertisement is given, sales will still be BDT 4. For each additional BDT 1 of advertisement, sales will increase by BDT 3.6.
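The estimates above can be reproduced with a short script. This is a minimal sketch of the deviation method using the advertisement and sales figures from the table; the function name `fit_line` is only illustrative and does not come from the notes.

```python
# Advertisement (X) and sales (Y) figures from the worked table.
ads = [10, 9, 11, 12, 8]
sales = [45, 40, 42, 45, 28]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(ads, sales)
print(f"Estimated line: Sales = {b0:.1f} + {b1:.1f} * Ads")
```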
2. If BDT 15 is spent for advertisement, estimated sales will be,
$$\hat{Y} = 4 + 3.6X = 4 + 3.6 \times 15 = 58$$
If BDT 15 is spent on advertisement, estimated sales will be BDT 58. The coefficient of determination of the fitted line is 0.65, which means that advertisement explains 65% of the variation in sales.
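The forecast and the 65% figure can be checked as follows. This sketch assumes the estimated line Ŷ = 4 + 3.6X from the notes and computes the coefficient of determination as explained variation divided by total variation; the helper name `predict` is illustrative.

```python
ads = [10, 9, 11, 12, 8]
sales = [45, 40, 42, 45, 28]
b0, b1 = 4.0, 3.6  # intercept and slope estimated in the notes


def predict(x):
    """Estimated sales for a given advertisement spend."""
    return b0 + b1 * x


forecast = predict(15)

y_bar = sum(sales) / len(sales)
# Explained variation: squared distance of the fitted values from the mean.
explained = sum((predict(x) - y_bar) ** 2 for x in ads)
# Total variation: squared distance of the observed values from the mean.
total = sum((y - y_bar) ** 2 for y in sales)
r_squared = explained / total

print(f"Forecast at X = 15: {forecast:.0f}")
print(f"R squared: {r_squared:.2f}")
```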
The relationship between sales and advertisement can be measured by the coefficient of correlation, denoted by r:
$$r = \sqrt{r^2} = \sqrt{0.65} = 0.81$$
The value of r is 0.81, which indicates a strong positive correlation between sales and advertisement.
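4. The significance of the parameter (β1) can be tested by first calculating its standard deviation with this formula,
$$SD(\hat{\beta}_1) = \sqrt{\frac{\sum e^2}{(n-k)\sum x^2}} = \sqrt{\frac{68.40}{(5-2) \times 10}} = 1.51$$
To test the significance of this parameter, we have to calculate the t-ratio,
$$T = \frac{\hat{\beta}_1}{SD(\hat{\beta}_1)} = \frac{3.6}{1.51} = 2.38$$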
5. If the calculated t-ratio is above 2.5, we can say the parameter is significant. In this calculation the t-ratio (2.38) is less than 2.5, which indicates that the parameter is not significant. That is, the estimated increase in sales of BDT 3.6 for each additional BDT 1 of advertisement is not statistically significant.
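The standard deviation of the slope and its t-ratio can be recomputed from the residuals. The sketch below assumes the fitted line Ŷ = 4 + 3.6X and k = 2 estimated parameters (intercept and slope); the 2.5 threshold is the rule of thumb used in these notes, not a table value.

```python
import math

ads = [10, 9, 11, 12, 8]
sales = [45, 40, 42, 45, 28]
b0, b1 = 4.0, 3.6  # estimates from the notes
n, k = len(ads), 2  # observations and estimated parameters

x_bar = sum(ads) / n
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, sales)]
sum_e2 = sum(e ** 2 for e in residuals)       # sum of squared residuals
sum_x2 = sum((x - x_bar) ** 2 for x in ads)   # sum of squared deviations of X

sd_b1 = math.sqrt(sum_e2 / ((n - k) * sum_x2))
t_ratio = b1 / sd_b1
significant = t_ratio > 2.5  # rule of thumb from the notes

print(f"SD of slope: {sd_b1:.2f}, t-ratio: {t_ratio:.2f}, significant: {significant}")
```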
Multiple Regression Analysis (Simultaneous Equation Method)
Obs    S      Y     N     Y²     N²    SY     SN    NY
A      6      8     5     64     25    48     30    40
B      12     11    2     121    4     132    24    22
C      10     9     1     81     1     90     10    9
D      7      6     3     36     9     42     21    18
E      3      6     4     36     16    18     12    24
∑      38     40    15    338    55    330    97    113
Mean   7.6    8     3
Question:
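Solution:
$$S = \beta_0 + \beta_1 Y + \beta_2 N + e$$
$$\hat{S} = \hat{\beta}_0 + \hat{\beta}_1 Y + \hat{\beta}_2 N$$
Step 3: Applying the least squares method, the three normal equations are,
$$\begin{aligned}
\sum S &= n\hat{\beta}_0 + \hat{\beta}_1 \sum Y + \hat{\beta}_2 \sum N && \text{(i)} \\
\sum SY &= \hat{\beta}_0 \sum Y + \hat{\beta}_1 \sum Y^2 + \hat{\beta}_2 \sum NY && \text{(ii)} \\
\sum SN &= \hat{\beta}_0 \sum N + \hat{\beta}_1 \sum NY + \hat{\beta}_2 \sum N^2 && \text{(iii)}
\end{aligned}$$
Substituting the totals from the table,
$$\begin{aligned}
38 &= 5\hat{\beta}_0 + 40\hat{\beta}_1 + 15\hat{\beta}_2 && \text{(i)} \\
330 &= 40\hat{\beta}_0 + 338\hat{\beta}_1 + 113\hat{\beta}_2 && \text{(ii)} \\
97 &= 15\hat{\beta}_0 + 113\hat{\beta}_1 + 55\hat{\beta}_2 && \text{(iii)}
\end{aligned}$$
Step 5: Multiplying equation (i) by 8 and deducting it from equation (ii), we have,
$$\begin{aligned}
304 &= 40\hat{\beta}_0 + 320\hat{\beta}_1 + 120\hat{\beta}_2 \\
26 &= 18\hat{\beta}_1 - 7\hat{\beta}_2 \\
18\hat{\beta}_1 - 7\hat{\beta}_2 &= 26 && \text{(iv)}
\end{aligned}$$
Step 6: Again, multiplying equation (i) by 3 and deducting it from equation (iii), we have,
$$\begin{aligned}
114 &= 15\hat{\beta}_0 + 120\hat{\beta}_1 + 45\hat{\beta}_2 \\
-17 &= -7\hat{\beta}_1 + 10\hat{\beta}_2 \\
-7\hat{\beta}_1 + 10\hat{\beta}_2 &= -17 && \text{(v)}
\end{aligned}$$
Step 7: Now, multiplying equation (iv) by 10 and equation (v) by 7 and then adding the two equations, we have,
$$\begin{aligned}
180\hat{\beta}_1 - 70\hat{\beta}_2 &= 260 \\
-49\hat{\beta}_1 + 70\hat{\beta}_2 &= -119 \\
131\hat{\beta}_1 &= 141 \\
\hat{\beta}_1 &= \frac{141}{131} \\
\therefore \hat{\beta}_1 &= 1.076
\end{aligned}$$
Step 8: Putting β̂1 = 1.076 in equation (v), we get,
$$\begin{aligned}
-7 \times 1.076 + 10\hat{\beta}_2 &= -17 \\
10\hat{\beta}_2 &= 7.532 - 17 \\
\hat{\beta}_2 &= \frac{-9.468}{10} \\
\therefore \hat{\beta}_2 &= -0.947
\end{aligned}$$
Finally, using the means of the three variables,
$$\begin{aligned}
\bar{S} &= \hat{\beta}_0 + \hat{\beta}_1 \bar{Y} + \hat{\beta}_2 \bar{N} \\
7.6 &= \hat{\beta}_0 + 1.076 \times 8 - 0.947 \times 3 \\
\therefore \hat{\beta}_0 &= 1.833
\end{aligned}$$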
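The elimination steps above can be cross-checked by solving the three normal equations directly. This is a sketch only: it builds the equations from the S, Y and N columns of the table and solves them by Gauss-Jordan elimination with exact fractions, so the results are unrounded and the intercept differs in the third decimal from the hand calculation. The helper names `dot` and `solve` are illustrative.

```python
from fractions import Fraction

# Columns S, Y and N from the worked table.
S = [6, 12, 10, 7, 3]
Y = [8, 11, 9, 6, 6]
N = [5, 2, 1, 3, 4]


def dot(u, v):
    """Sum of element-wise products, e.g. dot(S, Y) is the SY total."""
    return sum(p * q for p, q in zip(u, v))


def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    size = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        # Bring a row with a non-zero pivot into position.
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


n = len(S)
# Coefficient matrix and right-hand side of normal equations (i), (ii), (iii).
a = [
    [n, sum(Y), sum(N)],
    [sum(Y), dot(Y, Y), dot(Y, N)],
    [sum(N), dot(Y, N), dot(N, N)],
]
b = [sum(S), dot(S, Y), dot(S, N)]

b0, b1, b2 = (float(v) for v in solve(a, b))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```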
Illustration:
Multiple Regression Analysis (Three Dimensions Method)
Here Y = quantity, X1 = price and X2 = income, with y = Y − Ȳ, x1 = X1 − X̄1 and x2 = X2 − X̄2.

Y      X1    X2      y      y²     x1    x1²    x2      x2²        x1y    x2y       x1x2
100    5     1000    20     400    -1    1      200     40,000     -20    4,000     -200
75     7     600     -5     25     1     1      -200    40,000     -5     1,000     -200
80     6     1200    0      0      0     0      400     160,000    0      0         0
70     6     500     -10    100    0     0      -300    90,000     0      3,000     0
50     8     300     -30    900    2     4      -500    250,000    -60    15,000    -1,000
65     7     400     -15    225    1     1      -400    160,000    -15    6,000     -400
90     5     1300    10     100    -1    1      500     250,000    -10    5,000     -500
100    4     1100    20     400    -2    4      300     90,000     -40    6,000     -600
110    3     1300    30     900    -3    9      500     250,000    -90    15,000    -1,500
60     9     300     -20    400    3     9      -500    250,000    -60    10,000    -1,500

Totals: ∑Y = 800, ∑X1 = 60, ∑X2 = 8000, ∑y² = 3,450, ∑x1² = 30, ∑x2² = 1,580,000, ∑x1y = -300, ∑x2y = 65,000, ∑x1x2 = -5,900
Means: Ȳ = 80, X̄1 = 6, X̄2 = 800
Questions:
Solution
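$$\hat{\beta}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \frac{-90{,}500{,}000}{12{,}590{,}000} = -7.188$$
$$\hat{\beta}_2 = \frac{\sum x_2 y \sum x_1^2 - \sum x_1 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = \frac{180{,}000}{12{,}590{,}000} = 0.0143$$
And,
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 = 80 - (-7.188)(6) - (0.0143)(800) = 80 + 43.128 - 11.44 = 111.688$$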
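The three coefficients can be recomputed from the raw quantity, price and income columns. This sketch applies the deviation-form formulas for a regression with two explanatory variables; the helper names are illustrative, and the unrounded intercept differs slightly from the hand calculation.

```python
quantity = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]
price = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]
income = [1000, 600, 1200, 500, 300, 400, 1300, 1100, 1300, 300]


def mean(values):
    return sum(values) / len(values)


def deviations(values):
    """Deviations of each value from the mean of the series."""
    m = mean(values)
    return [v - m for v in values]


def cross_sum(u, v):
    return sum(p * q for p, q in zip(u, v))


y, x1, x2 = deviations(quantity), deviations(price), deviations(income)

s_x1y, s_x2y = cross_sum(x1, y), cross_sum(x2, y)
s_x1x1, s_x2x2 = cross_sum(x1, x1), cross_sum(x2, x2)
s_x1x2 = cross_sum(x1, x2)

denominator = s_x1x1 * s_x2x2 - s_x1x2 ** 2
b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / denominator
b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / denominator
b0 = mean(quantity) - b1 * mean(price) - b2 * mean(income)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.4f}")
```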
2. Coefficient of Determination,
$$R^2 = \frac{\hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y}{\sum y^2} = \frac{(-7.188)(-300) + (0.0143)(65{,}000)}{3{,}450} = 0.894$$
That means 89.4% of the variation in quantity is explained by price and income.
3. The correlation coefficient,
$$R = \sqrt{R^2} = \sqrt{0.894} = 0.945$$
That means there is a strong correlation, because the value is close to 1.
4. Adjusted R-square ($\bar{R}^2$) adjusts R² for the number of independent variables, and so shows what would have happened if more independent variables had been added. The value of adjusted R-square is never greater than the original R² value. The formula is,
$$\bar{R}^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k}\right) = 1 - (1 - 0.894)\left(\frac{10-1}{10-3}\right) = 0.863$$
[n = number of data points, k = number of variables]
That means that, even after allowing for additional independent variables, the adjusted coefficient of determination remains close to the original R², which indicates that the independent variables are rightly identified.
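Both measures of fit can be verified from the column totals. The sketch below takes the totals from the worked table as given and uses unrounded coefficients, so the last decimal may differ from the hand-rounded figures; the variable names are illustrative.

```python
# Totals from the worked table (deviation form).
sum_y2 = 3450
sum_x1y = -300
sum_x2y = 65000

# Coefficients as ratios of the totals used in the notes.
b1 = -90_500_000 / 12_590_000
b2 = 180_000 / 12_590_000

n = 10  # number of observations
k = 3   # number of variables (quantity, price, income)

r_squared = (b1 * sum_x1y + b2 * sum_x2y) / sum_y2
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

print(f"R squared: {r_squared:.3f}")
print(f"Adjusted R squared: {adjusted_r_squared:.3f}")
```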
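Variance of the error term,
$$\hat{\sigma}_e^2 = \frac{\sum y^2 (1 - R^2)}{n - k} = \frac{3{,}450 \times (1 - 0.894)}{10 - 3} = \frac{365.7}{7} = 52.24$$
7. Variance of β̂1 and variance of β̂2,
$$Var(\hat{\beta}_1) = \hat{\sigma}_e^2 \, \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = 52.24 \times \frac{1{,}580{,}000}{30 \times 1{,}580{,}000 - (-5{,}900)^2} = 6.56$$
$$Var(\hat{\beta}_2) = \hat{\sigma}_e^2 \, \frac{\sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2} = 52.24 \times \frac{30}{30 \times 1{,}580{,}000 - (-5{,}900)^2} = 0.0001$$
Standard errors,
$$SE(\hat{\beta}_1) = \sqrt{6.56} = 2.56 \qquad SE(\hat{\beta}_2) = \sqrt{0.0001} = 0.01$$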
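9. t-ratio of β̂1 and β̂2,
$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-7.188}{2.56} \approx -2.81 \qquad t(\hat{\beta}_2) = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.0143}{0.01} = 1.43$$
Summary of results:
                 β̂1      β̂2
SE               2.56    0.01
R²               0.894
Adjusted R²      0.863
Significance is tested at the 0.05 level. [If the t-ratio is greater than 3.55, the parameter is considered significant at the 0.01 level.]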
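The error variance, standard errors and t-ratios can be traced with a few lines of code. This sketch starts from the column totals in the table and keeps full precision throughout, so its figures differ a little from the hand calculation; in particular the t-ratio of the income coefficient comes out nearer 1.28 than 1.43, because the notes round its standard error to 0.01. Variable names are illustrative.

```python
import math

# Totals from the worked table (deviation form).
sum_y2, sum_x1x1, sum_x2x2 = 3450, 30, 1_580_000
sum_x1y, sum_x2y, sum_x1x2 = -300, 65_000, -5_900
n, k = 10, 3

denominator = sum_x1x1 * sum_x2x2 - sum_x1x2 ** 2
b1 = (sum_x1y * sum_x2x2 - sum_x2y * sum_x1x2) / denominator
b2 = (sum_x2y * sum_x1x1 - sum_x1y * sum_x1x2) / denominator

r_squared = (b1 * sum_x1y + b2 * sum_x2y) / sum_y2
# Residual sum of squares divided by the degrees of freedom.
error_variance = sum_y2 * (1 - r_squared) / (n - k)

se_b1 = math.sqrt(error_variance * sum_x2x2 / denominator)
se_b2 = math.sqrt(error_variance * sum_x1x1 / denominator)
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

print(f"Error variance: {error_variance:.2f}")
print(f"SE(b1) = {se_b1:.2f}, t(b1) = {t_b1:.2f}")
print(f"SE(b2) = {se_b2:.4f}, t(b2) = {t_b2:.2f}")
```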
Cobb-Douglas Production Function
Data and natural logarithms (Y = log_e Q, X2 = log_e L, X3 = log_e K):

Q       L      K      Y        X2       X3
100     1.0    2.0    4.605    0.000    0.693
120     1.3    2.2    4.787    0.262    0.788
140     1.8    2.3    4.942    0.588    0.833
150     2.0    1.5    5.011    0.693    0.405
165     2.5    2.8    5.106    0.916    1.030
190     3.0    3.0    5.247    1.099    1.099
200     3.0    3.3    5.298    1.099    1.194
220     4.0    3.4    5.394    1.386    1.224
∑                     40.39    6.04     7.27
Mean                  5.049    0.755    0.908

Deviations from the means (y = Y − Ȳ, x2 = X2 − X̄2, x3 = X3 − X̄3):

Q      y         y²       x2        x2²      x3        x3²      x2y      x3y      x2x3
100    -0.444    0.197    -0.755    0.571    -0.215    0.046    0.335    0.095    0.162
120    -0.261    0.068    -0.493    0.243    -0.120    0.014    0.129    0.031    0.059
140    -0.107    0.011    -0.168    0.028    -0.075    0.006    0.018    0.008    0.013
150    -0.038    0.001    -0.062    0.004    -0.503    0.253    0.002    0.019    0.031
165    0.057     0.003    0.161     0.026    0.121     0.015    0.009    0.007    0.020
190    0.198     0.039    0.343     0.118    0.190     0.036    0.068    0.038    0.065
200    0.250     0.062    0.343     0.118    0.286     0.082    0.086    0.071    0.098
220    0.345     0.119    0.631     0.398    0.316     0.100    0.218    0.109    0.199
∑      0.000     0.502    0.000     1.505    0.000     0.551    0.865    0.379    0.647
L = Labour
K = Capital
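Now,
$$\alpha = \frac{\sum x_2 y \sum x_3^2 - \sum x_3 y \sum x_2 x_3}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2} = 0.5638$$
$$\beta = \frac{\sum x_3 y \sum x_2^2 - \sum x_2 y \sum x_2 x_3}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2} = 0.0249$$
$$\log_e A = \bar{Y} - \alpha \bar{X}_2 - \beta \bar{X}_3 = 4.60$$
So,
$$\log_e Q = \log_e A + \alpha \log_e L + \beta \log_e K$$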
Here A is the constant term of the production function.
Here, the values of α and β are,
α = 0.5638
β = 0.0249
α + β = 0.5638 + 0.0249
= 0.5887
As α + β < 1, the function shows decreasing returns to scale. Output rises less than proportionately when all resources are increased, so the per-unit production cost will increase if resources are increased.
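The log-linear estimation can be reproduced from the raw Q, L and K columns. This is a sketch that takes natural logarithms, applies the same deviation-form formulas as for any two-regressor model, and then classifies returns to scale from α + β; the helper names and the wording of the classification are illustrative.

```python
import math

output = [100, 120, 140, 150, 165, 190, 200, 220]
labour = [1.0, 1.3, 1.8, 2.0, 2.5, 3.0, 3.0, 4.0]
capital = [2.0, 2.2, 2.3, 1.5, 2.8, 3.0, 3.3, 3.4]


def centred_logs(values):
    """Return the mean of the natural logs and their deviations from it."""
    logs = [math.log(v) for v in values]
    m = sum(logs) / len(logs)
    return m, [v - m for v in logs]


def cross_sum(u, v):
    return sum(p * q for p, q in zip(u, v))


y_bar, y = centred_logs(output)
x2_bar, x2 = centred_logs(labour)
x3_bar, x3 = centred_logs(capital)

denominator = cross_sum(x2, x2) * cross_sum(x3, x3) - cross_sum(x2, x3) ** 2
alpha = (cross_sum(x2, y) * cross_sum(x3, x3)
         - cross_sum(x3, y) * cross_sum(x2, x3)) / denominator
beta = (cross_sum(x3, y) * cross_sum(x2, x2)
        - cross_sum(x2, y) * cross_sum(x2, x3)) / denominator
log_a = y_bar - alpha * x2_bar - beta * x3_bar

scale = alpha + beta
if scale < 1:
    returns = "decreasing"
elif scale > 1:
    returns = "increasing"
else:
    returns = "constant"

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, log A = {log_a:.2f}")
print(f"alpha + beta = {scale:.4f}: {returns} returns to scale")
```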
Autocorrelation / Serial Correlation
2. X values are fixed in repeated sampling. More specifically, X is assumed to be non-stochastic (conditional
regression).
4. Equal variance of µi. It means that the Y populations corresponding to the various X values have the same variance (homoscedasticity).
5. No autocorrelation between the disturbances. This means that the disturbances µi and µj are uncorrelated. This is the assumption of no serial correlation, or no autocorrelation.
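A commonly used statistic to detect autocorrelation is the Durbin-Watson (DW) statistic, which is denoted by 'd' (the calculated value is written dx in the example that follows). It is defined as:
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
where e (or µ̂) is the estimated disturbance term. Or, in the µ̂ notation,
$$d = \frac{\sum_{t=2}^{n} (\hat{\mu}_t - \hat{\mu}_{t-1})^2}{\sum_{t=1}^{n} \hat{\mu}_t^2}$$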
Decision Rule
1. If 'd' is found to be 2, one may assume that there is no first-order autocorrelation, either positive or
negative.
2. If ρ̂ = +1 (where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals), indicating perfect positive correlation in the residuals, d = 0. Therefore, the closer d is to 0, the greater the evidence of positive serial correlation.
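3. If ρ̂ = −1, indicating perfect negative correlation in the residuals, d = 4. Therefore, the closer d is to 4, the greater the evidence of negative serial correlation.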
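The decision rules follow from a standard large-sample approximation that links d to the first-order autocorrelation coefficient of the residuals. The notes do not derive it, so the expansion below is offered as a supporting sketch, with ρ̂ defined as shown and the two sums of squares treated as approximately equal:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = \frac{\sum e_t^2 + \sum e_{t-1}^2 - 2\sum e_t e_{t-1}}{\sum e_t^2} \approx 2(1 - \hat{\rho}), \qquad \hat{\rho} = \frac{\sum e_t e_{t-1}}{\sum e_t^2}$$

So ρ̂ = 0 gives d ≈ 2, ρ̂ = +1 gives d ≈ 0, and ρ̂ = −1 gives d ≈ 4.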
Example:
Consider the following model: Y_t = α + βX_t + µ_t. Test for autocorrelation with the following observations on Y and X:
X : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y : 2 2 1 1 3 5 6 6 10 10 10 12 15 10 11
$$d_x = \frac{60.213}{41.767} = 1.44$$
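The statistic for this example can be computed directly from the observations. The sketch below fits the line by ordinary least squares, takes the residuals and applies the definition of d; because it carries no rounding in the intermediate steps, its result (about 1.39) differs somewhat from the hand-calculated figure. Function names are illustrative.

```python
x = list(range(1, 16))
y = [2, 2, 1, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11]


def ols_residuals(xs, ys):
    """Residuals from the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
             / sum((a - x_bar) ** 2 for a in xs))
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(xs, ys)]


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


d = durbin_watson(ols_residuals(x, y))
print(f"Durbin-Watson d = {d:.2f}")
```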
Values of dL and dU at the 5% level of significance, with n = 15 and one explanatory variable, are:
(2) Compare the calculated dx value with the table value of d, with (n − k) degrees of freedom, k being the number of explanatory variables including the constant term.
(3) Two values, an upper bound dU and a lower bound dL, have been assigned to d.
(4) Values of d lie between 0 and 4. If there is no autocorrelation then dx = 2 (we accept H0), and if dx is close to 0 or 4, we reject H0.
(5) The exact distribution of d is never known, but there exists a range of values, between dL and dU, within which we can neither accept nor reject H0.
Multicollinearity
(c) The further the observed Chi-square value exceeds the table value, the more severe the multicollinearity.
3. The Chi-square test examines the severity of multicollinearity. The F-test locates which variables are multicollinear. And the t-test detects the pattern of multicollinearity (the variables which cause multicollinearity).
4. To detect multicollinearity, the most popular method is Frisch's Confluence Analysis.
Example:
Year C Y L Pc Po
2001 8 82 17 91 94
2002 9 88 21 93 96
2003 10 99 25 96 97
2004 11 105 29 94 97
2005 12 117 34 100 100
2006 14 131 40 101 101
2007 15 148 44 105 104
2008 17 161 49 112 109
2009 19 174 51 112 111
2010 20 184 53 112 111
∴ C = b0 + b1 Y + b2 L + b3 Pc + b4 Po + e
The table value of F with k − 1 and n − k degrees of freedom at the 5% significance level is 5.19. Hence, we reject H0; that is, there is multicollinearity.
All the explanatory variables are seriously multicollinear when we compute simple correlation coefficients:
rYL = 0.993
rYPc = 0.980
rYPo = 0.987
rLPc = 0.964
rLPo = 0.973
rPcPo = 0.991
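The simple correlation coefficients can be recomputed from the example table. This sketch calculates the Pearson correlation for every pair of explanatory variables; unrounded results may differ in the last decimals from the figures listed above, and the function name `correlation` is illustrative.

```python
import math
from itertools import combinations

# Explanatory variables from the example table, 2001 to 2010.
series = {
    "Y": [82, 88, 99, 105, 117, 131, 148, 161, 174, 184],
    "L": [17, 21, 25, 29, 34, 40, 44, 49, 51, 53],
    "Pc": [91, 93, 96, 94, 100, 101, 105, 112, 112, 112],
    "Po": [94, 96, 97, 97, 100, 101, 104, 109, 111, 111],
}


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    covariation = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return covariation / math.sqrt(var_u * var_v)


correlations = {
    (a, b): correlation(series[a], series[b])
    for a, b in combinations(series, 2)
}
for (a, b), r in correlations.items():
    print(f"r({a}, {b}) = {r:.3f}")
```

Coefficients this close to 1 for every pair are what the notes mean by seriously multicollinear explanatory variables.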
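To explore the effects of multicollinearity, we compute the elementary regression and then introduce the other variables gradually into the function (figures in parentheses are shown beneath the coefficients they belong to):

Function     b0        Y          Pc    L    Po    R²       d
C = f(Y)     -1.24     0.118      -     -    -     0.995    2.6
             (0.37)    (0.002)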
Dropping L and introducing Po, we obtain a better overall fit. R² is slightly increased. L is clearly a superfluous variable. Thus, the best fit is obtained from the function C = f(Y, Pc, Po).
Heteroscedasticity
1. One assumption we have made until now is that the errors 'µi' in the regression equation have a common variance σ².
2. Other three assumptions about errors are:
3. 'µ' is introduced into the model to take into account the influence of various 'errors', such as errors from omitted variables, errors in the mathematical form of the model, errors in the measurement of the dependent variable, and erratic human behaviour.
4. Randomness of 'µ' means that the influence of the omitted variables should not follow a systematic pattern.
5. Zero mean of 'µ' is a conceptual application of the rules of algebra and can never be proved. Geometrically, zero mean of 'µ' implies that the observations of Y and X must be scattered around the line in a random way.
6. The homoscedasticity assumption of 'µ' says that the probability distribution of the random 'µ' remains the same over all observations of X, and in particular that the variance of each 'µ' is the same for all values of the explanatory variable.
7. Heteroscedasticity can be detected by following tests.
8. Amongst the above tests, one of the most widely used is the Ramsey (RESET) test.
9. The test involves regressing 'µi' on x², x³ and so on, or regressing 'µi' on ŷ², ŷ³, to test whether or not the coefficients are significant.
10. There are three solutions for heteroscedasticity problem:
Detection of Heteroscedasticity:
1. The easiest way to detect the presence of heteroscedasticity is to plot the residuals against the dependent variable and then examine the pattern of the residuals.
2. The Spearman rank correlation coefficient can be used to test for the presence of heteroscedasticity. In this case, OLS is applied to estimate the equation,
𝑌 = 𝛼 + 𝛽𝑥 + 𝜇
then the estimated error terms (in absolute value) and X are ranked, and the rank correlation coefficient is computed by using the formula,
$$r_{e,x} = 1 - \frac{6 \sum D_i^2}{n(n^2 - 1)}$$
where D_i is the difference between the two ranks of observation i and n is the number of observations. If the coefficient is high, it means there exists a problem of heteroscedasticity.
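The rank correlation step can be sketched as follows. The residuals and X values here are hypothetical, chosen so that the size of the residual grows steadily with X, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(residuals, xs):
    """Rank correlation between the absolute residuals and X."""
    n = len(xs)
    rank_e = ranks([abs(e) for e in residuals])
    rank_x = ranks(xs)
    sum_d2 = sum((re - rx) ** 2 for re, rx in zip(rank_e, rank_x))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: the absolute residual increases with X.
xs = [1, 2, 3, 4, 5]
residuals = [0.1, -0.3, 0.5, -0.9, 1.4]

r_ex = spearman(residuals, xs)
print(f"Rank correlation between |e| and X: {r_ex:.2f}")
```

A coefficient near 1, as in this constructed case, is the pattern that points to heteroscedasticity.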
3. Park suggested a two-stage procedure to test for heteroscedasticity:
i. Run the OLS regression and obtain the residuals et from the regression.
ii. Run a log-linear regression of et² on X, that is, ln et² = α + β ln Xt + vt, and examine whether β is significant (a significant β suggests heteroscedasticity).
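The second stage of the Park procedure can be sketched as below. The residuals are hypothetical and constructed so that their size is proportional to X, which makes the slope of the log-linear regression equal to 2. The sketch only estimates β; judging its significance would still need its standard error and t-ratio.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
             / sum((a - x_bar) ** 2 for a in xs))
    return y_bar - slope * x_bar, slope


def park_slope(residuals, xs):
    """Slope of the regression of ln(e squared) on ln(X)."""
    log_e2 = [math.log(e ** 2) for e in residuals]
    log_x = [math.log(x) for x in xs]
    _, slope = fit_line(log_x, log_e2)
    return slope


# Hypothetical first-stage residuals with |e| = 0.5 * X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
residuals = [0.5, -1.0, 1.5, -2.0, 2.5]

beta = park_slope(residuals, xs)
print(f"Park regression slope: {beta:.2f}")
```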
4. The Goldfeld-Quandt test involves estimating two least squares lines, one using the data with low-variance errors and the other using the data with high-variance errors. If the residual variances of the two lines are approximately equal, there is no heteroscedasticity.
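The comparison of the two fitted lines can be sketched as a ratio of residual sums of squares. This is a simplified illustration on hypothetical data: it splits the observations into a lower and an upper half by X without omitting any central observations, and it reports the ratio only, leaving the comparison with an F table value to the reader.

```python
def residual_sum_of_squares(xs, ys):
    """Residual sum of squares from the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
             / sum((a - x_bar) ** 2 for a in xs))
    intercept = y_bar - slope * x_bar
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))


def goldfeld_quandt_ratio(xs, ys):
    """RSS of the upper half of the data (by X) divided by RSS of the lower half."""
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[-half:]
    rss_low = residual_sum_of_squares(*zip(*low))
    rss_high = residual_sum_of_squares(*zip(*high))
    return rss_high / rss_low


# Hypothetical data: the same error pattern, ten times larger when X > 5.
pattern = [1, -1, 1, -1, 1]
xs = list(range(1, 11))
ys = [2 * x + (0.1 if x <= 5 else 1.0) * pattern[(x - 1) % 5] for x in xs]

ratio = goldfeld_quandt_ratio(xs, ys)
print(f"Ratio of residual sums of squares: {ratio:.1f}")
```

Because both halves contain the same number of observations, this ratio equals the ratio of the two residual variances; a value close to 1 would suggest no heteroscedasticity.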