Cursus Advanced Econometrics
Christophe Croux
christophe.croux@econ.kuleuven.be
Contents

1 Basic Econometrics
  1.1 Basic Statistical Concepts
    1.1.1 Why Using Statistics?
    1.1.2 Probability Distributions
    1.1.3 Estimators of means and variances
    1.1.4 Joint and Conditional Distributions
    1.1.5 Exercises
  1.2 Econometrics in Practice: Introduction to the Use of Eviews
    1.2.1 Hands on with Eviews
    1.2.2 Basic Principles of Eviews
    1.2.3 Descriptive Univariate Statistics
    1.2.4 Correlation Coefficients
    1.2.5 Exercise
  1.3 The Linear Regression Model
    1.3.1 The Model
    1.3.2 The Ordinary Least Squares Estimator
    1.3.3 Properties of OLS
    1.3.4 Summary Statistics in Regression Analysis
    1.3.5 Residual Plots
    1.3.6 Using Dummy Variables
    1.3.7 Exercises
  1.4 Model Specification
    1.4.1 Running Example
    1.4.2 Coefficient Tests
    1.4.3 Omitted and Redundant Variables
    1.4.4 Residual Tests
    1.4.5 Consistent Estimation of Covariance
2 Review: Multivariate Random Variables
3 Non-linear Regression
  3.1 Introduction
  3.2 Least Squares Estimation
    3.2.1 How to Compute θ̂
    3.2.2 Asymptotic Distribution
    3.2.3 Example (Green, 1993)
    3.2.4 The Delta-Method
  3.3 Homework: Non-Linear Regression
  3.4 Homework: Some Theoretical Exercises
4 Non-parametric regression
  4.1 Ingredients
    4.1.1 1st Ingredient: Polynomial Fit
    4.1.2 2nd ingredient: Weighted OLS
  4.2 Local fit around x0
  4.3 Non-parametric regression
1 Basic Econometrics
Possible specifications:
3. Non-linear forms, e.g. log(M) = α + β (log(Y))^δ + γr, with non-constant elasticities.
After specification of the form, we still need to estimate the unknown parameters (e.g.
by Least Squares or Maximum Likelihood). We could be interested in testing whether P
has a significant effect on M or not. According to theory, it should not. Specify:
M = α + βY + γr + δP
1. measurement errors
3. unpredictable behavior/events
5. etc.
In statistics and econometrics we deal with variables whose values are (partly) determined
by some chance mechanism. Such variables are called stochastic or random variables. In
principle, we denote stochastic variables with X, Y, Z, ... and their outcomes/realizations
by x, y, z, . . .. Although we cannot predict which values a random variable will take,
we are often able to attach certain probabilities to these values, that is, to derive its
(probability) distribution.
Discrete stochastic variables X takes values in a finite set {0, 1, 2, . . . , N} or in {0, 1, 2, 3, . . .}. Examples are passing or failing an exam, the number of matches won in a championship, the number of car accidents of a person during one year, etc.
pk = P (X = k) for k = 0, 1, 2, . . .
It is also called the (population) mean of X. Similarly, the expected value of Z = g(X) is defined as

E[Z] = ∑_k g(k) p_k .

k      0      1      2      3
p_k    0.125  0.375  0.375  0.125
Example 2: Let X be the number of children in a family. Here we do not know the
exact probabilities and we need to estimate them. We therefore collect a random sample
of n = 60 families and record the number of children in each family. The results are
pictured in the barplot below:
How could you obtain an estimate for the expected value of X?
[Barplot: observed frequencies (0 up to 20) of the number of children (0–6) in the sample of 60 families.]
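As a Python sketch (the counts below are illustrative, not the actual data behind the barplot), the expected value can be estimated by the plug-in rule E[X] ≈ ∑_k k · p̂_k, with p̂_k the observed relative frequencies:

```python
# Sketch: estimating E[X] from observed frequencies.
# The counts are made up for illustration; only their total (n = 60) matches the text.
counts = {0: 10, 1: 20, 2: 15, 3: 8, 4: 4, 5: 2, 6: 1}

n = sum(counts.values())
# Estimated probabilities p_hat_k = (frequency of k) / n
p_hat = {k: c / n for k, c in counts.items()}
# Plug-in estimate of E[X] = sum_k k * p_k; this equals the sample average
e_hat = sum(k * p for k, p in p_hat.items())
print(n, round(e_hat, 3))
```

Note that this plug-in estimate coincides with the ordinary sample average of the 60 recorded values.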
It is also called the (population) mean of X. The expected value of Z = g(X) is defined as

E[Z] = ∫ g(x) f(x) dx.
Note that the standard deviation has the same units of measurement as X.
Example: The standard normal distribution, with density function

f(x) = exp(−x²/2) / √(2π),
[Figure: the standard normal density plotted on the interval [−4, 4].]
For X a standard normal random variable, we have E[X] = 0 and SD(X) = 1. Furthermore: P(−1 ≤ X ≤ 1) ≈ 0.68, P(−2 ≤ X ≤ 2) ≈ 0.95, P(−3 ≤ X ≤ 3) ≈ 0.997.
The interval [µ − 2σ, µ + 2σ] contains about 95% of the possible outcomes of X. It is called the 2σ-interval. For example, if the average IQ of the Belgian population is 100 with SD = 15, then about 95% of the Belgians have an IQ in the interval [70; 130], under the assumption of normality.
The (population) mean µ = E[X] and standard deviation σ = SD(X) are (almost always)
unknown. We call them population parameters. We can only compute them exactly when
the probability distribution is completely known, which is rarely the case. However, it
is possible to estimate them. If we have a random sample X1 , . . . , Xn then the sample
average is
µ̂ = (1/n) ∑_{i=1}^n X_i ,

and the sample standard deviation

σ̂ = √( (1/n) ∑_{i=1}^n (X_i − µ̂)² ).
When estimating µ by µ̂ (or σ by σ̂) we will make an estimation error. This error
is quantified by the standard error (SE). It is the magnitude of a “typical” error. An
(approximative) rule to construct a confidence interval for an estimated parameter is
given by

[estimator − 2·SE, estimator + 2·SE].
This rule is valid for “most” estimators and based on the fact that the distribution of
“most” estimators is close to a normal distribution for large sample sizes (Central Limit
Theorem).
Example: Let X be the monthly income of a factory worker (in euro). We want to
estimate the population mean E[X]. We take the income for a random sample of n = 300
workers, given in the following histogram,
[Histogram of the monthly incomes (frequencies up to about 60 per bin) of the n = 300 sampled workers.]
The sample average equals 1201, with standard error SE=6.47. The sample standard
deviation is 112. We conclude that a 95% confidence interval for the population mean is
given by [1189;1215]. Moreover, since the distribution of X seems to be close to a nor-
mal distribution, we may say that about 95% of the monthly incomes are in the interval
[978;1427].
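The computation behind such an interval can be sketched in Python, using the summary numbers of the income example; the small difference with the printed interval [1189; 1215] presumably comes from rounding of the reported sample average:

```python
import math

# Sketch: 2*SE confidence interval from the income example's summary numbers
# (sample average 1201, sample standard deviation 112, n = 300).
n = 300
mean, sd = 1201.0, 112.0
se = sd / math.sqrt(n)              # standard error of the sample average
ci = (mean - 2 * se, mean + 2 * se) # approximate 95% confidence interval
print(round(se, 2), [round(x) for x in ci])
```

The computed SE of about 6.47 matches the value reported in the text.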
E[aX + b] = aE[X] + b
and
SD(aX + b) = |a|SD(X).
1.1.4 Joint and Conditional Distributions
Let X and Y be two stochastic variables which possibly interact. To simplify the formulas, we consider them as discrete.
P(X = k, Y = l).

P(Y = l | X = k) = P(Y = l and X = k) / P(X = k),
The marginal distributions of X and Y are simply the distributions of X and Y considered separately.
Definition: When the conditional distribution of Y given X equals the marginal distribu-
tion of Y , then X and Y are statistically independent.
Properties:
- E[X+Y]=E[X]+E[Y]
Example: Let X be the gender of the first child (0=girl, 1=boy) and Y the gender of the
second child of a family. The joint distribution of (X, Y ) is,
         Y = 0   Y = 1
X = 0    0.25    0.25
X = 1    0.25    0.25
Marginal distribution of Y :
P (Y = 0) = 0.5 and P (Y = 1) = 0.5.
Conditional distribution of Y given X = 1:
P (Y = 1|X = 1) = 0.5 and P (Y = 0|X = 1) = 0.5.
Conditional distribution of Y given X = 0:
P (Y = 1|X = 0) = 0.5 and P (Y = 0|X = 0) = 0.5.
We see that X and Y are statistically independent.
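A small Python sketch of this check, using the joint probabilities implied by the marginals and conditionals given above:

```python
# Sketch: checking statistical independence from a joint distribution table.
# Joint probabilities P(X=k, Y=l) for the two-children example, assuming
# boys and girls are equally likely.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

# Marginal distributions: sum the joint probabilities over the other variable
px = {k: sum(p for (x, _), p in joint.items() if x == k) for k in (0, 1)}
py = {l: sum(p for (_, y), p in joint.items() if y == l) for l in (0, 1)}

# X and Y are independent iff P(X=k, Y=l) = P(X=k) * P(Y=l) for every cell
independent = all(abs(joint[(k, l)] - px[k] * py[l]) < 1e-12
                  for k in (0, 1) for l in (0, 1))
print(px, py, independent)
```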
1.1.5 Exercises
1. [Figure: a time series plot with values between about 1510 and 1530.]
2. Let X be the total amount of money (cash) that a family keeps at home. From a
sample of size n = 100 we obtain a sample mean of 800 euro with standard error 50
euro. The sample standard deviation is given by 500 euro.
(c) Construct an interval that will contain approximately 95% of the outcomes of
X.
(d) Do you think that the distribution of X is normal?
3. Let X be the return of a stock A, and suppose that it follows a distribution N(2,1).
Let Y be the return of a stock B, and suppose that it follows a distribution N(3,2).
Suppose that X and Y are independent. I buy 10 stocks of A and 10 of B. Compute
the expected return and the standard deviation of the total return. How can I
increase the expected return? Comment on this.
4. The next 4 histograms are all based on 100 observations. Comment on their forms.
[Four histograms of the variables x1, x2, x3 and x4, each based on 100 observations.]

[Four corresponding density curves for the variables x1, x2, x3 and x4.]
5. Let Y be the hourly wage of a person and X the number of years of schooling.
Compare the marginal distribution of Y with the conditional distribution of Y given
X = 16 and with the conditional distribution of Y given X = 12.
1.2 Econometrics in Practice: Introduction to the Use of Eviews
The following steps will give you a first impression of the power of Eviews:
1. We will analyse data in the file demo.xls. Let us first have a look at this file.
The variables or series have been put into columns and represent aggregate money
demand (M 1), income (GDP ), the price level (P R) and the short term interest
rates (RS). We see that the data are quarterly. Record the number of variables and
the time span. Close the .xls file and start Eviews.
3. Open the object “GDP” by clicking on it. Using the View menu, try out: (1)
/graph/line, (2) /descriptive statistics/histogram and stats/.
Note that

log(X_t) − log(X_{t−1}) = log(X_t / X_{t−1}) ≈ (X_t − X_{t−1}) / X_{t−1}.]
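A quick numerical illustration of this approximation, with illustrative values:

```python
import math

# Sketch: log(X_t) - log(X_{t-1}) approximates the relative change
# (X_t - X_{t-1}) / X_{t-1} when the change is small.
x_prev, x_curr = 100.0, 103.0                   # a 3% increase
log_diff = math.log(x_curr) - math.log(x_prev)  # = log(x_curr / x_prev)
rel_change = (x_curr - x_prev) / x_prev
print(round(log_diff, 4), round(rel_change, 4))
```

For a 3% change the two quantities already differ by less than 0.0005; the approximation degrades for larger changes.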
6. Select now in the workfile the series M1, GDP, PR, and RS. By clicking on the
right mouse button, you can open them as a group. Try /View/Graphs/Lines/ and
/View/Multiple Graph/Lines/. Compute the correlation matrix of these variables,
by /View/Correlations. Are these correlations spurious?
[Time series showing trends over time will always be highly correlated. The reason
is that they are both driven by time. The high correlation does not imply a causal
relationship, it may be spurious.]
Have a look at the output. Construct a confidence interval for δ. Do you think that
PR is a significant variable?
T = (δ̂ − 0) / SE(δ̂).
The P-value is the probability that the test statistic takes values more extreme than
the observed one (under H0 ).
P-value > α ⇒ do not reject H0 at significance level α
P-value < α ⇒ reject H0 at significance level α
The default choice for the significance level is α = 0.05. This level gives the type I
error, i.e. the probability of rejecting H0 when it holds. The smaller the choice of α,
the more conservative we are towards H0 . If P-value< 0.05, then the corresponding
variable is said to be significant (for explaining Y ). If P-value< 0.01, then it is
highly significant.
It is often better to interpret the P-value on a continuous scale (e.g. P = 0.049 and P = 0.051 are almost identical). The smaller the P-value, the more evidence in the data against the null hypothesis.
Some authors prefer to report only the t-stats. A variable is significant if the t-stat
is larger than 2 in absolute value.
9. There is a problem with the regression model estimated above. The error term still contains too much "structure". Save the residuals, which can be considered as estimates of the error terms, by selecting /Procs/make residual series/ within the equation window. Then make the correlogram of the residuals and comment.
where ∆ is the mathematical notation for the difference operator. In Eviews, you
can specify “dlog(m1) c dlog(gdp) d(rs) dlog(pr).”
(b) Test H0 : δ = 0.
1.2.2 Basic Principles of Eviews
- The available toolbars/menus of an object window depend on the type of the object.
The distribution of the data can be pictured by a histogram or a kernel density plot. A quantile-quantile plot (QQ-plot) can be used as a visual check for normality of the data. If the distribution is normal, the points in the QQ-plot should closely follow a linear pattern.

A formal test of normality is the Jarque-Bera test, which is based on skewness and kurtosis. If the associated P-value is smaller than 5%, then we reject the null hypothesis of normality.
Before starting an econometric analysis, it is important to screen the data for outliers. Making boxplots is helpful.
Consider two stochastic variables X and Y. The covariance between X and Y is defined as Cov(X, Y) = E[(X − E[X])(Y − E[Y])], and the correlation between X and Y as

Corr(X, Y) = Cov(X, Y) / √( Var(X) Var(Y) ).
We have that
* −1 ≤ Corr(X, Y ) ≤ 1
* Corr(X, Y ) = 1 (respectively = -1) if and only if there exist a > 0 (resp. a < 0)
and b such that Y = aX + b.
From a random sample (X_1, Y_1), . . . , (X_n, Y_n), we can estimate ρ = Corr(X, Y) by the correlation coefficient

ρ̂ = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / √( ∑_{i=1}^n (x_i − x̄)² · ∑_{i=1}^n (y_i − ȳ)² ).
The correlation coefficient is used as a measure of the strength of the linear association between 2 variables. It tells us to what extent 2 variables "move together", and says nothing about causal relations.
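The formula for ρ̂ can be sketched directly in Python, with illustrative data:

```python
import math

# Sketch: the sample correlation coefficient computed from its definition
# (the data below are illustrative).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
# Numerator: sum of cross-products of deviations from the means
num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
# Denominator: product of the root sums of squared deviations
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y))
rho_hat = num / (sx * sy)
print(round(rho_hat, 4))
```

The data here lie almost exactly on a line with positive slope, so ρ̂ is close to +1.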
[Four scatterplots of y versus x, illustrating different degrees and signs of linear association.]
Serial Correlation If the data are a random sample X_1, X_2, . . . , X_n, then Corr(X_i, X_j) = 0 for 1 ≤ i ≠ j ≤ n. But if the data form a time series, then autocorrelation or serial correlation may be present. If X_t is a stationary time series, then this autocorrelation can be quantified.

The graph of the function k → ρ̂_k, with ρ̂_k the estimated autocorrelation at lag k, is called the correlogram. A correlogram also indicates critical bounds. If ρ̂_k passes these bounds, it is significantly different from zero (H_0: ρ_k = 0 is rejected at 5%).
[Figure: line plots of series 1 and series 2 (100 observations each), together with their correlograms (ACF against lag, up to lag 20).]
The correlograms above show that there is much more serial correlation in the first than in the second series. One says that there is more persistence in the first series.
1.2.5 Exercise
In the file spurious.txt you find annual data from 1971 to 1990 for the variables,
We are interested in the correlations between these variables. Note that we have here
time series data and no cross-sectional data.
2. Open X and Y as a group (select them and use the right mouse button).
8. Using /Genr/, create the 3 series in differences (use d(X), d(Y), d(Z)). Now compute the correlation matrix. Comment on the results.
1.3 The Linear Regression Model
Y = α + β1 X1 + . . . + βp Xp + ε,

with α the intercept parameter and β1 , . . . , βp slope parameters. The disturbance term is ε. We observe a sample {(Xi , Yi ) | 1 ≤ i ≤ n} of size n, and suppose that every observation follows the model:

Yi = α + β1 Xi1 + . . . + βp Xip + εi ,

H2. E(εi ) = 0.
Conditions H1 and H2 are crucial and always needed. The condition of uncorrelatedness
of the error terms (H3) is often violated for time series data. Condition (H4) says that
the error terms need to be homoscedastic and not heteroscedastic. Conditions H1 and H2
imply that the conditional mean function or regression function is given by
E[Y | X1 , . . . , Xp ] = α + β1 X1 + . . . + βp Xp .
This is the function of main interest in linear regression: it allows us to estimate conditional expectations. Note that for any 1 ≤ j ≤ p,

∂E[Y | X1 , . . . , Xp ] / ∂Xj = βj .
The interpretation of βj is as follows:

"If Xj changes by one unit, then Y changes by βj units, on average and with all other variables kept constant."

If Y and Xj are both taken in logarithms, βj is interpreted as an elasticity:

"If Xj changes by one percent, then Y changes by βj percent, on average and with all other variables kept constant."
Var[Y |X1 , . . . , Xp ] = σ 2 .
Let α̂ and β̂1 , . . . , β̂p be estimators of the regression parameters. The fitted values of the dependent variable are then

Ŷi = α̂ + β̂1 Xi1 + . . . + β̂p Xip ,

with residuals ri = Yi − Ŷi . The ordinary least squares (OLS) estimators of α and β1 , . . . , βp are the values that minimize

∑_{i=1}^n ri² .
The parameter σ² is estimated by

σ̂² = (1/(n − k)) ∑_{i=1}^n ri² ,

with k = p + 1 the number of estimated regression parameters.
• Expressions for the standard errors of α̂ and β̂1 , . . . , β̂p are known and valid under H1+H2+H3+H4. If also H5 is true, then the OLS estimators are exactly normally distributed. If H5 is not true, then the OLS estimators are only asymptotically normal.
The default choice for the significance level is 0.05. If we reject H0 , then we say that
X is a significant variable. If P-value< 0.01, then X is a highly significant variable.
Recall that the P-value needs to be interpreted on a continuous scale. The smaller
the P-value, the more evidence in the data against the null hypothesis.
1.3.4 Summary Statistics in Regression Analysis
R-squared The R-squared statistic is the fraction of the variance of the dependent
variable explained by the independent variables:
R² = Var(Ŷ) / Var(Y) = 1 − Var(residuals) / Var(Y).
We also call R2 the squared multiple correlation coefficient. Do not use R2 to compare
models with different dependent variables.
Adjusted R² = 1 − (1 − R²) (n − 1)/(n − k).
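A sketch of both statistics in Python, reusing the fitted values from an illustrative simple OLS fit (so k = 2 estimated parameters):

```python
# Sketch: R-squared and adjusted R-squared from observed and fitted values
# (illustrative numbers from a simple OLS fit of y on x, hence k = 2).
y      = [2.0, 4.1, 5.9, 8.2, 9.8]
fitted = [2.06, 4.03, 6.00, 7.97, 9.94]

n, k = len(y), 2
ybar = sum(y) / n
tss = sum((yi - ybar) ** 2 for yi in y)                 # total variation
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)               # penalizes extra regressors
print(round(r2, 4), round(adj_r2, 4))
```

The adjusted version is always at most R² and drops when a regressor adds too little explanatory power.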
F-statistic The F-statistic tests the hypothesis that all of the slope coefficients (exclud-
ing the intercept) in a regression are zero,
H0 : β1 = . . . = βp = 0.
An accompanying P-value is given by the software. The F-test is a joint test, keeping the
joint type I error under control. Note that even if all the t-statistics are insignificant, it
is not excluded that the F-statistic is highly significant.
Durbin-Watson Statistic The Durbin-Watson (DW) statistic measures the serial cor-
relation (of order one) in the residuals. The statistic is computed as,
DW = ∑_{t=2}^n (r_t − r_{t−1})² / ∑_{t=1}^n r_t² ,
- 0 ≤ DW ≤ 4; a value close to 2 indicates absence of first order serial correlation, a value well below 2 positive serial correlation, and a value well above 2 negative serial correlation.

However, there are better tests for serial correlation in the error terms.
Residual plots help to check the model assumptions. The residuals ri or standardized residuals ri /σ̂ are plotted versus their index (best for time series data) or versus Ŷi (best for cross-sectional data). These plots should show no systematic pattern. Points of attention:
- if residuals have the tendency to remain “close” to each other, this might indicate
positive serial correlation. This is also the case when sequences of residuals with the
same sign are observed. Additionally, a correlogram of the residuals can be made.
- To check for normality of error terms, use QQ-plots. If the error terms deviate
strongly from normality and the sample size is not too large, then results are in
doubt.
[Six residual-versus-time plots, each illustrating a different pattern in the residuals.]
1.3.6 Using Dummy Variables
Example: Let Y be the income of a person. We want to relate it to work experience X (in years), sex (male/female) and educational level (primary/secondary/higher education). We consider the model
Y = α + β1 X + β2 M + β3 E1 + β4 E2 + ε,
where M = 1 for males and 0 for females, E1 = 1 for secondary education and 0 otherwise, and E2 = 1 for higher education and 0 otherwise (primary education is the reference category).
The expected income of a female with secondary school education and 10 years of work
experience is therefore 10β1 + β3 + α.
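A Python sketch of how such a dummy specification produces expected incomes; all coefficient values below are made up for illustration:

```python
# Sketch: expected income in the dummy-variable model
# Y = alpha + b1*X + b2*M + b3*E1 + b4*E2 (coefficients are illustrative).
alpha, b1, b2, b3, b4 = 1000.0, 50.0, 120.0, 200.0, 450.0

def expected_income(x_years, male, secondary, higher):
    # M = 1 for males; E1 = 1 for secondary, E2 = 1 for higher education
    return alpha + b1 * x_years + b2 * male + b3 * secondary + b4 * higher

# Female, secondary education, 10 years of experience: alpha + 10*b1 + b3
print(expected_income(10, male=0, secondary=1, higher=0))
```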
Exercise: During 7 years we measure every season the total number of cars sold in a certain country (denoted by Yt, in units of 10000). Let Xt be the price of oil in real terms. We want to regress Yt on Xt, a time trend and "seasonal dummies." So the model is

Yt = α + β Xt + γ t + δ1 S1t + δ2 S2t + δ3 S3t + εt,

with S1, S2 and S3 dummy variables for three of the four seasons.
1. Estimate and interpret the regression parameters. How does the interpretation change if X and Y are taken in logarithms?
3. Make the residual plot and comment.
Season Yt Xt
Spring 1990 16.30 39.04
Summer 1990 15.20 31.22
Autumn 1990 19.40 34.47
Winter 1990 15.60 31.44
Spring 1991 17.60 28.95
.. 15.98 29.59
.. 19.10 29.15
.. 16.53 22.99
.. 17.66 25.50
.. 16.17 28.71
.. 20.05 29.77
.. 16.12 25.86
.. 17.23 32.33
.. 16.41 27.88
.. 19.67 31.07
.. 15.35 27.80
.. 18.81 27.28
.. 18.64 28.32
.. 18.74 30.71
.. 15.24 27.28
.. 17.44 37.83
.. 17.30 30.74
.. 20.79 25.11
.. 16.03 28.52
.. 18.91 25.28
.. 16.52 30.01
.. 19.07 31.37
.. 12.83 37.65
1.3.7 Exercises
1. We have annual data for the UK economy, for the years 1953-1964, on the percentage
change in wages W and the percentage of the labor force unemployed U :
W 4.4 5.4 7.1 6.2 4.2 3.1 2.6 3.3 3.8 3.6 4.1 4.4
U 1.5 1.3 1.1 1.2 1.4 2.1 2.2 1.6 1.5 2.0 2.1 1.6
(a) Create a workfile and enter the data (use /quick/empty group (edit series)/).
(b) Make a scatterplot of W versus U. Does the relation between them look linear? What is their correlation?
(d) Have a look at the residual series. Do the assumptions H1-H5 seem to be
plausible? Make a QQ-plot of the residual series to check for normality.
(b) Estimate the regression equation log(Y) = α + β log(X) + ε. Predict the value of Y for X = 200 and for X = 1000.
3. Ten experts make a prediction for the economic growth in the EU and ten other
experts in the US for next year:
EU US
2.1 2.6
2.5 2.4
2.3 3.2
1.4 0.8
1.5 1.3
1.5 2.1
2.4 1.6
2.7 3.2
2.8 3.1
1.1 1.4
(a) Test whether the predictions for US and Europe are on average the same.
(b) Test for normality of the error terms, given the small sample size.
(c) How does your answer change if the 10 experts making predictions are the
same?
1.4 Model Specification
Model specification refers to the following decisions: Which model to use? Which variables to include? Which kind of disturbance terms? Specifying a model is a difficult task. Diagnostic tests help us to check the validity of the model specification. Residual plots can also serve as diagnostics for the model assumptions. If the diagnostic tests reject the validity of the model, then it is misspecified and another model needs to be proposed.
Demand for food in the USA, yearly data (1963-1992, file: “food.wmf”), with,
1. Make line graphs and some descriptive statistics of the series Q, X, P and G.
3. Interpret the signs and the magnitude of the estimated regression parameters.
Which variables are significant?
5. Make a graph of the actual and the fitted series log(Q). Make a residual plot (use
/View/Actual, fitted,residuals). Make a QQ-plot and a correlogram of the residuals.
Comment.
1.4.2 Coefficient Tests
The Wald test is most often used for testing restrictions on the coefficients. If k restrictions are tested, then the Wald test statistic asymptotically follows a chi-squared distribution with k degrees of freedom. If we test "H0: g(parameters) = 0", the Wald test rejects the null hypothesis if "g(estimated parameters)" is too far from 0. In case all restrictions are linear, the F-statistic can be used. Use the same example as in the previous paragraph and answer the following questions.
3. Test
(a) H0 : β2 = β3 = 0,
(b) H0 : β1 + β2 + β3 = 0,
* Omitting important variables from the regression equation yields biased estimators.
If variables are important for economic reasons, they should remain in the model. Tests
for redundant/omitted variables are based on a comparison of the original model and the
model with/without the redundant/omitted variables. They compare
* the values of R2 for the 2 models (F-test),
Test the following questions on the example of the demand for food in the USA,
1. Use Eviews to test whether log(P ) is a redundant variable. Are you surprised by
this result? Delete log(P ) from the equation and estimate the model again using
/Estimate/.
Remark: If two (or more) explanatory variables are highly correlated, then we have the problem of multicollinearity. In this case, the estimates remain valid, but they have increased variability. A simple solution to avoid multicollinearity is to drop one of the two highly correlated variables. This is even mandatory in case of perfect correlation between the two variables.
There exist many tests for misspecification. Most of them are based on the residuals.
* White heteroscedasticity test: with r denoting the residuals, we estimate the equation

r² = c + γ1 X1 + γ2 X1² + γ3 X2 + γ4 X2² + . . .
* Breusch-Godfrey LM test for serial correlation: with r denoting the residuals, we estimate an auxiliary regression that includes r(-1), . . . , r(-k), lagged versions of the residuals, as explanatory variables. Under the null hypothesis of no serial correlation, none of these estimated gammas should be significantly different from zero. The test uses the value of nR² of this auxiliary regression, which should not be too big. It needs to be compared with the critical value of a chi-squared distribution with k degrees of freedom.
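A hand-rolled sketch of the nR² idea in Python with one lag (k = 1), using an illustrative residual series rather than Eviews output; for simplicity the auxiliary regression here contains only a constant and the lagged residual:

```python
# Sketch: Breusch-Godfrey-style check with one lag. Regress the residuals on
# their first lag and compare n*R^2 with the chi-squared critical value
# (3.84 at the 5% level for 1 degree of freedom). Residuals are illustrative
# and deliberately persistent (runs of equal signs).
resid = [0.5, 0.6, 0.4, 0.5, -0.5, -0.6, -0.4, -0.5, 0.5, 0.6,
         0.4, 0.5, -0.5, -0.6, -0.4, -0.5, 0.5, 0.6, 0.4, 0.5]

y = resid[1:]     # r_t
x = resid[:-1]    # r_{t-1}
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
beta = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
alpha = ybar - beta * xbar
rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
tss = sum((b - ybar) ** 2 for b in y)
r2 = 1 - rss / tss
lm = n * r2       # the LM statistic; large values reject "no serial correlation"
print(round(lm, 2), lm > 3.84)
```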
The least squares estimator is still consistent (meaning that it converges to the true value of the parameter as the sample size tends to infinity) under mild forms of serial correlation or heteroscedasticity. However, the theoretical expression for the covariance matrix of this estimator is derived under the hypotheses H1–H5, and becomes invalid if H3 or H4 no longer holds. In this case, one could use a heteroscedasticity consistent (White) or a heteroscedasticity and autocorrelation consistent (Newey-West) estimator of the covariance matrix.
Remarks:
* Serial correlation can be corrected for by specifying a dynamic model for the disturbance terms. In some cases adding lagged versions of the dependent (and independent) variables can solve the problem of serial correlation.
Perform the following tasks with the same example as in the previous paragraphs.

1. Apply the White heteroscedasticity test with and without cross-terms. What is the difference between both tests? Do you need to reject the assumption of homoscedasticity?
3. Look how the standard errors around your estimates change when you use the
White estimator for the covariance matrix of the estimator (Use the options when
estimating the model equation in Eviews). Same question for Newey/West.
2 Review: Multivariate Random Variables
The density f of a p-dimensional random vector with distribution function F satisfies

∂^p F(u) / (∂u1 · · · ∂up) = f(u).
* Cov(X) is symmetric and positive semi-definite
Exercise:
2. Prove that trace(AB) = trace(BA) for any two matrices A and B for which the
matrix multiplications make sense.
In particular:
X ∼ Np (0, I) ⇔ fX (u1 , . . . , up ) = φ(u1 ) . . . φ(up ),
with

φ(t) = exp(−t²/2) / √(2π).
We have that √n (Tn − θ) converges in distribution to Np(0, V). This implies that the distribution of Tn can be approximated by a Np(θ, V/n). The matrix V is called the asymptotic covariance matrix of Tn.
where dg(θ)/dθ ∈ R^{p×k} is the matrix of partial derivatives of g. Therefore,

g(Tn) ≈ Nk( g(θ), (dg(θ)/dθ)ᵗ V (dg(θ)/dθ) / n ).
3 Non-linear Regression
3.1 Introduction
- y = endogenous/dependent/response/. . . variable
- x = exogenous/explanatory/. . . variable
y ∈ R, x ∈ Rp
Example:
y=hourly wage
x=education
y = m(x) + ε

ε ⊥ x

E[ε | x] = 0

E[y | x] = m(x).
Note that E[y] = E[m(x)]. However, sometimes one can be interested in the whole
conditional distribution [y|x].
Most of the time, we will work with parametric models,

y = f(x, θ) + ε,

with f a specified function and θ an unknown parameter. The linear model, with f(x, θ) = xᵗθ, is a special case. Here we can interpret the parameter:
θ = ∂f(x, θ)/∂x = marginal effect,

θ = ∂ log ỹ / ∂ log x̃ = elasticity of ỹ w.r.t. x̃, if x = log x̃ and y = log ỹ.
In a non-parametric setup, we only require that the function m is smooth, i.e. that the
function has derivatives of all orders.
Note that with

X the n × p matrix with rows x1ᵗ, . . . , xnᵗ and Y = (y1, . . . , yn)ᵗ,

and when supposing that rank(X) = p, we can rewrite

θ̂ = (XᵗX)⁻¹(XᵗY).
In the case of a non-linear regression model, for an initial value θ̂0, we can write

f(xi, θ) ≈ f(xi, θ̂0) + (θ − θ̂0)ᵗ ∂f(xi, θ)/∂θ |θ̂0 + . . . ,

and therefore,

θ̂1 = ( ∑_{i=1}^n x̃i x̃iᵗ )⁻¹ ( ∑_{i=1}^n x̃i ỹi ).
- How to choose the initial value θ̂0 ? Close to the value to be estimated θ̂?
- Is convergence guaranteed?
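A toy Python sketch of this iteration for a one-parameter model y = exp(θx) + ε (not the consumption example treated below); each step linearizes f around the current value and solves the resulting linear least squares problem:

```python
import math

# Sketch: Gauss-Newton style iteration for the toy model y = exp(theta * x),
# with noise-free illustrative data so the iteration can recover theta exactly.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
theta_true = 0.7
y = [math.exp(theta_true * xi) for xi in x]

def gauss_newton_step(theta0):
    # Linearize f(x, theta) around theta0; gradient df/dtheta = x * exp(theta0 * x)
    grad = [xi * math.exp(theta0 * xi) for xi in x]
    resid = [yi - math.exp(theta0 * xi) for xi, yi in zip(x, y)]
    # Scalar version of theta1 = (sum g_i^2)^-1 * sum g_i * (residual), added to theta0
    return theta0 + sum(g * r for g, r in zip(grad, resid)) / sum(g * g for g in grad)

theta = 0.5          # initial value, deliberately off the true 0.7
for _ in range(10):
    theta = gauss_newton_step(theta)
print(round(theta, 6))
```

With a reasonable starting value the iteration converges quickly here; a poor starting value can make it diverge, which is exactly the concern raised above.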
One can show, e.g. by using results from Generalized Method of Moments estimation, that
with

X̃ the n × p matrix with rows x̃1ᵗ, . . . , x̃nᵗ, where x̃i = ∂f(xi, θ)/∂θ |θ0,

and with θ0 the true parameter. In practice, use

x̃i ≈ ∂f(xi, θ)/∂θ |θ̂ and σ̂² = (1/n) ∑_{i=1}^n (yi − f(xi, θ̂))².
Corollary: for the construction of a 95% confidence interval for θ0,i, we take

θ̂i ± 2 · σ̂ · √( (X̃ᵗX̃)⁻¹_ii ).

For the null hypothesis H0: θ0,i = 0 against the alternative HA: θ0,i ≠ 0, do not reject H0 if

|θ̂i| / ( σ̂ √( (X̃ᵗX̃)⁻¹_ii ) ) < 2.
C = α + β Y^γ + ε.
Using the data from Table 1, the estimated coefficients, which are stable after only six iterations, are given in Table 2, and the results for both the linear and the non-linear regression are shown in Table 3. The marginal propensity to consume (MPC) is defined as

∂C/∂Y = β γ Y^{γ−1}.
The following null hypotheses are tested. Firstly, take H0: γ = 1, meaning that the MPC is assumed to be constant, which corresponds to the linear model. We reject the null hypothesis if

|γ̂ − 1| / SE(γ̂) > 1.96.
Table 1: Aggregate Income, Y and Consumption,C
1950 1951 1952 1953 1954 1955 1956 1957 1958
Y 791.8 819.0 844.3 880.0 894.0 944.5 989.4 1012.1 1028.8
C 733.2 748.7 771.4 802.5 822.7 873.8 899.8 919.7 932.9
1959 1960 1961 1962 1963 1964 1965 1966 1967
Y 1067.2 1091.1 1123.2 1170.2 1207.3 1291.0 1365.7 1431.3 1493.2
C 979.4 1005.1 1025.2 1069.0 1108.4 1170.6 1236.4 1298.9 1337.7
1968 1969 1970 1971 1972 1973 1974 1975 1976
Y 1551.3 1599.8 1688.1 1728.4 1797.4 1916.3 1896.6 1931.7 2001.0
C 1405.9 1456.7 1492.0 1538.8 1621.9 1689.6 1674.0 1711.9 1803.9
1977 1978 1979 1980 1981 1982 1983 1984 1985
Y 2066.6 2167.4 2212.6 2214.3 2248.6 2261.5 2334.6 2468.4 2509.0
C 1883.8 1961.0 2004.4 2000.4 2024.2 2050.7 2145.9 2239.9 2312.6
Table 2: Iterative estimates of the coefficients

Iteration   α̂          β̂            γ̂
1           11.1458     0.898534      1.0
2           209.825     -0.237125     1.15139
3           187.786     0.246078      1.14613
4           187.710     0.245692      1.15699
5           187.915     0.245968      1.15641
6           187.899     0.246004      1.15640
Table 3: Outcomes for Linear and Non-Linear Model and Estimated Asymptotic Covari-
ance Matrix
Linear Non-Linear
Estimate Std. Error Estimate Std. Error
α 11.1458 9.64 187.899 38.946
β 0.898530 0.00586 0.246004 0.07947
γ 1.00000 - 1.15640 0.03927
e′e 12.068 8420
σ 18.309 15.294
R2 0.99856 0.99899
α̂ β̂ γ̂
α̂ 1516.8 - -
β̂ -3.0533 0.006315 -
γ̂ 1.5045 -0.003120 0.001542
An asymptotic t-test, based on the standard normal distribution, was carried out:

z = |γ̂ − 1| / SE(γ̂) = (1.1564 − 1) / 0.03927 = 3.983.
This is larger than the critical value of 1.96 for a 5 percent significance level and thus
the linear model is rejected in favor of the nonlinear regression. Secondly, take H0 : γ =
1 and β = 1, to test whether the MPC is constant and equal to one. Since, under the
given null hypothesis,
(γ̂, β̂)ᵗ ≈ N( (1, 1)ᵗ, Σγβ ),

with Σγβ the 2 × 2 covariance matrix with diagonal elements Var(γ̂), Var(β̂) and off-diagonal elements Cov(γ̂, β̂), then

(γ̂ − 1, β̂ − 1) Σγβ⁻¹ (γ̂ − 1, β̂ − 1)ᵗ ∼ χ²₂,
and we reject the null hypothesis if the above quantity exceeds χ22,1−α . To test the
hypothesis that the MPC is one, we require a particular value of Y. Therefore, take
H0: β γ y0^{γ−1} = 1 for y0 = Y1985 = 2509. Using the Delta-method (for more information on the Delta-method, see section 3.2.4), we obtain under H0,

(β̂ γ̂ y0^{γ̂−1} − 1) ≈ N( 0, (∂MPC/∂β, ∂MPC/∂γ) Σβγ (∂MPC/∂β, ∂MPC/∂γ)ᵗ ),

with Σβγ the estimated covariance matrix of (β̂, γ̂)ᵗ and with

∂MPC/∂β = γ y0^{γ−1} and ∂MPC/∂γ = β y0^{γ−1} (1 + γ ln(y0)).

The variance of this limiting normal distribution is SE²(β̂ γ̂ y0^{γ̂−1}).
Following the method described above yields a standard error of 0.018423. We reject H0 if

    |β̂γ̂y0^(γ̂−1) − 1| / SE > 1.96,

so for testing the hypothesis that in 1985 the MPC was equal to one, we compute the following test statistic:

    z = |β̂γ̂y0^(γ̂−1) − 1| / SE = |0.9676 − 1| / 0.018423 = 1.7587,

which is not larger than the critical value of 1.96, and therefore we cannot reject the null hypothesis that the MPC in 1985 was equal to one.
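As an illustration (not part of the original notes), both asymptotic tests can be reproduced from the reported estimates and covariance matrix in plain Python. Because the inputs are rounded, the delta-method standard error comes out near 0.0186 rather than exactly 0.018423:

```python
import math

# Reported estimates and (co)variances from Table 3 (non-linear model)
est_beta, est_gamma = 0.246004, 1.15640
var_beta, var_gamma, cov_bg = 0.006315, 0.001542, -0.003120
y0 = 2509.0

# Joint Wald test of H0: gamma = 1 and beta = 1:
# W = v' S^(-1) v with v = (beta_hat - 1, gamma_hat - 1); W ~ chi2(2) under H0.
v = (est_beta - 1.0, est_gamma - 1.0)
det = var_beta * var_gamma - cov_bg ** 2
s_inv = ((var_gamma / det, -cov_bg / det),
         (-cov_bg / det, var_beta / det))        # inverse of the 2x2 matrix
W = (v[0] * (s_inv[0][0] * v[0] + s_inv[0][1] * v[1])
     + v[1] * (s_inv[1][0] * v[0] + s_inv[1][1] * v[1]))

# Delta-method test of H0: MPC = beta*gamma*y0^(gamma-1) = 1
mpc = est_beta * est_gamma * y0 ** (est_gamma - 1.0)
g_b = est_gamma * y0 ** (est_gamma - 1.0)                          # dMPC/dbeta
g_g = est_beta * y0 ** (est_gamma - 1.0) * (1.0 + est_gamma * math.log(y0))
se_mpc = math.sqrt(g_b ** 2 * var_beta + 2.0 * g_b * g_g * cov_bg
                   + g_g ** 2 * var_gamma)
z = abs(mpc - 1.0) / se_mpc
print(W, mpc, se_mpc, z)
```

The Wald statistic is far above the 5% critical value of χ²₂ (5.99), and the delta-method z-statistic stays below 1.96, matching the conclusions in the text.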
The Delta-method is used to derive an expression for the standard error of a function of an asymptotically normal estimator β̂. Assume that Cov(β̂) is known, and g is a real-valued function. Then

    SE(g(β̂)) = √( Var(g(β̂)) )
             = √( Var[ g(β₀) + (dg(β)/dβ)′|β₀ (β̂ − β₀) + . . . ] ),

with β₀ the true value of the parameter. A Taylor expansion was used in the equation above, which works well if β̂ is close to the true value β₀. In that case (β̂ − β₀) is small, and (β̂ − β₀)², (β̂ − β₀)³, . . . are an order of magnitude smaller. As a consequence,
    SE(g(β̂)) ≈ √( Var(Aβ̂) )        with A = (dg(β)/dβ)′|β₀
             = √( A Cov(β̂) A′ )
             ≈ √( (dg(β)/dβ)′|β̂ Cov(β̂) (dg(β)/dβ)|β̂ ).

For a one-dimensional parameter this reduces to

    SE(g(β̂)) = √( Var(β̂) (g′(β̂))² ) = SE(β̂) |g′(β̂)|.
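A hypothetical numerical illustration of the one-dimensional rule: for g(β) = β² with invented values β̂ = 2 and SE(β̂) = 0.5, the delta method gives SE(g(β̂)) = 0.5 · |2β̂| = 2, which a small Monte-Carlo experiment confirms approximately:

```python
import math
import random

beta_hat, se_beta = 2.0, 0.5            # invented values for the illustration
se_delta = se_beta * abs(2.0 * beta_hat)  # g(b) = b^2, so g'(b) = 2b
print(se_delta)                           # -> 2.0

# Monte Carlo: standard deviation of g(beta_hat) over repeated normal draws
random.seed(42)
draws = [random.gauss(beta_hat, se_beta) ** 2 for _ in range(200_000)]
mean = sum(draws) / len(draws)
mc_sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
print(mc_sd)                              # close to the delta-method value
```

The small gap between the two numbers is the higher-order Taylor error that the delta method neglects.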
Take the data in the Eviews file “nlhome.wf1”. Let yt be the share of total US-steel production produced by a certain type of technology, measured from 1970 up to 1997. Consider the following model

    yt = α / (1 + e^(−β−γt)) + εt,

where α, β and γ > 0 are the parameters to estimate. Here, t simply stands for the time. The above model is an example of a diffusion model, and describes how a certain new type of technology is adopted by its users. This model is also used to model the percentage of the population of potential customers that has already purchased a new type of product, e.g. a flat screen television.
1. The only explicative variable here is the time. The general formulation of the non-linear regression model we saw in class was yt = f(xt, θ) + εt, for t = 1, . . . , n. What are xt, θ and f for this example?
2. Formulate explicitly, as a system of 3 equations, the first order conditions for the
LS estimator.
3. Plot the series. The value of α gives the diffusion level for t tending to infinity. It is
called the saturation level. Do you think that we are already close to the saturation
level?
Y=c(1)/(1+exp(-c(2)-c(3)*@trend))
6. Forecast the values of yt up to 2020. For this, you need to change the range of
the workfile (via ‘/Procs/Change workfile range’ in the workfile window) as well as
the sample range (via ‘/Sample’ in the workfile window) to 1970-2020. You can
make the forecast within the equation window via the Forecast menu by setting the
forecast range as 1970-2020. When are we getting close to the saturation level?
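Outside Eviews, the nonlinear least squares fit of the diffusion model can be sketched with Gauss-Newton. The data and parameter values below are invented (not taken from nlhome.wf1); step halving keeps the sum of squares decreasing:

```python
import math

def f(t, a, b, g):
    """Logistic diffusion curve a / (1 + exp(-b - g*t)), guarded against overflow."""
    u = b + g * t
    if u < -500.0:              # exp(-u) would overflow; the curve is essentially 0
        return 0.0
    return a / (1.0 + math.exp(-u))

def sse(params, ts, ys):
    a, b, g = params
    return sum((y - f(t, a, b, g)) ** 2 for t, y in zip(ts, ys))

def solve3(m, r):
    """Solve the 3x3 system m x = r by Gaussian elimination with pivoting."""
    aug = [row[:] + [r[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(col + 1, 3):
            fac = aug[k][col] / aug[col][col]
            for j in range(col, 4):
                aug[k][j] -= fac * aug[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (aug[i][3] - sum(aug[i][j] * x[j] for j in range(i + 1, 3))) / aug[i][i]
    return x

# Synthetic, noiseless data from hypothetical true parameters
true_a, true_b, true_g = 0.8, -4.0, 0.3
ts = list(range(28))                        # t = 0, ..., 27 (cf. 1970-1997)
ys = [f(t, true_a, true_b, true_g) for t in ts]

params = [1.0, -3.0, 0.2]                   # starting values
for _ in range(100):
    a, b, g = params
    res = [y - f(t, a, b, g) for t, y in zip(ts, ys)]
    jac = []
    for t in ts:
        s = f(t, 1.0, b, g)                 # df/da = 1/(1+exp(-b-g*t))
        jac.append([s, a * s * (1.0 - s), a * s * (1.0 - s) * t])
    jtj = [[sum(row[i] * row[j] for row in jac) for j in range(3)] for i in range(3)]
    jtr = [sum(row[i] * r for row, r in zip(jac, res)) for i in range(3)]
    step = solve3(jtj, jtr)
    lam = 1.0                               # step halving keeps the SSE decreasing
    while lam > 1e-8:
        cand = [p + lam * s for p, s in zip(params, step)]
        if sse(cand, ts, ys) < sse(params, ts, ys):
            params = cand
            break
        lam *= 0.5

print(params)
```

On this noiseless example the iterations recover the true (α, β, γ), mirroring the convergence of the Eviews estimates in the table above.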
3.4 Homework: Some Theoretical Exercises
Define

    Yi = β′Xi + εi,

for 1 ≤ i ≤ n and with εi the error terms. Suppose that β̂ = 1.5 and SE(β̂) = 0.75. Suppose that γ̂ = 4 and SE(γ̂) = 0.75. Moreover, Cov(β̂, γ̂) = 0.2.
4 Non-parametric regression

In non-parametric regression or “scatterplot smoothing” we try to draw a smooth line through a scatterplot. An important restriction is that we take only one explanatory variable x ∈ ℝ; the approach works less well in higher dimensions.
4.1 Ingredients
Bandwidth

[Figure: a window of width 2h around the point x0, running from x0 − h to x0 + h on the x-axis.]

Fix x0. Then,
    γ = ( α, β1, . . . , βp )′   and   zi = ( 1, (xi − x0), . . . , (xi − x0)^p )′.

This model can be considered as a linear regression model. Note that m(x0) = α. We want to estimate α, the first component of γ, so that m̂(x0) = α̂:

    γ̂ = ( Σi zi zi′ )^(−1) ( Σi zi yi ).
If we look at the graph, a polynomial fit over the whole range might not be good, but if we look only at one window, with bandwidth h, it might be fine. The weighted least squares estimator is

    γ̂_WLS = argmin_γ Σ_{i=1}^n (yi − zi′γ)² wi
          = argmin_γ Σ_{i=1}^n ( √wi yi − √wi zi′γ )²
          = ( Σi z̃i z̃i′ )^(−1) ( Σi z̃i ỹi ),    with z̃i = √wi zi and ỹi = √wi yi,
          = ( Σi zi zi′ wi )^(−1) ( Σi zi yi wi )
          = (Z′WZ)^(−1) (Z′WY),
with

    W = diag( w1, w2, . . . , wn ).

In the original parametrization, the criterion reads

    argmin_{α,β1,...,βp} Σi [ yi − α − β1(xi − x0) − . . . − βp(xi − x0)^p ]² wi,
with wi the weights, which become larger the closer xi lies to the center of the window. The smaller the bandwidth h, the more local the fit; the larger the bandwidth h, the more global the fit. The weights wi are defined as

    wi = K( (xi − x0) / h ),

with K the kernel function, which is defined on the interval [−1, 1] with its maximum at 0. For xi = x0 − h we have K((xi − x0)/h) = K(−1), for xi = x0 we have K((xi − x0)/h) = K(0), and for xi = x0 + h we have K((xi − x0)/h) = K(1). So for non-parametric regression we need to define:
• x0, depending on where you want to estimate, and the width of the window h.

Compute over a grid of x values the respective non-parametric estimates m̂(x). You can take more grid points than observations.
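A minimal sketch (not from the course) of the local linear case p = 1 with an Epanechnikov kernel, on invented noiseless data:

```python
import math

def epanechnikov(u):
    """Kernel on [-1, 1] with its maximum at 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_linear(xs, ys, x0, h):
    """Local linear fit (p = 1) at x0: weighted LS of y on (1, x - x0);
    the intercept estimates m(x0)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        w = epanechnikov((x - x0) / h)
        d = x - x0
        s0 += w; s1 += w * d; s2 += w * d * d
        t0 += w * y; t1 += w * y * d
    # solve the 2x2 normal equations for (alpha, beta1); alpha = m_hat(x0)
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det

# Synthetic example: noiseless y = sin(x) on a fine grid
xs = [i * 0.05 for i in range(126)]       # x in [0, 6.25]
ys = [math.sin(x) for x in xs]
m_hat = local_linear(xs, ys, 2.0, 0.5)
print(m_hat, math.sin(2.0))
```

With bandwidth h = 0.5 the estimate at x0 = 2 differs from sin(2) only by the usual O(h²) smoothing bias.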
Comments:

- Why do we only work with one explanatory variable? ...
- The choice of the bandwidth h, for p = 1 (as shown in the figures on the next page).
5 Econometrics of Panel Data

→ panel of size N × T.

• Yit is the income of family i during year t, for 1 ≤ i ≤ 1000, and observed in the years 2000, 2001, 2002, so T = 3.

Note that:

You find the data in the file “penn.wmf”, already in Eviews format. We are in particular
1. Create a “pool” object in Eviews (‘/Object/New object’). Give it a name and define the cross-section identifiers. These identifiers are those parts of the names of the series identifying the cross-section.
2. Open the XR-variables as a group and make a plot of them. Compute them in
log-difference, using the PoolGenr menu of the pool object and “logdifXR?=
dlog(XR?)”. The “?” will be substituted by every cross-section identifier. Plot the
transformed variables.
3. Compute the medians of the variable I? for the different countries (use View/
descriptive statistics within the Pool object). Compute now the average value of I?
for every year.
4. Estimate the regression model for Brazil, using ‘/Quick/Estimate equation’ and specifying in Eviews the equation

    dlog(rgdpl_bra) c dlog(pop_bra) i_bra g_bra dlog(xr_bra) open_bra
5. Now we want to pool the data of all countries, to increase the sample size. Use,
within the pooled object, ‘/Estimate’, and specify: dependent variable=
dlog(rgdpl?); common coefficients=c dlog(pop?) i? g? dlog(xr?) open?. This is a
pooled regression model.
6. Pooling the data ignores the fact that the data originate from different countries.
Dummy variables for the different countries need to be added. This can be done by
specifying the constant term as a “cross section specific coefficient.” We obtain a
fixed effect panel data model. Discuss the regression output.
7. The fixed effect panel data model assumes that the effect of openness is the same
for all countries. How could you relax this assumption?
8. Test whether all country effects are equal (to know how Eviews labels the coefficients,
use View/Representation), using a Wald test. The country effects are called the fixed
effects, and if they are significantly different, then there is unobserved heterogeneity.
5.3 The Generalized Least Squares estimator

    Yi = Xi′β + εi    (i = 1, . . . , n),

with

For the standard model, the Ordinary Least Squares (OLS) estimator is
What if the errors are not homoscedastic and uncorrelated?

* Cross-sectional heteroscedasticity
* etc.

    Yi = Xi′β + εi    (i = 1, . . . , n),

with

One can still use OLS (not even a bad idea), if one uses
The Generalized Least Squares (GLS) estimator will be consistent and optimal and is given by

    β̂_GLS = ( Σ_{i=1}^n Σ_{j=1}^n wij Xi Xj′ )^(−1) ( Σ_{i=1}^n Σ_{j=1}^n wij Xi Yj ),    with wij = (Σ^(−1))ij.

1. Compute the OLS residuals

    ri^OLS = Yi − Xi′β̂_OLS.

2. Use the above residuals to estimate the σij. [This will require some additional assumptions on the structure of Σ.] Compute then the GLS estimator with estimated weights wij.
Theoretical Example
Our sample of size n = 20 consists of two groups of equal size (e.g.
men and women). There is no correlation among the observations,
but we think that the variances of the error terms for men and
women might be of different size.
* σi² = σii = σM² for i = 1, . . . , 10,
* σi² = σii = σF² for i = 11, . . . , 20,
* σij = 0 for i ≠ j.
Computation of the (Feasible) GLS estimator:

2. Estimate

    σ̂M² = (1/10) Σ_{i=1}^{10} (ri^OLS)²   and   σ̂F² = (1/10) Σ_{i=11}^{20} (ri^OLS)²,

and take as weights

    ŵi = 1/σ̂M²  (i = 1, . . . , 10)   and   ŵi = 1/σ̂F²  (i = 11, . . . , 20)

    ⇒ β̂_GLS = ( Σ_{i=1}^{n} ŵi Xi Xi′ )^(−1) ( Σ_{i=1}^{n} ŵi Xi Yi ).
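The two-group feasible GLS recipe above can be sketched in a few lines; the data, the error pattern, and the single-regressor design below are invented for the illustration:

```python
# Two groups of 10 observations; the second group's errors are noisier.
pattern = [0.8, -0.6, 0.4, -0.9, 0.7, -0.3, 0.5, -0.7, 0.2, -0.1]
beta_true = 2.0
xs = [0.5 + 0.1 * i for i in range(20)]
es = [0.5 * e for e in pattern] + [3.0 * e for e in pattern]   # sigma_M < sigma_F
ys = [beta_true * x + e for x, e in zip(xs, es)]

# Step 1: OLS (single regressor, no intercept) and its residuals
b_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
resid = [y - b_ols * x for x, y in zip(xs, ys)]

# Step 2: groupwise variance estimates, as in the formulas above
s2_m = sum(r * r for r in resid[:10]) / 10
s2_f = sum(r * r for r in resid[10:]) / 10

# Step 3: GLS with estimated weights w_i = 1 / sigma_hat_i^2
ws = [1.0 / s2_m] * 10 + [1.0 / s2_f] * 10
b_gls = (sum(w * x * y for w, x, y in zip(ws, xs, ys))
         / sum(w * x * x for w, x in zip(ws, xs)))
print(b_ols, s2_m, s2_f, b_gls)
```

The GLS step downweights the noisy second group, which is exactly what Eviews does with the "Weighted LS" option.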
Application to panel data regression

Let εit be the error term of a panel data regression model, with 1 ≤ i ≤ n and 1 ≤ t ≤ T. Three different specifications are common:
Example South American (continued)
5.4 The Fixed Effects Model

* The Xit are the observed part of the heterogeneity. The εit contain the remaining omitted variables.
Testing for unobserved heterogeneity: test for redundant fixed
effects
H0 : α1 = . . . = αN := α
[Figure: scatterplot of y against x for three cross sections; each cross section lies around its own line with a different intercept, while the single pooled regression line fits none of the groups well.]
5.4.1 Least Squares Dummy Variable (LSDV) Estimation
    Yit = Xit′β + αi + εit

    Ȳi· = X̄i·′β + αi + ε̄i·

Why?

and

    μ1 + μ2 + . . . + μN = 0.

3. One can add time effects (or period effects) in the model:
4. If we think that the cross-sectional units are an i.i.d. sample
(typical for micro-applications), but serial correlation or period
heteroscedasticity is present (within each unit), then OLS can
be made more precise/efficient:
(a) Var(εit) = σt² and all covariances between error terms are zero. We have period heteroscedasticity. GLS can be applied (Period weights).

(b) Var(εit) = σt², Cov(εit, εis) = σts, all other covariances zero. We
1. Have a look at the data in the Excel file. Write up the number of observations, the number of variables, and the upper left cell of the data matrix. Close the Excel file, create an unstructured Workfile and read in the data (‘/Proc/Import/Read Text-Lotus-Excel’).
4. Use View/Graph to (i) Make a line plot of the time series for every
cross section (ii) Make boxplots of the distribution of investment over
the different cross sections and over time.
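For intuition, the fixed effects (within/LSDV) estimator of Section 5.4.1 can be sketched on invented noiseless panel data; demeaning per cross-section removes the fixed effects αi, so β is recovered exactly:

```python
# Within estimator on invented noiseless data: y_it = beta*x_it + alpha_i.
beta_true = 1.5
alphas = {1: 10.0, 2: -5.0, 3: 0.0}                 # unit fixed effects
data = []                                           # rows (unit, x, y)
for i, a in alphas.items():
    for t in range(4):
        x = i + 0.3 * t                             # arbitrary regressor values
        data.append((i, x, a + beta_true * x))

# demean x and y within each unit, then run OLS through the origin
units = {i for i, _, _ in data}
means = {}
for i in units:
    rows = [(x, y) for j, x, y in data if j == i]
    means[i] = (sum(x for x, _ in rows) / len(rows),
                sum(y for _, y in rows) / len(rows))

num = den = 0.0
for i, x, y in data:
    mx, my = means[i]
    num += (x - mx) * (y - my)
    den += (x - mx) ** 2
beta_within = num / den
print(beta_within)   # recovers beta_true = 1.5 (data are noiseless)
```

Regressing on unit dummies (LSDV) gives numerically the same β, which is why the two descriptions of the estimator coincide.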
5.5 The Random Effects Model
Model
    Yit = c + Xit′β + εit,    εit = αi + vit.
• At the price of one extra parameter σα², the random effects model allows for correlation within cross-section units. For every i and t ≠ s:
⇒ Within groups/cross sections correlation:

    ρ = Corr(εit, εis) = σα² / (σα² + σv²).

    H0 : Corr(αi, Xit) = 0
Using fixed or random effects?
– etc.
Robust Standard Errors: For RE no weighted versions are available. Using robust standard errors (or coefficient covariance) might be appropriate. This only affects the SE, not the estimators. [Robust to all forms of heteroscedasticity, but not robust for any type of . . . ; T << N.]
Exercise. Consider the Grunfeld data in “grundfeld2.wf1”. The model was:

    Iit = βi1 Fit + βi2 Kit + αi + εit
5.6 Homework: The General Linear Model
We are interested in knowing whether persons having a loyalty card spend more in a
supermarket or not. Take the data in the Eviews file “glsex.wf1”. We have data for 2
supermarkets, and for each of them we have for 20 clients the amount spent (AMOUNT),
the size of the household the person belongs to (HHS), and a binary variable (CARD)
indicating whether the person owns a loyalty card or not. Denote yij the amount spent
by customer i in supermarket j, and xij the personal characteristics of the client (in this
case, only consisting of HHS). The proposed model is:
where αj , for j = 1, 2 are fixed effects for each supermarket. Our main interest is to know
whether δ is significant or not.
1. Estimate the parameters of the above model by OLS. Interpret briefly the param-
eters estimates. (Hint: since the β and δ parameters are supposed to be the same
for the 2 supermarkets, it will be necessary to pool the data. You will also need
to create yourself the appropriate dummy variables STORE1 and STORE2 to take
the fixed effects into account.)
2. We are afraid that there is groupwise heteroscedasticity in the error terms, i.e. Var(εij) = σj² for j = 1, 2.
(a) Estimate the variances σ1² and σ2² using the OLS-residuals.
(b) Estimate now the parameters by GLS. Write down an expression for Σ, the co-
variance matrix of the error terms, and show that GLS boils down to Weighted
Least Squares (WLS) here. Create the series of weights to be used, and carry
out the WLS estimation (in Eviews, take estimation method LS with option
Weighted LS).
3. We are also afraid that there might be interaction between the variable CARD and the supermarket. In particular, the effect of the loyalty card might differ among different supermarkets. The model becomes now
(a) Estimate the above model by OLS. Do you think there might be interaction?
(Hint: creating the variables STORE1*CARD and STORE2*CARD might be
useful.)
4. Economists would say that there is a serious endogeneity problem here. There
probably exists a feed-back relation from AMOUNT to CARD. Could you explain
why this might be the case? Explain in words why it might indeed be that the error
terms are correlated with the variable CARD.
6 Binary Choice Models
• Exercise
! Problem: the left hand side of the above equations takes values
between 0 and 1, while the right hand side may take any value on
the real line.
Note that

[Figure: a scatterplot of binary outcomes y against x together with the fitted linear regression line; the linear fit takes values outside the interval [0, 1].]
6.3 Probit and Logit Regression
with

• F(u) = 1 / (1 + exp(−u)) ⇒ Logit
• F(u) = Φ(u), the standard normal cumulative distribution function ⇒ Probit
• etc.
[Figure: a binary data cloud with fitted probit and logit curves; the difference between the two fits is small, the probit function being somewhat steeper.]
Interpretation of parameters
    dP(Y = 1|X) / dX1 = β1 f(β0 + β1 X1 + . . . + βp Xp),

with f = F′ always positive. The estimated success probability is

    P̂(Y = 1|X = xi) = F(β̂0 + β̂1 xi1 + . . . + β̂p xip).

Set ŷi = 1 if P̂(Y = 1|X = xi) > 0.5 and zero otherwise. (Other cut-off values than 0.5 = 50% are sometimes taken.)
It is not difficult to check that p̂ = (1/n) Σ_{i=1}^n yi, the fraction of successes in the sample. If the model is misspecified, then the MLE may lose the above properties.
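A sketch (with invented 0/1 data) of the claim above: fitting the intercept-only logit model by Newton's method gives a fitted success probability equal to the sample fraction of successes:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]           # hypothetical 0/1 outcomes
ybar = sum(y) / len(y)

c = 0.0                                      # starting value for the intercept
for _ in range(25):
    p = sigmoid(c)
    # score and information of the Bernoulli log-likelihood in c
    score = sum(yi - p for yi in y)
    info = len(y) * p * (1.0 - p)
    c += score / info

print(sigmoid(c), ybar)                      # both equal 0.4 here
```

The converged intercept is log(p̂ / (1 − p̂)), the log-odds of the sample success fraction.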
6.5 Estimating Binary Models in Eviews
Explanatory variables:
Both explanatory variables are highly significant. They have a positive effect on the probability of denial, as expected. They are also jointly highly significant (LR stat = 152, P < 0.001). The pseudo R-squared is pretty low (R² = 0.08).
Predictive accuracy (Expectation-prediction table):
6.6 Measures of Fit
In practice, we work with the P-value. For example, if k = 4 and LR = 7.8:

[Figure: density of the chi-squared distribution with 4 degrees of freedom; the area to the right of LR = 7.8 is the P-value ≈ 0.099.]

The fraction of correct predictions is

    (1/n) Σ_{i=1}^n I(yi = ŷi).
Its complement is an estimate of the error rate of the prediction
rule.
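The P-value of the LR test can be computed without tables; for an even number of degrees of freedom k the chi-squared survival function has a closed form. A sketch:

```python
import math

def chi2_sf_even(x, k):
    """P(X > x) for X ~ chi2(k) with k even:
    exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!."""
    assert k % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(k // 2))

p = chi2_sf_even(7.8, 4)
print(p)   # about 0.099: do not reject at the 5% level
```

For odd k no such elementary form exists, and one would use a statistical library instead.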
6.7 Other Limited Dependent Variable Models
6.8 Exercise
1. Why do we add TUCE to the regression model, if we are only inter-
ested in the effect of PSI?
(a) Estimate the probability that he will increase his grade if he fol-
lows the PSI-method.
(b) What is this probability to increase his grade if he will not follow
this PSI-method?
(d) Compute the log odds-ratio (for improving the grade or not) for this student, once for PSI=1 and once for PSI=0. Compute the difference between these two log odds-ratios. Compare with the regression coefficient of PSI.
5. The output shows the value “LR statistic”. How is this value computed?
References:

- Stock, J.H., and Watson, M.W., Introduction to Econometrics, 2nd edition (2007), Pearson.
7 The Poisson Regression Model
The Poisson regression model aims at modeling a count variable Y, counting the number of times that a certain event occurs during a given time period. We observe a sample Y1, . . . , Yn. Here, Yi can stand for the number of car accidents that person i had during the last 5 years; the number of children of family i; the number of strikes in company i over the last 3 years; the number of patents filed by firm i during the last year (as a measure of innovation); .... The Poisson regression model explains this count variable Yi using explicative variables xi, for 1 ≤ i ≤ n. This p-dimensional variable xi contains characteristics of the ith observation.
The Poisson distribution is a discrete distribution. Its shape is shown in Figure 1, for several values of λ, by plotting P(Y = k) versus k. For low values of λ, the distribution is highly skewed; for large values of λ, the distribution of Y looks more normal. In the examples given above, Yi counts rather rare events, so that the value of λ will be rather small. For example, we have high probabilities of having no or one car accident, but the probabilities of having several car accidents decay exponentially fast. The Poisson distribution is the simplest distribution for modeling count data, but it is not the only one.
As in a linear regression model, we will model the conditional mean function using a linear combination β′xi of the explicative variables:
Figure 1: The Poisson distribution for different values of λ

[Four panels plotting P(Y = k) versus k, for λ = 0.5, λ = 1, and two larger values of λ; the distribution is highly skewed for small λ and close to bell-shaped for large λ.]
The use of the exponential function in (3) ensures that the right-hand side of the equation is always positive, as is the expected value of the count variable Yi on the left-hand side. The choice of this exponential “link” function is mainly for reasons of simplicity. In principle, other “link” functions returning only positive values could be used, but then we can no longer refer to the model as a Poisson regression model.
Moreover, to be able to use the Maximum Likelihood framework, we specify a distribution for Yi, given the explicative variables xi. We require that every Yi, conditional on xi, follows a Poisson distribution with parameter λi. Equations (2) and (3) give

The aim is then to estimate β, the unknown parameter in the model. Note that an estimate of β induces an estimate of the whole conditional distribution of Yi given xi. This allows us to estimate quantities like P(Yi = 0|xi), P(Yi > 5|xi), .... So we will be able to answer questions like “What is the probability that somebody will have no car accident during a 5-year period, given the person's characteristics xi?”, or “What is the probability that a family, given its characteristics xi, has more than 5 children?”.
The marginal effect of the first explicative variable on the expected value of Yi, keeping the other variables constant, is given by

    ∂E[Yi|xi] / ∂xi1 = β1 exp(β1 xi1 + β2 xi2 + β3).

We see that β1 has the same sign as this marginal effect, but the numerical value of the effect depends on the value of xi. We could summarize the marginal effects by replacing xi1 and xi2 in the above equation by the average values of the explicative variables over the whole sample. It is also possible to interpret β1 as a semi-elasticity:

    ∂ log E[Yi|xi] / ∂xi1 = β1.
We observe data {(xi, yi) | 1 ≤ i ≤ n}. The number yi is a realization of the random variable Yi. The total log-likelihood is, using independence, given by

    LogL(y1, . . . , yn | β, x1, . . . , xn) = Σ_{i=1}^n log P(Yi = yi | β, xi),
The maximum likelihood (ML) estimator is then of course defined as
It is instructive to compute the first order condition that the ML estimator needs to fulfill. Taking the derivative in (5) yields

    Σ_{i=1}^n (yi − ŷi) xi = 0,

with ŷi = exp(β̂_ML′ xi) the fitted value of yi. The predicted/fitted value has, as usual, been taken as the estimated value of E[Yi|xi]. This first order condition tells us that the vector of residuals is orthogonal to the vectors of explicative variables.
The advantage of the Maximum Likelihood framework is that a formula for cov(β̂_ML) is readily available:

    cov(β̂_ML) = ( Σ_{i=1}^n xi xi′ ŷi )^(−1).
Also, hypothesis tests can now be carried out by Wald tests, Lagrange Multiplier tests or
Likelihood Ratio tests.
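An illustrative sketch (invented data) of Poisson regression by Newton-Raphson, checking afterwards that the first order condition Σ(yi − ŷi)xi = 0 holds at the ML estimate:

```python
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 1, 2, 4, 6]                     # hypothetical counts
X = [(1.0, x) for x in xs]               # rows (1, x_i): intercept plus regressor

b = [0.0, 0.0]
for _ in range(50):
    mu = [math.exp(b[0] * r[0] + b[1] * r[1]) for r in X]
    # score X'(y - mu) and information X' diag(mu) X
    s = [sum((y - m) * r[j] for r, y, m in zip(X, ys, mu)) for j in range(2)]
    H = [[sum(m * r[i] * r[j] for r, m in zip(X, mu)) for j in range(2)]
         for i in range(2)]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    b = [b[0] + (H[1][1] * s[0] - H[0][1] * s[1]) / det,
         b[1] + (H[0][0] * s[1] - H[1][0] * s[0]) / det]

mu = [math.exp(b[0] + b[1] * x) for x in xs]
foc = [sum((y - m) * r[j] for r, y, m in zip(X, ys, mu)) for j in range(2)]
print(b, foc)   # foc components are numerically zero at the ML estimate
```

The same Newton iteration (iteratively reweighted least squares) is what Eviews runs behind the COUNT estimation method.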
implying that the conditional mean function equals the conditional variance function. This is very restrictive. If E[Yi|xi] < Var[Yi|xi], respectively E[Yi|xi] > Var[Yi|xi], we speak of overdispersion, respectively underdispersion. The Poisson model does not allow for over- or underdispersion. A richer model is obtained by using the negative binomial distribution instead of the Poisson distribution. Instead of (4), we use

    P(Yi = yi | β, xi) = [ Γ(θ + yi) / ( Γ(yi + 1) Γ(θ) ) ] · ( λi / (λi + θ) )^yi · ( 1 − λi / (λi + θ) )^θ.

The symbol Γ refers to the Gamma function, which is discussed in the appendix of this section. The negative binomial distribution above has conditional mean λi and conditional
variance λi(1 + η²λi), with η² := 1/θ. Note that the parameter η² is not allowed to vary over the observations. As before, the conditional mean function is modeled as

Using maximum likelihood, we can then estimate the regression parameter β and the extra parameter η as well. The parameter η measures the degree of over- or underdispersion. The limit case η = 0 corresponds to the Poisson model.
7.5 Homework
We are interested in the number of accidents per service month for a sample of ships. The
data can be found in the file “ships.wmf”. The endogenous variable is called ACC. The
explicative variables are:
• CONSTRUCTION YEAR: the ships are constructed in one of four periods, leading
to the dummy variables T6064, T6569, T7074, and T7579.
• SERVICE: a measure for the amount of service that the ship has already carried
out.
Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the histogram for the conditional or unconditional distribution of ACC?
2. Estimate the Poisson regression model, including all explicative variables and a constant term. (Use estimation method: COUNT - Integer count data.)
4. Perform a Wald test to test for the joint significance of the construction year dummy
variables.
7. What do we learn from the value of “Probability(LR stat)”? What is the corre-
sponding null hypothesis?
8. Estimate now a Negative Binomial Model. EViews reports log(η²) as the mixture parameter in the estimation output. (a) Compare the estimates of β given by the two models. (b) Compare the pseudo R² values of the two models.
9. Estimate now the Poisson model with only a constant term, so without explicative
variables (empty model). Derive mathematically a formula for this estimate of the
constant term (in the empty model), using the first order condition of the ML-
estimator.
8 Solutions to Exercises
In this section the solutions to some exercises in these course notes and to the exercises
of the homework can be found.
Let X be the total amount of money (cash) that a family keeps at home. From a sample of size n = 100 we obtain a sample mean of 800 Euro with a standard error of 50 Euro. The sample standard deviation is 500 Euro.
Solution
Using the central limit theorem, and given the relatively large number of observa-
tions, we can assume that the sample mean µ̂ is normally distributed. Hence, the
95%-CI for the sample mean is
Solution
We make the same assumption on the sample mean µ̂ as in (i). Hence, the 99%-CI
for the sample mean is
(iii) Construct an interval that will contain approximately 95% of the outcomes of X.
Solution
To construct this interval, we assume that the distribution of X is normal. Hence,
the 95%-CI for X, the total amount of (cash) money a family keeps at home, is
Since it is impossible to have a negative amount of cash money at home, we can cut
this interval at 0, and arrive at [0; 1780].
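The three intervals can be verified in a few lines (using the normal critical values 1.96 and 2.576):

```python
mean, se, sd = 800.0, 50.0, 500.0     # sample mean, its SE, and the sample sd

ci95_mean = (mean - 1.96 * se, mean + 1.96 * se)       # (i):  [702, 898]
ci99_mean = (mean - 2.576 * se, mean + 2.576 * se)     # (ii): [671.2, 928.8]
ci95_x = (mean - 1.96 * sd, mean + 1.96 * sd)          # (iii): [-180, 1780]
ci95_x_cut = (max(0.0, ci95_x[0]), ci95_x[1])          # cut at 0: [0, 1780]
print(ci95_mean, ci99_mean, ci95_x_cut)
```

Note how (iii) uses the sample standard deviation of X itself, while (i) and (ii) use the much smaller standard error of the mean.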
Solution
No. First of all, it is impossible to have less than 0 Euro in cash at home. Second, the distribution is strongly skewed, with most of its mass at small amounts and a long tail towards the right: most people keep only moderate amounts of money (< 800 Euro) at home, while a very small segment of the population keeps very large amounts of money at home, tens of thousands of Euro and more.
Important: Please keep in mind that the central limit theorem only makes a
statement about the distribution of a mean of a large number of random variables
(for example a sample mean), and does not say anything about the distribution of
the variable X itself.
(v) Would it be a good idea to construct my sample by interviewing 100 persons waiting
at the railway station? Do you have a better idea?
Solution
Restricting your sample to people waiting at the railway station isn’t a good idea,
because this introduces selection bias. Since not all types of people take the train
with the same frequency, you will not end up with a representative sample. For
example, around 17.00h, on a Friday, in the main station of Leuven, your sample will
consist mainly of students, and although students are a very interesting population
to study, they don’t represent the entire population well.
A better idea is, for example, to use stratified sampling: randomly picking out 20
towns/cities, and selecting 5 families from each town or city. Alternatively, you
could go into the national register, and select 100 households at random.
8.1.2 Exercise page 32
During 7 years we measure every season the total number of cars sold in a certain country
(in logarithms Yt ). Let Xt be the (log of the) price of oil in real terms. We want to regress
Yt on Xt + time trend + “seasonal dummies.” So the model is
Solution
Estimating the proposed model yields the following values for the estimates of the
parameters.
Here, the C corresponds to the constant or intercept term, and t is the trend variable
starting at 0, and which increases by 1 each season. S1 , S2 , and S3 correspond to the
effects of respectively summer, autumn, and winter, relative to spring, the reference
category, on the (log of the) total number of cars sold Yt .
Analysing the table, we find that all variables, except the trend variable, are highly
significant (p < 0.01%). We see that a higher oil price lowers the number of cars
sold during the quarter. More precisely, if the oil price increases by 1%, the total
number of cars sold will decrease by 0.171%, assuming all other variables remain
constant (ceteris paribus).
In the same spirit, the number of cars sold increases (non-significantly) by 1.588% each season. Note that interpreting this variable has little merit, since the effect is not significant and, in essence, negligible.
Finally, relative to spring, more cars are sold in autumn, and less during the summer
and the winter. For a precise interpretation, you could be tempted to use the rule
that if x increases by 1, y increases on average and ceteris paribus by β · 100% in a
model of the form
log(Y ) = α + βX + ε.
Note that for small values of β (|β| < 0.3), the approximation e^β − 1 ≈ β still holds fairly well; for the larger coefficients found here, the exact transformation should be used.
Applying the exact transformation, we interpret the coefficients corresponding to the seasonal dummies as: on average and ceteris paribus, the number of cars sold decreases in summer by 1 − e^(−1.358) = 74.28%, in winter by 1 − e^(−2.728) = 93.47%, and increases in autumn by e^(1.695) − 1 = 444.66%, with respect to spring.
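A quick check of these transformations:

```python
import math

# Seasonal dummy coefficients from the estimated log-linear model
summer = 1 - math.exp(-1.358)    # decrease relative to spring
winter = 1 - math.exp(-2.728)
autumn = math.exp(1.695) - 1     # increase relative to spring
print(round(100 * summer, 2), round(100 * winter, 2), round(100 * autumn, 2))
```

The percentages confirm that the naive "β · 100%" reading would be far off for coefficients of this size.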
Solution

The R²-value of 0.8647 shows us that 86.47% of the total variance in the response variable Yt is explained by the explanatory variables, which is a very high percentage. As such, we can conclude that we have a model with good predictive power.
The value of the Durbin-Watson statistic, 1.773, points to a very slight positive autocorrelation of lag one (or serial correlation) in the residuals of the estimated model. However, as this value is still close to 2, it is reasonable to assume that this serial correlation is insignificant; a more conclusive statistical test would have to be carried out to verify this, however.
The F-statistic of 28.13 and its associated p-value (< 5·10⁻⁷) show that the variables
in the proposed model are jointly significant, i.e. that the null hypothesis
H0 : β = γ = δ1 = δ2 = δ3 = 0
can be strongly rejected. Note however that the F-statistic does not test for the
significance of the intercept α.
Solution
[Left panel: time series plot of the residuals (RESID) over 1990-1996. Right panel: theoretical quantile-quantile plot of the residuals against the quantiles of the standard normal distribution.]
There doesn’t seem to be much structure within the residuals and they look like
noise. Furthermore, the spread of the residuals does not appear to be changing,
although there aren’t many entries to base this observation on. We conclude that
there doesn’t seem to be much risk for autocorrelation and heteroscedasticity among
the residuals.
The right plot is a QQ plot, showing the residuals plotted against the quantiles of a
standard normal distribution, or vice versa. Looking at this plot we see that these
points lie approximately on a line, indicating that the residuals are approximately
normally distributed (a Jarque-Bera test could confirm this).
We are interested in knowing whether persons having a loyalty card spend more in a
supermarket or not. Take the data in the Eviews file “glsex.wf1”. We have data for 2
supermarkets, and for each of them we have for 20 clients the amount spent (AMOUNT),
the size of the household the person belongs to (HHS), and a binary variable (CARD)
indicating whether the person owns a loyalty card or not. Denote yij the amount spent
by customer i in supermarket j, and xij the personal characteristics of the client (in this
case, only consisting of HHS). The proposed model is:
where αj , for j = 1, 2 are fixed effects for each supermarket. Our main interest is to know
whether δ is significant or not.
1. Estimate the parameters of the above model by OLS. Interpret briefly the param-
eters estimates. (Hint: since the β and δ parameters are supposed to be the same
for the 2 supermarkets, it will be necessary to pool the data. You will also need
to create yourself the appropriate dummy variables STORE1 and STORE2 to take
the fixed effects into account.)
Solution
As can be seen, the only variable that has a truly significant effect on the amount
spent is the size of the customer’s household, where each additional member yields
an average increase of 31.01 currency units on the amount spent (given that all
other variables remain fixed). However, the possession of a loyalty card seems to
have an effect as well, which, although not significant, should not be dismissed out of
hand (p = 0.0624). It appears that customers with a loyalty card tend to spend, on
average, 9.43 currency units more than customers without a loyalty card (assuming
the other variables are the same).
2. We are afraid that there is groupwise heteroscedasticity in the error terms, i.e. Var(εij) = σj² for j = 1, 2.
(a) Estimate the variances σ1² and σ2² using the OLS-residuals.
(b) Estimate now the parameters by GLS. Write down an expression for Σ, the co-
variance matrix of the error terms, and show that GLS boils down to Weighted
Least Squares (WLS) here. Create the series of weights to be used, and carry
out the WLS estimation (in Eviews, take estimation method LS with option
Weighted LS).
Solution
(a) σ̂1² = 93.052 and σ̂2² = 327.002.
(b) From these estimates of the groupwise residual variances, an expression for Σ̂ is found:

    Σ̂ = ( σ̂1² I20     0
            0          σ̂2² I20 ),

where In is the n × n identity matrix. From this, the GLS estimator can be computed:

    θ̂_GLS = (X′Σ̂^(−1)X)^(−1) (X′Σ̂^(−1)Y),
where

    X = ( HHS1,1    CARD1,1    STORE11,1    STORE21,1
           . . .
          HHS20,1   CARD20,1   STORE120,1   STORE220,1
          HHS1,2    CARD1,2    STORE11,2    STORE21,2
           . . .
          HHS20,2   CARD20,2   STORE120,2   STORE220,2 )

and Y = (y1,1, . . . , y20,1, y1,2, . . . , y20,2)′, θ = (β, δ, α1, α2)′. Working out the expression for the estimator, it is found that, due to the form of Σ̂^(−1),
    θ̂_GLS = ( Σ_{i=1}^{20} (1/σ̂1²) Xi,1,· Xi,1,·′ + Σ_{i=1}^{20} (1/σ̂2²) Xi,2,· Xi,2,·′ )^(−1)
            × ( Σ_{i=1}^{20} (1/σ̂1²) Xi,1,· yi,1 + Σ_{i=1}^{20} (1/σ̂2²) Xi,2,· yi,2 )
          = ( Σ_{j=1,2} Σ_{i=1}^{20} wi,j Xi,j,· Xi,j,·′ )^(−1) ( Σ_{j=1,2} Σ_{i=1}^{20} wi,j Xi,j,· yi,j ),

where wi,1 = 1/σ̂1² and wi,2 = 1/σ̂2² for i = 1, . . . , 20, and Xi,j,· denotes the row of X for client i in supermarket j, written as a column vector. As such, the GLS problem has been reduced to a WLS problem. Using Eviews to estimate the parameters yields the following:
The size of the household is still a very significant variable, but, as can be seen,
the possession of a loyalty card has also become a very significant influencing
factor on the amount of money spent by a customer, with a customer having
such a card spending, on average, 17.62 currency units more than a customer
without loyalty card (assuming the other variables are the same).
(c) OLS assigned the same weight to the observations of supermarket 1 as to the
observations of supermarket 2, whereas the observations of supermarket 2 are
less useful, due to the higher variance, than those of supermarket 1. This
caused the effect of owning a loyalty card to be partially masked. Hence, WLS
is more efficient.
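The algebra above can be verified numerically outside Eviews. The Python sketch below (the design matrix, response, and variance values are invented for illustration, not the supermarket data) confirms that GLS with a block-diagonal Σ̂ coincides with OLS applied to rows rescaled by 1/σ̂_j:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 4                       # 20 observations per supermarket, 4 regressors
X = rng.normal(size=(2 * n, k))    # stacked design matrix (group 1 above group 2)
y = rng.normal(size=2 * n)
s2 = np.array([93.0, 327.0])       # hypothetical groupwise variance estimates

# GLS: theta = (X' Sigma^-1 X)^-1 X' Sigma^-1 y, with Sigma block diagonal
sigma_inv_diag = np.r_[np.full(n, 1 / s2[0]), np.full(n, 1 / s2[1])]
Sigma_inv = np.diag(sigma_inv_diag)
theta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

# WLS: rescale every row by 1/sigma_j, then run ordinary least squares
w = np.sqrt(sigma_inv_diag)
theta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

print(np.allclose(theta_gls, theta_wls))  # prints True
```

Because Σ̂⁻¹ = W^t W, with W the diagonal matrix of square-root weights, the two expressions are algebraically identical, so the check holds to machine precision.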
3. We are also afraid that there might be interaction between the variable CARD and
the supermarket. In particular, the effect of the loyalty card might differ among the
supermarkets. The model now becomes

y_{i,j} = β HHS_{i,j} + δ₁ (STORE1 × CARD)_{i,j} + δ₂ (STORE2 × CARD)_{i,j} + α₁ STORE1_{i,j} + α₂ STORE2_{i,j} + ε_{i,j}
(a) Estimate the above model by OLS. Do you think there might be interaction?
(Hint: creating the variables STORE1*CARD and STORE2*CARD might be
useful.)
Solution
The household size is still a highly significant variable, and owning a loyalty
card still has a significant effect on the amount purchased, but only for super-
market 1! For supermarket 2, there is only a non-significant, negative effect.
From this, the conclusion can be drawn that there is an interaction between
the supermarket and owning a loyalty card for the amount of money expended.
(b) Testing whether the effect is significant or not amounts to performing a Wald
test for:
H0 : δ1 = δ2
4. Economists would say that there is a serious endogeneity problem here. There
probably exists a feedback relation from AMOUNT to CARD. Could you explain
why this might be the case? Explain in words why it might indeed be that the error
terms are correlated with the variable CARD.
Solution
Most likely, the loyalty card will be promoted chiefly to customers who spend a lot
but do not yet have a card. Since the benefits of the card are clearest for those big
spenders, they will be more inclined to get one for themselves.
Vice versa, for people already owning a card, the benefits of it are higher if they
spend more, so they might be inclined to naturally spend more than a person without
a card, leading to a correlation between owning a loyalty card (CARD) and the error
term in the model.
Remark: because the number of observations was the same for both groups, this
problem could also have been treated as a panel data problem (for panel data
models, see chapter 5). However, this is NOT true in general, as different groups
will usually contain different numbers of observations.
8.3 Homework Non-Linear Regression, page 51
1. The only explicative variable here is the time. The general formulation of the non-
linear regression model we saw in class was yt = f (xt , θ) + ²t , for t = 1, . . . , n. What
are xt , θ and f for this example?
Solution
x_t = t
θ = (α, β, γ)^t
f(x_t, θ) = α / (1 + e^{−β−γt})
2. Formulate explicitly, as a system of 3 equations, the first order conditions for the
LS estimator.
Solution
In general, the first order condition for the model y_t = f(x_t, θ) + ε_t is the following:

Σ_{t=1}^{n} d/dθ (y_t − f(x_t, θ))² |_{θ=θ̂} = 0

or equivalently

Σ_{t=1}^{n} (y_t − f(x_t, θ)) d/dθ f(x_t, θ) |_{θ=θ̂} = 0
For the example under consideration, this yields the following system of equations:

Σ_{t=1}^{n} (y_t − f(x_t, θ)) ∂f(x_t, θ)/∂α |_{θ=θ̂} = 0
Σ_{t=1}^{n} (y_t − f(x_t, θ)) ∂f(x_t, θ)/∂β |_{θ=θ̂} = 0
Σ_{t=1}^{n} (y_t − f(x_t, θ)) ∂f(x_t, θ)/∂γ |_{θ=θ̂} = 0

With f(x_t, θ) = α/(1 + e^{−β−γt}), working out the partial derivatives gives

Σ_{t=1}^{n} (y_t − f(x_t, θ̂)) · 1/(1 + e^{−β̂−γ̂t}) = 0
Σ_{t=1}^{n} (y_t − f(x_t, θ̂)) · α̂ e^{−β̂−γ̂t}/(1 + e^{−β̂−γ̂t})² = 0
Σ_{t=1}^{n} (y_t − f(x_t, θ̂)) · α̂ t e^{−β̂−γ̂t}/(1 + e^{−β̂−γ̂t})² = 0
Note that it is not necessary to make a first order Taylor approximation in order to
write down the first order conditions.
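These first order conditions are what the Gauss-Newton algorithm solves iteratively: at each step the model is linearized around the current θ and a linear least squares problem is solved. A minimal Python sketch, using invented parameter values (not the steel-data estimates) and noiseless data generated from the logistic curve itself, so the iterations should recover the truth:

```python
import numpy as np

# Invented "true" parameters for illustration (not the steel-data estimates)
alpha, beta, gamma = 0.5, -0.8, 0.12
t = np.arange(28, dtype=float)               # 1970..1997 coded as t = 0..27
y = alpha / (1 + np.exp(-beta - gamma * t))  # noiseless logistic diffusion data

theta = np.array([0.45, -0.70, 0.10])        # starting values, chosen near the truth
for _ in range(50):
    a, b, g = theta
    e = np.exp(-b - g * t)
    f = a / (1 + e)                          # current fitted values
    r = y - f                                # current residuals
    # Jacobian of f with respect to (alpha, beta, gamma)
    J = np.column_stack([1 / (1 + e),
                         a * e / (1 + e) ** 2,
                         a * t * e / (1 + e) ** 2])
    # Gauss-Newton step: solve the linearized least squares problem
    theta = theta + np.linalg.solve(J.T @ J, J.T @ r)

print(theta)  # should recover (0.5, -0.8, 0.12)
```

The columns of the Jacobian are exactly the three partial derivatives appearing in the first order conditions, so at a zero-residual solution the Gauss-Newton step is zero.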
3. Plot the series. The value of α gives the diffusion level for t tending to infinity. It is
called the saturation level. Do you think that we are already close to the saturation
level?
Solution
A visual inspection of the plot does not give an indication that the steel production
by that type of technology is close to the saturation level.
Solution
Performing the model estimation yields the following output (in LaTeX typeset form):
Dependent Variable: Y
Method: Least Squares
Sample: 1970 1997
Included observations: 28
Figure 2: The series yt w.r.t. the year
of iterations.
Solution
Changing the convergence criterion to 10⁻¹⁰, and resetting the starting values for the
parameter estimates back to zero, 10 iterations are needed to achieve convergence.
This is not surprising given the algorithm used (Gauss-Newton has quadratic
convergence, i.e. the number of correct digits roughly doubles in each iteration
step once the algorithm is “close enough” to the true solution).
6. Forecast the values of yt up to 2020. For this, you need to change the range of the
workfile (via ‘/Procs/Change workfile range’ in the workfile window) as well as the
sample range (via ‘/Procs/Sample’ in the workfile window) to 1970-2020. You can
make the forecast within the equation window via the Forecast menu by setting the
forecast range as 1970-2020. When are we getting close to the saturation level?
Solution
As expected, the saturation level for the specific method of producing steel is almost
reached in the year 2020, and one can say that it is getting “close” to the saturation
level as soon as the estimated value of α, α̂ = 0.462303, lies within the confidence
bounds of the forecast, which happens somewhere around the year 2000.
Figure 3: A forecast of the series yt in the range 1970-2020, with confidence bounds in
dashed lines
point is given by t_I = −β/γ (the root of the second derivative of the diffusion
curve). What is your estimate of t_I? What do we learn from this? Test the
hypothesis that the point of inflection occurred in 1990.
Solution
t̂_I = −β̂/γ̂ = −(−0.794178)/0.116835 = 6.797
and thus the year of inflection is 1977, since 1970 is the reference year (t = 0).
This means that, roughly, after the year 1977 the rate at which US steel producers
switch to the new technology starts to decrease.
To check the hypothesis that the point of inflection occurred in 1990, a Wald test
needs to be performed with the following null-hypothesis:
H0 : −β/γ = 1990 − 1970 = 20
Performing the Wald test yields the following output
F-statistic 75.02388 Probability 0.000000
Chi-square 75.02388 Probability 0.000000
from which the conclusion can be drawn that the year of inflection is very signifi-
cantly different from 1990.
Define
Y = b^t X + ε.

Var(Y) = Var(b^t X + ε) = b^t Cov(X) b + Var(ε) = b^t Σ b + 1
(c) Compute the correlation between Y and ².
Solution
Cov(Y, ε) = Cov(b^t X + ε, ε) = Cov(ε, ε) = Var(ε) = 1

ρ_{Y,ε} = Cov(Y, ε) / √(Var(Y) Var(ε)) = 1/√4 = 1/2
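These moment computations can be replicated with a few lines of numpy. In the sketch below, b and Σ are invented so that b^t Σ b = 3, matching Var(Y) = 4 in the exercise; the moments are evaluated analytically, no simulation is needed:

```python
import numpy as np

b = np.array([1.0, 1.0, 1.0])   # assumed coefficient vector (illustrative choice)
Sigma = np.eye(3)               # assumed Cov(X), chosen so that b' Sigma b = 3
var_eps = 1.0                   # Var(eps) = 1 as in the exercise

var_y = b @ Sigma @ b + var_eps    # Var(Y) = b' Sigma b + Var(eps) = 4
cov_y_eps = var_eps                # Cov(Y, eps) = Var(eps), as X and eps are independent
rho = cov_y_eps / np.sqrt(var_y * var_eps)

print(var_y, rho)  # 4.0 0.5
```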
for 1 ≤ i ≤ n and with ε_i the error terms. Suppose that β̂ = 1.5 and SE(β̂) = 0.75.
Suppose that γ̂ = 4 and SE(γ̂) = 0.75. Moreover, Cov(β̂, γ̂) = 0.2.
Solution
p = 0.0455
meaning that the significance is not really strong, i.e. it is a borderline case.
Solution
The equality holds because X and ε are independent.
The hypothesis H0 : β = 5 must be tested. The 95% confidence interval is the
same as in (a), and it is found that 5 ∉ [0.0375; 2.9625]. Moreover, computing
the p-value, it is found that

p < 10⁻⁵
(c) Using the Delta-method (for more information see section 3.2.4), compute
SE(β̂ 2 ).
Solution
First, define
g : ℝ → ℝ : β ↦ g(β) = β²
so that
dg(β)/dβ = 2β
The delta method then gives
√n (β̂² − β²) →_d N(0, 4β² V)
and hence, filling in the estimates,
SE(β̂²) = |2β̂| SE(β̂) = 2 × 1.5 × 0.75 = 2.25
Solution
Testing the hypothesis H0 : β = γ is equivalent to testing H0 : β − γ = 0. A
side computation reveals that
Since the absolute value of the test statistic is larger than 1.96, we can reject the
null hypothesis at the 5% significance level, and thus β̂ and γ̂ are significantly
different. The p-value (p = 0.0166) shows that the significance is not very strong.
and, filling in the values,

Var(β̂γ̂) = ( 4   1.5 ) ( (0.75)²   0.2
                          0.2   (0.75)² ) ( 4   1.5 )^t = 12.67

or

SE(β̂γ̂) = √12.67 = 3.56
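Both delta-method results, SE(β̂²) and SE(β̂γ̂), can be reproduced numerically from the reported estimates and covariance matrix:

```python
import numpy as np

beta_hat, gamma_hat = 1.5, 4.0
V = np.array([[0.75 ** 2, 0.2],    # estimated covariance matrix of (beta_hat, gamma_hat)
              [0.2, 0.75 ** 2]])

# g(beta) = beta^2: the gradient is 2*beta, so SE = |2*beta| * SE(beta)
se_beta_sq = abs(2 * beta_hat) * 0.75

# g(beta, gamma) = beta * gamma: the gradient is (gamma, beta)
grad = np.array([gamma_hat, beta_hat])
var_prod = grad @ V @ grad
se_prod = np.sqrt(var_prod)

print(se_beta_sq, round(float(var_prod), 2), round(float(se_prod), 2))  # 2.25 12.67 3.56
```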
log(y) = α + βx + γz + …

From this, it is easily seen that, if x increases by 1 unit, log(y) is expected
to increase by β units, or analytically

d log(y)/dx = β

A quick calculation then reveals that, if dx = 1,

d log(y) = dy/y ≈ β

This leads to

dy ≈ βy = (β × 100) (y/100)

or: y is expected to increase by (β × 100)%.
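The quality of the "100 × β percent" interpretation can be checked directly: the exact effect on y of a one-unit increase in x is multiplication by e^β, which is close to 1 + β only when β is small. A quick Python illustration:

```python
import math

# Exact versus approximate percentage effect of a one-unit increase in x
for beta in (0.02, 0.10, 0.50):
    exact_pct = (math.exp(beta) - 1) * 100   # exact percentage change in y
    approx_pct = beta * 100                  # the "100 * beta percent" rule
    print(beta, round(exact_pct, 2), approx_pct)
# The approximation is good for small beta and deteriorates as beta grows.
```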
We are interested in the number of accidents per service month for a sample of ships. The
data can be found in the file “ships.wmf”. The endogenous variable is called ACC. The
explicative variables are:
• TYPE: there are 5 types of ships, labeled as A-B-C-D-E or 1-2-3-4-5. TYPE is a
categorical variable, so 5 dummy variables can be created: TA, TB, TC, TD, TE.
• CONSTRUCTION YEAR: the ships are constructed in one of four periods, leading
to the dummy variables T6064, T6569, T7074, and T7579.
• SERVICE: a measure for the amount of service that the ship has already carried
out.
Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the histogram
of the conditional or the unconditional distribution of ACC?
Solution
At first sight, the shape of the histogram is more or less that of a Poisson
distributed variable, with a small parameter λ (around 2 or 3). This is
the histogram of the unconditional distribution of ACC.
2. Estimate the Poisson regression model, including all explicative variables and a
constant term. (Use estimation method: COUNT- integer counting data.)
Solution
Estimation of the full model, according to a Poisson Model, yields the following
values for the coefficients (TA was chosen to be the reference category for TYPE,
and T6064 was chosen to be the reference category for CONSTRUCTION YEAR):
Note that if TA and/or T6064 were included in the model as well, an error message
would be generated (perfect multicollinearity with the constant term, the dummy
variable trap), and no estimation would be performed.
Solution
The estimated value of the coefficient for the variable SERVICE is very small, as is its
standard error. This is most likely due to the large scale, and hence large variance, of
the variable; calling up the descriptive statistics for SERVICE supports this (σ̂ = 9644.166).
However, despite the small absolute value of the coefficient, it is still highly significant
(p ≤ 5 · 10⁻⁵).
4. Perform a Wald test to test for the joint significance of the construction year dummy
variables.
Solution
Performing a Wald test of the null hypothesis that the coefficients of T6569, T7074
and T7579 are all zero yields a value of 40.56575 for the χ²-statistic, and hence a
p-value smaller than 5 · 10⁻⁷, indicating that the construction year dummy variables
are jointly strongly significant.
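As a cross-check, the p-value of this χ² statistic with 3 degrees of freedom (one per construction year dummy) can be computed with only the Python standard library, since the survival function of the χ²₃ distribution has a closed form:

```python
import math

def chi2_sf_3df(x):
    # Survival function of the chi-square distribution with 3 degrees of freedom:
    # P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p = chi2_sf_3df(40.56575)   # Wald statistic for the 3 construction year dummies
print(p)                    # well below 5e-7
```

As a sanity check of the formula, chi2_sf_3df(7.815) is approximately 0.05, the usual 5% critical value for 3 degrees of freedom.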
Solution
Denote by x = (0, 0, 0, 0, 1, 0, 0, 1000)^t the encoding of the ship (the first four numbers
are the dummies for TYPE, where type A is the reference category as before, the
next three are for CONSTRUCTION YEAR, with 60-64 the reference category, and
the last entry is the value for SERVICE), and let λ̂(x) be the predicted number of
accidents per service month. Then

λ̂(x) = E[ACC | x] = e^{x^t β̂ + β̂₀}

where β̂₀ is the estimate of the intercept. Filling in the estimated values for β̂, it
is found that λ̂(x) = 6.444693. From this, the following two probabilities can be
estimated:

(a) P̂(ACC = 0 | x) = e^{−λ̂(x)} = 0.001588932
(b) P̂(ACC ≤ 1 | x) = e^{−λ̂(x)} (1 + λ̂(x)) = 0.01182911
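Both probabilities follow from the Poisson probability mass function evaluated at λ̂(x); a quick check in Python:

```python
import math

lam = 6.444693                       # predicted number of accidents, lambda_hat(x)

p0 = math.exp(-lam)                  # P(ACC = 0 | x)
p_le1 = math.exp(-lam) * (1 + lam)   # P(ACC <= 1 | x) = P(ACC = 0) + P(ACC = 1)

print(round(p0, 6), round(p_le1, 6))  # approximately 0.001589 and 0.011829
```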
Solution
This indicates that at the 9th step of the iterative algorithm used to compute the
ML estimates, the change relative to the previous solution was smaller than the
convergence criterion, so the algorithm stopped.
7. What do we learn from the value of “Probability(LR stat)”? What is the corre-
sponding null hypothesis?
Solution
The value of “Probability(LR stat)” is here very low (smaller than 5 · 10⁻⁷), meaning
that the variables used to construct the model are jointly extremely significant.
Writing this as a null hypothesis yields:
H0 : β = 0
with β the vector of parameters (excluding the intercept), and as explained before,
this hypothesis can be strongly rejected.
8. Estimate now a Negative Binomial Model. EViews reports log(η²) as the mixture
parameter in the estimation output. (a) Compare the estimates of β given by the
two models. (b) Compare the pseudo R² values of the two models.
Solution
Estimation of the full model, according to a Negative Binomial Model, yields the
following values for the coefficients (TA was chosen to be the reference category for
TYPE, and T6064 was chosen to be the reference category for CONSTRUCTION
YEAR):
Coefficient Std. Error t-Statistic Prob.
C 0.380217 0.469879 0.809181 0.4184
TB 0.997461 0.539695 1.848192 0.0646
TC -1.112612 0.477577 -2.329703 0.0198
TD -0.882003 0.444638 -1.983644 0.0473
TE -0.147957 0.418851 -0.353246 0.7239
T6569 0.982162 0.419928 2.338883 0.0193
T7074 1.857623 0.443221 4.191192 0.0000
T7579 1.097159 0.498048 2.202918 0.0276
SERVICE 6.41E-05 2.42E-05 2.645928 0.0081
log(η²) -1.104076 0.454214 -2.430740 0.0151
(a) As can be seen, some of the estimated coefficients have changed only slightly
between models, while others (the intercept, β̂TE , β̂T7074 , β̂T7579 , β̂SERVICE ) have
changed (much) more.
(b) For the two estimated models, Poisson Model (PM) and Negative Binomial
Model (NBM), the following values for pseudo R2 have been found:
Model Pseudo R2
PM 0.713334
NBM 0.757713
This indicates that the Negative Binomial Model is slightly preferable, in terms
of explicative power, to the Poisson Model.
9. Estimate now the Poisson model with only a constant term, so without explicative
variables (empty model). Derive mathematically a formula for this estimate of the
constant term (in the empty model), using the first order condition of the ML-
estimator.
Solution
Estimation of the empty model, according to a Poisson Model, yields the following
value for the estimated intercept:
Coefficient Std. Error t-Statistic Prob.
C 2.348570 0.053000 44.31276 0.0000
Analytically, this coefficient can be found as follows. Given the data {acc_i} (since
no explicative variables are included), the total log-likelihood can be written as (using
independence of the observations)

log L(acc₁, …, accₙ | β₀) = Σ_{i=1}^{n} log P(ACC = acc_i | β₀)

with

P(ACC = acc_i | β₀) = e^{−λ} λ^{acc_i} / (acc_i)! = e^{−e^{β₀}} e^{β₀ acc_i} / (acc_i)!

since λ = e^{β₀}, and hence

log L(β₀) = Σ_{i=1}^{n} { −e^{β₀} + β₀ acc_i − log((acc_i)!) }
To get an analytic expression for this estimator, write down the first-order condition
of the estimator:

d/dβ₀ log L(β₀) |_{β₀ = β̂₀} = Σ_{i=1}^{n} { −e^{β̂₀} + acc_i }
                            = −n e^{β̂₀} + Σ_{i=1}^{n} acc_i
                            = n ( acc̄ − e^{β̂₀} )
                            = 0

or

e^{β̂₀} = acc̄ ⇔ β̂₀ = log(acc̄)

where acc̄ denotes the sample mean of the acc_i.
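The closed-form solution, the log of the sample mean of the counts, can be verified against a direct Newton-Raphson maximization of the log-likelihood; the counts below are invented for illustration:

```python
import math

acc = [0, 2, 3, 1, 5, 2, 0, 4, 7, 2]    # invented accident counts for illustration
n = len(acc)
mean_acc = sum(acc) / n

# Newton-Raphson on the score: d logL / d beta0 = -n*exp(beta0) + sum(acc)
b0 = 0.0
for _ in range(50):
    score = -n * math.exp(b0) + sum(acc)
    hessian = -n * math.exp(b0)
    b0 -= score / hessian

print(b0, math.log(mean_acc))  # the two values agree
```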