Dougherty

Introduction to Econometrics,
5th edition
Chapter 4: Nonlinear Models and
Transformations of Variables

© Christopher Dougherty, 2016. All rights reserved.


QUADRATIC EXPLANATORY VARIABLES

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u$$

We will now consider models with quadratic explanatory variables of the type shown. Such
a model can be fitted using OLS with no modification.
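
Because the model is linear in the parameters, $X_2^2$ simply enters as an additional regressor. The following is a minimal Python sketch, assuming NumPy is available. The parameter values and the noise-free data are hypothetical, chosen so that OLS should recover the parameters exactly.

```python
import numpy as np

def fit_quadratic_ols(x, y):
    """OLS fit of y = b1 + b2*x + b3*x**2; returns the array (b1, b2, b3)."""
    x = np.asarray(x, dtype=float)
    # Design matrix: constant, X2, and X2 squared treated as a separate regressor
    X = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical parameter values, chosen only for illustration
beta = np.array([2.0, 0.5, 0.1])
x = np.arange(0.0, 21.0)                      # X2 = 0, 1, ..., 20
y = beta[0] + beta[1] * x + beta[2] * x**2    # no disturbance term, so the fit is exact
print(fit_quadratic_ols(x, y))                # recovers beta up to rounding error
```

No special estimation routine is needed: the squared term is just another column of the data matrix.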

However, the usual interpretation of a parameter, that it represents the effect of a unit
change in its associated variable, holding all other variables constant, cannot be applied. It
is not possible for $X_2$ to change without $X_2^2$ also changing.

$$\frac{dY}{dX_2} = \beta_2 + 2\beta_3 X_2$$

Differentiating the equation with respect to $X_2$, one obtains the change in $Y$ per unit change
in $X_2$. Thus, the impact of a unit change in $X_2$ on $Y$, $(\beta_2 + 2\beta_3 X_2)$, is a function of $X_2$.
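
The derivative can be checked numerically. A minimal sketch in plain Python with hypothetical parameter values: `marginal_effect` implements $\beta_2 + 2\beta_3 X_2$, and `numerical_slope` approximates the same slope with a central finite difference.

```python
def expected_y(x2, b1, b2, b3):
    """Nonstochastic part of Y = b1 + b2*X2 + b3*X2**2 + u."""
    return b1 + b2 * x2 + b3 * x2 ** 2

def marginal_effect(x2, b2, b3):
    """Analytic derivative dY/dX2 = b2 + 2*b3*X2."""
    return b2 + 2 * b3 * x2

def numerical_slope(x2, b1, b2, b3, h=1e-5):
    """Central finite difference; exact for a quadratic apart from rounding error."""
    return (expected_y(x2 + h, b1, b2, b3) - expected_y(x2 - h, b1, b2, b3)) / (2 * h)

# Hypothetical parameters: the slope changes as X2 changes
b1, b2, b3 = 1.0, 0.5, -0.02
for x2 in (0.0, 5.0, 10.0):
    print(x2, marginal_effect(x2, b2, b3), round(numerical_slope(x2, b1, b2, b3), 6))
```

The two columns agree, and the slope equals $\beta_2$ only at $X_2 = 0$.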

This means that $\beta_2$ has an interpretation that is different from that in the ordinary linear
model, where it is the unqualified effect of a unit change in $X_2$ on $Y$.

In this model, $\beta_2$ should be interpreted as the effect of a unit change in $X_2$ on $Y$ for the
special case where $X_2 = 0$. For nonzero values of $X_2$, the marginal effect will be different.

$$Y = \beta_1 + (\beta_2 + \beta_3 X_2)\,X_2 + u$$

$\beta_3$ also has a special interpretation. If we rewrite the model as shown, $\beta_3$ can be interpreted
as the rate of change of the coefficient of $X_2$, per unit change in $X_2$.
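
Multiplying out the bracket confirms that the rewritten model is the same quadratic, and differentiating the bracketed coefficient of $X_2$ gives $\beta_3$:

```latex
\begin{aligned}
Y &= \beta_1 + (\beta_2 + \beta_3 X_2)\,X_2 + u \\
  &= \beta_1 + \beta_2 X_2 + \beta_3 X_2^2 + u,
\end{aligned}
\qquad
\frac{d}{dX_2}\left(\beta_2 + \beta_3 X_2\right) = \beta_3 .
```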

Only $\beta_1$ has a conventional interpretation. As usual, it is the value of $Y$ (apart from the
random component) when $X_2 = 0$.

There is a further problem. We know that the estimate of the intercept may have no
sensible meaning if $X_2 = 0$ is outside the data range. In that case, the same type of
distortion can affect the estimate of $\beta_2$.

. gen SSQ = S*S

. reg EARNINGS S SSQ

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  2,   497) =   23.44
       Model |  6061.38243     2  3030.69122           Prob > F      =  0.0000
    Residual |  64267.5838   497  129.311034           R-squared     =  0.0862
-------------+------------------------------           Adj R-squared =  0.0825
       Total |  70328.9662   499  140.939812           Root MSE      =  11.372

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1910651   1.785822     0.11   0.915    -3.317626    3.699757
         SSQ |   .0366817   .0606266     0.61   0.545    -.0824344    .1557978
       _cons |   8.358401   12.86047     0.65   0.516    -16.90919    33.62599
------------------------------------------------------------------------------

We will illustrate this with the earnings function. The table gives the output of a quadratic
regression of earnings on schooling (SSQ is defined as the square of schooling).
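
The summary statistics in the header of the output can be reproduced from the sums of squares. A minimal Python check, using the values copied from the table, with n = 500 observations and k = 3 parameters including the intercept:

```python
import math

# Values copied from the regression output above
n, k = 500, 3                 # observations; parameters including the intercept
ess = 6061.38243              # Model (explained) sum of squares
rss = 64267.5838              # Residual sum of squares
tss = 70328.9662              # Total sum of squares

r_squared = ess / tss
adj_r_squared = 1 - (rss / (n - k)) / (tss / (n - 1))
f_stat = (ess / (k - 1)) / (rss / (n - k))
root_mse = math.sqrt(rss / (n - k))

print(f"R-squared={r_squared:.4f}  Adj R-squared={adj_r_squared:.4f}  "
      f"F={f_stat:.2f}  Root MSE={root_mse:.3f}")
```

The results agree with the reported R-squared, Adj R-squared, F statistic, and Root MSE to the precision shown.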

The coefficient of S implies that, for an individual with no schooling, the impact of a year of
schooling is to increase hourly earnings by $0.19.
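
The same derivative, evaluated with the estimated coefficients, shows how strongly the implied effect of an extra year of schooling depends on the level of schooling. A minimal sketch; the schooling levels chosen are illustrative:

```python
# Coefficients from the quadratic earnings regression above
b_S, b_SSQ = 0.1910651, 0.0366817

def marginal_effect_of_schooling(s):
    """Estimated dEARNINGS/dS = b_S + 2*b_SSQ*S, in dollars per hour per extra year."""
    return b_S + 2 * b_SSQ * s

for s in (0, 8, 12, 16, 20):
    print(f"S = {s:2d}: {marginal_effect_of_schooling(s):.2f}")
```

At S = 0 the result is just the coefficient of S, about $0.19; at S = 12 it is about $1.07 per hour.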

It is also doubtful whether the intercept has any sensible interpretation. Literally, it implies
that an individual with no schooling would have hourly earnings of $8.36, which seems
implausibly high.
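
To see what the intercept represents, one can evaluate the fitted equation at S = 0, where it collapses to the constant, and, for comparison, at a level of schooling inside the sample. A minimal sketch using the coefficients from the table; the choice of S = 12 is only illustrative:

```python
# Coefficients from the quadratic earnings regression above
b_cons, b_S, b_SSQ = 8.358401, 0.1910651, 0.0366817

def fitted_earnings(s):
    """Fitted hourly earnings ($) at s years of schooling."""
    return b_cons + b_S * s + b_SSQ * s ** 2

print(f"{fitted_earnings(0):.2f}")    # 8.36, the intercept
print(f"{fitted_earnings(12):.2f}")   # 15.93
```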

[Figure: hourly earnings ($) against years of schooling (highest grade completed), with the fitted quadratic relationship]

The quadratic relationship is illustrated in the figure. Over the range of the actual data, it
fits the observations tolerably well. The fit is not dramatically different from those of the
linear and semilogarithmic specifications.

Most wage equation studies prefer the semilogarithmic specification. The slope coefficient
has a simple interpretation and the specification does not give rise to nonsensical
predictions outside the data range.

Average annual percentage growth rates

                  Employment     GDP                      Employment     GDP
Australia               2.57    3.52    Korea                   1.11    4.48
Austria                 1.64    2.66    Luxembourg              1.34    4.55
Belgium                 1.06    2.27    Mexico                  1.88    3.36
Canada                  1.90    2.57    Netherlands             0.51    2.37
Czech Republic          0.79    5.62    New Zealand             2.67    3.41
Denmark                 0.58    2.02    Norway                  1.36    2.49
Estonia                 2.28    8.10    Poland                  2.05    5.16
Finland                 0.98    3.75    Portugal                0.13    1.04
France                  0.69    2.00    Slovak Republic         2.08    7.04
Germany                 0.84    1.67    Slovenia                1.60    4.82
Greece                  1.55    4.32    Sweden                  0.83    3.47
Hungary                 0.28    3.31    Switzerland             0.90    2.54
Iceland                 2.49    5.62    Turkey                  1.30    6.90
Israel                  3.29    4.79    United Kingdom          0.92    3.31
Italy                   0.89    1.29    United States           1.36    2.88
Japan                   0.31    1.85

The data on the employment growth rate, e, and the GDP growth rate, g, for 31 OECD countries in
Exercise 1.5 provide another example where one might consider the use of a quadratic
function.

. gen gsq = g*g

. reg e g gsq

      Source |       SS       df       MS              Number of obs =      31
-------------+------------------------------           F(  2,    28) =    7.03
       Model |  6.05131556     2  3.02565778           Prob > F      =  0.0034
    Residual |  12.0579495    28  .430641052           R-squared     =  0.3342
-------------+------------------------------           Adj R-squared =  0.2866
       Total |   18.109265    30  .603642167           Root MSE      =  .65623

------------------------------------------------------------------------------
           e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .6616232   .2988805     2.21   0.035     .0493942    1.273852
         gsq |  -.0490589   .0336736    -1.46   0.156    -.1180362    .0199185
       _cons |  -.2576489   .5845635    -0.44   0.663    -1.455073     .939775
------------------------------------------------------------------------------

The output from a quadratic regression is shown. The variable gsq has been defined as the square of g.

[Figure: employment growth rate against GDP growth rate, with the fitted quadratic and hyperbolic functions]

The quadratic specification appears to be an improvement on the hyperbolic function fitted
in a previous slideshow. It is more satisfactory than the latter for low values of g, in that it
does not yield implausibly large negative predicted values of e.

The only defect is that it predicts that the fitted value of e starts to fall when g exceeds about 6.7.
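
The point at which the fitted curve turns down can be found by setting the derivative of the fitted quadratic to zero, which gives $g^* = -b_g/(2b_{gsq})$, where $b_g$ and $b_{gsq}$ denote the estimated coefficients of g and gsq (labels introduced here for convenience). A minimal Python check:

```python
# Coefficients from the regression of e on g and gsq
b_g, b_gsq = 0.6616232, -0.0490589

# d(e_hat)/dg = b_g + 2*b_gsq*g = 0  =>  g* = -b_g / (2*b_gsq)
g_star = -b_g / (2 * b_gsq)
print(f"fitted e peaks at g = {g_star:.2f}")
```

This gives g* of about 6.74; because the coefficient of gsq is negative, the turning point is a maximum.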

[Figure: employment growth rate against GDP growth rate, with the fitted quadratic, cubic, and quartic functions]

Why stop at a quadratic? Why not consider a cubic, or quartic, or a polynomial of even
higher order? There are usually several good reasons for not doing so.

First, diminishing marginal effects are standard in economic theory, justifying quadratic
specifications, at least as an approximation, but economic theory seldom suggests that a
relationship might sensibly be represented by a cubic or higher-order polynomial.

The second reason follows from the first. There will be an improvement in fit as higher-
order terms are added, but because these terms are not theoretically justified, the
improvement will be sample-specific.
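
That the fit cannot get worse is a property of least squares: each higher-order polynomial nests the lower-order one, so R² cannot fall as terms are added, even when the extra terms only fit noise. A minimal sketch with NumPy, using hypothetical simulated data generated from a true quadratic (not the OECD sample), so any gain from the cubic and quartic terms is sample-specific by construction:

```python
import numpy as np

def r_squared(x, y, degree):
    """R-squared of an OLS polynomial fit of the given degree (with intercept)."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

# Hypothetical data: a true quadratic plus random noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 9, size=31)
y = -0.25 + 0.66 * x - 0.05 * x**2 + rng.normal(0, 0.65, size=31)

fits = {d: r_squared(x, y, d) for d in (2, 3, 4)}
for d, r2 in fits.items():
    print(f"degree {d}: R-squared = {r2:.3f}")
```

Rerunning with a different seed changes the size of the gains but never makes R² fall as the degree rises.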

Third, unless the sample is very small, the fits of higher-order polynomials are unlikely to
be very different from those of a quadratic over the main part of the data range.

These points are illustrated by the figure, which shows the cubic and quartic regressions
alongside the quadratic regression. Over the main data range, from g = 1.5 to g = 5, the fits of the
cubic and quartic are very similar to that of the quadratic.

R² for the quadratic specification is 0.334. For the cubic and quartic specifications it is 0.345
and 0.355, respectively: relatively small improvements.
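
One way to judge whether such gains justify the extra parameters is adjusted R², which the regression output also reports. A minimal sketch applying the standard formula to the rounded R² values quoted above, with n = 31, assuming that the cubic and quartic include all lower-order terms, so that k = 4 and 5:

```python
n = 31  # number of countries in the sample

def adjusted_r_squared(r2, k, n=n):
    """Adjusted R-squared with k parameters including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# R-squared values as quoted in the text (rounded to three decimals)
for name, r2, k in [("quadratic", 0.334, 3), ("cubic", 0.345, 4), ("quartic", 0.355, 5)]:
    print(f"{name:9s}: R2 = {r2:.3f}, adjusted R2 = {adjusted_r_squared(r2, k):.3f}")
```

With these inputs adjusted R² falls from about 0.286 for the quadratic to 0.272 for the cubic and 0.256 for the quartic, so on this criterion the extra terms do not pay for themselves.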

Further, the cubic and quartic curves both exhibit implausible characteristics.

As g increases, the slope of the cubic first diminishes and then increases. There is no
reasonable economic explanation for this. The quartic curve actually declines for values of g
from 5 to 7, and then exhibits a strange upward twist at its end.

These slideshows may be downloaded by anyone, anywhere for personal use.


Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.

The content of this slideshow comes from Section 4.3 of C. Dougherty,
Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.

2016.05.02
