Microeconometrics
Pavel Čížek
(P.Cizek@uvt.nl)
Fall 2016
Econometrics Slide 1
Introduction

• Instructors:
  ◦ Martin Salm (Room K 642, Email: M.Salm@uvt.nl)
  ◦ Pavel Čížek (Room K 641, Email: P.Cizek@uvt.nl)
• Microeconometrics
  ◦ linear and nonlinear regression models
  ◦ estimation techniques for single-equation models
• Main book:
  ◦ A. C. Cameron and P. K. Trivedi (2005) Microeconometrics: Methods and Applications, Cambridge University Press.
Course structure

Econometric models
• linear models and causality
• duration models
• nonlinear models for discrete or limited responses

Estimation
• linear models and their estimation
• maximum likelihood and generalized method of moments
• non- and semiparametric estimation
Outline

Topics in the second half of the course
• parameter estimation of reduced-form models
  ◦ including binary-choice models
• nonparametric and semiparametric estimation
Reduced-form models

Estimation methodology is designed for reduced-form models with given statistical properties; for example,

  y_i = x_i^⊤ β + ε_i
Structural models

Models can have not only statistical but also economic structure describing how economic behavior, institutions, and laws affect the relationship between the variables y_i and x_i; for example,

• the ith firm's production y_i, labor input l_i, and capital k_i can be related by the (deterministic) Cobb-Douglas production function:

  y_i = A_i l_i^α k_i^β

  ◦ α and β are interpreted as elements of the production function
  ◦ the economic validity can be studied: are firms operating efficiently under state ownership or under regulators?
Structural models
• relate to economic theory
• facilitate interpretation

Reduced-form models

Identification
• For a given structural model, is there only one reduced-form model?
• Does a given reduced-form model correspond to multiple structural models?
• Do all considered structural models render the same values of (some) parameters in the reduced-form model?
  (e.g., consider variation in A_i [in]dependent of i, l_i, or k_i)
Estimation

• Method of moments
• GMM
• Maximum likelihood
• General MLE
• Comparison
• Quasi-MLE
• Quantile regression
• Asymptotics
Method of moments – linear regression

Random sample (x_1, y_1), ..., (x_n, y_n) following the model

  y_i = x_i^⊤ β + ε_i

• first conditional moment of ε_i: E(ε_i | x_i) = 0
• unconditional moment equation:

  x_i E(ε_i | x_i) = 0 ⇒ E{E(x_i ε_i | x_i)} = E(x_i ε_i) = 0

• population equation: E(x_i ε_i) = E{x_i (y_i − x_i^⊤ β)} = 0
• sample analog equation (to be solved):

  n^{-1} Σ_{i=1}^n x_i (y_i − x_i^⊤ β) = n^{-1} Σ_{i=1}^n x_i y_i − n^{-1} Σ_{i=1}^n x_i x_i^⊤ β = 0

• solution

  β̂_n = [n^{-1} Σ_{i=1}^n x_i x_i^⊤]^{-1} [n^{-1} Σ_{i=1}^n x_i y_i]
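The closed-form solution above can be checked numerically. A minimal sketch: the simulated data, seed, and all variable names below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Simulate y_i = x_i' beta + eps_i and solve the sample moment equation
# n^{-1} sum_i x_i (y_i - x_i' beta) = 0 for beta.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# beta_hat = [n^{-1} sum x_i x_i']^{-1} [n^{-1} sum x_i y_i]
beta_hat = np.linalg.solve(X.T @ X / n, X.T @ y / n)
```

With exactly as many moment conditions as parameters, this reproduces the ordinary least squares estimator.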
Generalized method of moments

Generalized method of moments (GMM): define
• data w_i (e.g., w_i = (y_i, x_i)^⊤ = (y_i, k_i, l_i)^⊤) of sample size n
• parameters of interest θ ∈ Θ ⊆ R^p; its true value θ_0 solves the
• moment conditions g(w_i, θ): R^p → R^k such that

  E{g(w_i, θ_0)} = 0

  (e.g., g(w_i, θ) = x_i (y_i − x_i^⊤ θ) for E[x_i (y_i − x_i^⊤ θ)] = 0)

The GMM estimator minimizes with respect to θ

  Q_n(θ) = [n^{-1} Σ_{i=1}^n g(w_i, θ)]^⊤ W_n [n^{-1} Σ_{i=1}^n g(w_i, θ)]
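A sketch of minimizing Q_n(θ) for the linear-regression moment condition g(w_i, θ) = x_i (y_i − x_i^⊤ θ) with W_n = I; the simulated data and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.5, -1.0])
y = X @ theta_true + rng.normal(size=n)

def Qn(theta):
    g_bar = X.T @ (y - X @ theta) / n   # (1/n) sum_i g(w_i, theta)
    W = np.eye(len(g_bar))              # identity weighting matrix W_n
    return g_bar @ W @ g_bar

theta_hat = minimize(Qn, x0=np.zeros(2), method="BFGS").x
```

Because the model is exactly identified (k = p), the minimized objective is essentially zero and the GMM estimate coincides with least squares; an overidentified model would leave Q_n(θ̂) strictly positive.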
Maximum likelihood estimation

Linear regression model: y_i = x_i^⊤ β_0 + ε_i, where ε_i ∼ N(0, σ²) iid

• the distribution of y_i is known conditionally on x_i:

  y_i = x_i^⊤ β_0 + ε_i ⇒ y_i | x_i ∼ N(x_i^⊤ β_0, σ²)

• the likelihood contribution is the value of the density φ of y_i | x_i
• the likelihood is the conditional density of {y_i}_{i=1}^n given {x_i}_{i=1}^n

  L_n(β, σ²) = f(y_1, ..., y_n | x_1, ..., x_n; β, σ²) = Π_{i=1}^n φ(y_i | x_i; β, σ²)

  ln L_n(β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)
Maximum likelihood estimation

• the normal density function of y_i | x_i ∼ N(x_i^⊤ β, σ²)

  φ(y_i | x_i; β, σ²) = (2πσ²)^{-1/2} exp{ −(y_i − x_i^⊤ β)² / (2σ²) }

• the log-likelihood function for y_i | x_i ∼ N(x_i^⊤ β, σ²)

  ln L_n(β, σ²) = Σ_{i=1}^n l(y_i | x_i; β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)
                = −(1/2) Σ_{i=1}^n [ ln(2π) + ln(σ²) + (y_i − x_i^⊤ β)²/σ² ]
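The normal log-likelihood above can be maximized directly. A minimal sketch, assuming simulated data; parametrizing ln σ² keeps the variance positive during optimization (all names and values are illustrative).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, -0.5]), 1.5
y = X @ beta_true + sigma_true * rng.normal(size=n)

def neg_loglik(params):
    beta, log_s2 = params[:2], params[2]  # work with ln(sigma^2) so sigma^2 > 0
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    # minus the log-likelihood from the slide above
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + resid**2 / s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, s2_hat = res.x[:2], np.exp(res.x[2])
```

At the maximum, β̂ coincides with least squares and σ̂² equals the mean squared residual, as the first-order conditions imply.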
Maximum likelihood estimation

Maximum likelihood estimation (MLE): define
• data w_i = (y_i, x_i)^⊤ (e.g., (y_i, x_i)^⊤ = (y_i, k_i, l_i)^⊤) of size n
• assume the true conditional density f(y_i | x_i; θ_0) is known up to some parameters of interest θ ∈ Θ ⊆ R^p
  (e.g., θ = (β, σ²)^⊤ based on the model parameters β and the parameters of the density of the errors ε_i)
• identification: the true parameter θ_0 maximizes E[ln f(y_i | x_i; θ)]
• log-likelihood for observation i: l(w_i, θ) = ln f(y_i | x_i; θ)
• log-likelihood function: ln L_n(θ) = n^{-1} Σ_{i=1}^n l(w_i, θ)
• maximum likelihood estimate

  θ̂_n = arg max_θ ln L_n(θ) = arg max_θ n^{-1} Σ_{i=1}^n l(w_i, θ)
Comparison

Assuming the correct moment equations, GMM is
• consistent and asymptotically normal
• linear regression: E(ε_i | x_i) = 0

Assuming the correct parametric distribution, MLE is
• consistent, asymptotically normal, and can be asymptotically efficient with asymptotic variance I^{-1}(θ), where
  I(θ) = E[−∂² ln f(y_i | x_i; θ)/∂θ∂θ^⊤]
• linear regression: ε_i ∼ N(0, σ²)
Quasi-maximum likelihood estimation

Linear regression model: y_i = x_i^⊤ β_0 + ε_i, where ε_i ∼ N(0, σ²) iid

• the distribution of y_i is known conditionally on x_i:

  y_i = x_i^⊤ β_0 + ε_i ⇒ y_i | x_i ∼ N(x_i^⊤ β_0, σ²)

• the likelihood is the conditional density of {y_i}_{i=1}^n given {x_i}_{i=1}^n

  ln L_n(β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)

• the quasi-MLE estimator solves the first-order conditions

  ∂ ln L_n(β, σ²)/∂(β^⊤, σ²)^⊤ = Σ_{i=1}^n ∂ ln φ(y_i | x_i; β, σ²)/∂(β^⊤, σ²)^⊤ = 0
Quasi-maximum likelihood estimation

• the normal density function of y_i | x_i ∼ N(x_i^⊤ β, σ²)

  φ(y_i | x_i; β, σ²) = (2πσ²)^{-1/2} exp{ −(y_i − x_i^⊤ β)² / (2σ²) }

• the log-likelihood function for y_i | x_i ∼ N(x_i^⊤ β, σ²)

  ln L_n(β, σ²) = −(1/2) Σ_{i=1}^n [ ln(2π) + ln(σ²) + (y_i − x_i^⊤ β)²/σ² ]

• the first-order conditions

  ∂ ln L_n(β, σ²)/∂β = (1/σ²) Σ_{i=1}^n (y_i − x_i^⊤ β) x_i = 0
Quasi-maximum likelihood estimation

• the double-exponential (Laplace) density function of y_i | x_i ∼ DExp(x_i^⊤ β, 1)

  φ(y_i | x_i; β) = (1/2) exp{ −|y_i − x_i^⊤ β| }

• the log-likelihood function for y_i | x_i ∼ DExp(x_i^⊤ β, 1)

  ln L_n(β) = Σ_{i=1}^n [ ln(1/2) − |y_i − x_i^⊤ β| ]

• the first-order conditions

  ∂ ln L_n(β)/∂β = Σ_{i=1}^n 2 [ I(y_i − x_i^⊤ β ≥ 0) − 1/2 ] x_i = 0
Quantile regression

The least absolute deviation estimator minimizes Σ_{i=1}^n |y_i − x_i^⊤ β| and is consistent if med(ε_i | x_i) = 0
⇒ it identifies the conditional median med(y_i | x_i) = x_i^⊤ β

• the quantile regression estimator minimizes

  Σ_{i=1}^n |τ − I(y_i − x_i^⊤ β ≤ 0)| · |y_i − x_i^⊤ β|
Quantile regression

Quantile regression (QR) in the linear regression model
• QR is based on the assumption Q_τ(ε_i | x_i) = 0
• QR identifies the conditional quantile Q_τ(y_i | x_i) = x_i^⊤ β
• the QR estimator minimizes

  Σ_{i=1}^n ρ_τ(y_i − x_i^⊤ β)

  where the check function ρ_τ(z) = [τ − I(z < 0)] · z
• case of τ = 1/2: median regression or least absolute deviation estimation, as ρ_τ(z) = |z|/2
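The check-function objective can be minimized directly with a derivative-free optimizer, since ρ_τ is not differentiable at zero. A minimal sketch for τ = 1/2; the simulated design, starting value, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 800
X = np.column_stack([np.ones(n), rng.uniform(0, 2, size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.standard_t(df=3, size=n)  # symmetric errors, median zero

def check_loss(beta, tau):
    u = y - X @ beta
    return np.sum((tau - (u < 0)) * u)            # rho_tau(u) = [tau - I(u < 0)] u

tau = 0.5
x0 = np.linalg.lstsq(X, y, rcond=None)[0]         # least squares starting value
beta_qr = minimize(check_loss, x0=x0, args=(tau,), method="Nelder-Mead").x
```

Changing `tau` traces out other conditional quantiles; dedicated linear-programming solvers are used in practice for larger problems.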
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for normal and log-normal data, shown with and without estimated conditional quantile curves.]
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for normal and heteroscedastic data, with estimated conditional quantile lines.]
Quantile regression: Engel curve

Coefficients: [table not recoverable]
Quantile regression: Engel curve

[Figure: food expenditure against household income, with the mean (LSE) fit and the median (LAE) fit.]
Quantile regression

Assume the linear model y_i = x_i^⊤ β_0(τ) + ε_i with Q_τ(ε_i | x_i) = 0 and
• the data form a random sample (y_i, x_i)_{i=1}^n
• the conditional distribution functions F_i(y_i | x_i) are absolutely continuous with continuous densities f_i(y_i | x_i) uniformly bounded away from 0 and ∞ at Q_τ(y_i | x_i), i = 1, ..., n
• the matrices D_0 = E(x_i x_i^⊤) and D_1(τ) = E{f_i(Q_τ(y_i | x_i)) x_i x_i^⊤} are positive definite

Then the quantile regression estimator β̂_n^QR(τ) is consistent and

  √n (β̂_n^QR(τ) − β_0(τ)) →_d N(0, τ(1 − τ) D_1(τ)^{-1} D_0 D_1(τ)^{-1})
Quantile regression

Buchinsky (1998) Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research. The Journal of Human Resources 33(1), 88–126.

• properties of the quantile regression estimator
• computation of the quantile regression estimator
• inference and tests based on quantile regression (tests of homoscedasticity, symmetry, ...)
• application to Current Population Survey data (1973–1993)
• censored quantile regression (discussed later)
Binary choice models

• Probit and logit
• MLE
• Marginal effects
• Measures of fit
• Application
• Heteroscedasticity
• Simulation
• Semiparametrics: maximum score, single index, semiparametric LS, Klein and Spady, implementation, average derivative, outlook
Introduction to binary choice models

Binary choice = binary response: a single discrete decision that can be characterized by the values 0 and 1
• traditionally, y = 1 = "yes, success" and y = 0 = "no, failure"
• examples: labor force participation, university education, foreign direct investment, public vs. private transport, ...

Typically derived from a structural model for the latent variable y*
• y* represents monetary utility, profit, ...
• example: seller = price − purchasing value of an object, buyer = (monetary) utility from the object − price
  (nontrivial in the cases of education, job, ...)
Introduction to binary choice models

Typically derived from a structural model for the latent variable y*
• the regression model characterizes the expectation (for individual i)

  E(y_i | x_i) = P(y_i = 1 | x_i) · 1 + P(y_i = 0 | x_i) · 0 = P(y_i = 1 | x_i)
               = P(y_i* = U_ia − U_ib ≥ 0 | x_i)
               = P(x_i^⊤ β + ε_i ≥ 0 | x_i) = P(x_i^⊤ β + ε_i ≥ 0 | x_i^⊤ β)
Introduction to binary choice models

What can be identified in
• choice a: utility U_a = w^⊤ δ_a + z_a^⊤ γ_a + ε_a?
• choice b: utility U_b = w^⊤ δ_b + z_b^⊤ γ_b + ε_b?

The reduced-form model E(y_i | x_i) = P(x_i^⊤ β + ε_i ≥ 0 | x_i^⊤ β) corresponds to the difference in utilities U_a − U_b:

  y* = w^⊤ (δ_a − δ_b) + z_a^⊤ γ_a − z_b^⊤ γ_b + ε_a − ε_b = x^⊤ β + ε

• δ_a and δ_b cannot be identified separately, only their difference δ_a − δ_b can be identified
• what about identification of γ_a and γ_b (often assumed γ_a = γ_b)?
  ◦ if z_a and z_b contain different variables/quantities
  ◦ if z_a and z_b contain common variables/quantities
  ◦ do we have a choice?
Probit and logit

Suppose that the latent utility y_i* follows the linear model

  y_i* = x_i^⊤ β_0 + ε_i,   ε_i ∼ F

• the observed response is binary (decision, choice, success, ...):
  y_i = I(y_i* > 0) = I(x_i^⊤ β + ε_i > 0)
• ε_i is symmetrically distributed and has zero mean: Eε_i = 0
• identification by normalization: σ² = var ε_i = 1, for example
  (y_i = I(x_i^⊤ β + ε_i > 0) = I(x_i^⊤ β/σ + ε_i/σ > 0) = y_i^σ)
• regression function if ε_i ∼ F:

  E(y_i | x_i) = P(y_i = 1 | x_i) = 1 − F(−x_i^⊤ β) = F(x_i^⊤ β)
Probit and logit

• F is completely specified (does not depend on parameters)
• probit = F is the standard normal distribution

  F(t) ≡ Φ(t) = ∫_{−∞}^t φ(s) ds

  (σ normalized to 1)
• logit = F is the (standard) logistic distribution with location parameter 0 and scale parameter 1

  F(t) ≡ Λ(t) = exp(t)/(1 + exp(t)) = 1/(1 + exp(−t))

  (σ normalized to π/√3 ≈ 1.814)
Maximum likelihood estimation

Parametric approach: E(y_i | x_i) = F(x_i^⊤ β), F known

• identification requires also E(x_i x_i^⊤) to be non-singular:
  P(x_i^⊤ β ≠ x_i^⊤ β_0) > 0 implies P[F(x_i^⊤ β) ≠ F(x_i^⊤ β_0)] > 0
  if F is strictly monotonic and completely specified
• likelihood contribution:

  L(β | y_i, x_i) = P(y_i = 1 | x_i)^{y_i} P(y_i = 0 | x_i)^{1−y_i}
                  = F(x_i^⊤ β)^{y_i} {1 − F(x_i^⊤ β)}^{1−y_i}

• log-likelihood contribution:

  l(y_i, x_i; β) = y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)}

• log-likelihood function:

  ln L_n(β) = Σ_{i=1}^n [ y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)} ]
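The binary-choice log-likelihood above can be maximized numerically; a minimal probit sketch with F = Φ, assuming simulated data (all names and values are illustrative).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# latent-variable model: y_i = I(x_i' beta + eps_i > 0), eps_i ~ N(0, 1)
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(beta):
    p = norm.cdf(X @ beta)                 # F(x_i' beta) with F = Phi
    p = np.clip(p, 1e-10, 1 - 1e-10)       # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
```

Replacing `norm.cdf` with the logistic distribution function gives the logit estimator with the same code structure.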
Probit Φ(x_i^⊤ β̂_n) and logit Λ(x_i^⊤ β̂_n): coronary heart disease

probit chd age; predict probit, p
--------------------------------------------------
z <- glm(chd ~ age, family=binomial(link="probit"))
z$fitted.values

[Figure: fitted probability of coronary heart disease against age (10–70 years).]
Probit and logit: coronary heart disease data

Interpretation – marginal effects

  p_j(x) = ∂P(y_i = 1 | x_i = x)/∂x_ij = ∂F(x^⊤ β)/∂x_ij = f(x^⊤ β) β_j

(p_j(x) is auxiliary/temporary notation)

• p_j(x) depends on f = F′, but the ratios p_j(x)/p_k(x) do not
• probit: p_j(x) = φ(x^⊤ β) β_j (= 0.399 β_j at x^⊤ β = 0)
• logit: p_j(x) = λ(x^⊤ β) β_j (= 0.25 β_j at x^⊤ β = 0)

Marginal effects are
• reported at the average Σ_{i=1}^n x_i / n, or as
• average marginal effects Σ_{i=1}^n f(x_i^⊤ β̂_n) β̂_nj / n
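Both reporting conventions above are one-liners once estimates are available. A minimal probit sketch; the coefficient values and simulated regressors below are hypothetical, not estimates from the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_hat = np.array([0.2, 0.8])   # hypothetical probit estimates

# marginal effect of regressor 1 evaluated at the average regressor vector
me_at_avg = norm.pdf(X.mean(axis=0) @ beta_hat) * beta_hat[1]

# average marginal effect: mean of phi(x_i' beta) * beta_j over the sample
ame = norm.pdf(X @ beta_hat).mean() * beta_hat[1]
```

Since φ peaks at zero, both quantities are bounded by 0.399 β̂_j, matching the slide's probit benchmark.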
Interpretation – marginal effects: coronary heart disease
Measures of fit

How to measure fit in binary-choice models?

• percentage correctly predicted (PCP) = Σ_{i=1}^n I[y_i = ŷ_i]/n, where ŷ_i = I{F(x_i^⊤ b_n) > 0.5}
  ◦ misleading if one response is rarely observed
  ◦ the threshold 0.5 is not suitable if P(y_i = 1 | x_i) is always low/high
• pseudo-R² = 1 − ln L_n(β̂_n)/ln L_n((1, 0, ..., 0)^⊤)
• other measures exist, but interpretation is more important:
  ◦ marginal effects at the average, p_j(x̄)
  ◦ average marginal effects Σ_{i=1}^n p_j(x_i)/n
  ◦ correct predictions per category (y_i = 1 and y_i = 0)
  ◦ cross-tabulation of y_i versus ŷ_i
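PCP and a likelihood-based pseudo-R² can be sketched as follows; the simulated data and the coefficient values used for the fitted probabilities are illustrative assumptions (in practice they would be the probit/logit estimates).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = (0.3 + 1.2 * x + rng.normal(size=n) > 0).astype(float)

# fitted probabilities at hypothetical coefficient values
p_hat = norm.cdf(0.3 + 1.2 * x)
y_pred = (p_hat > 0.5).astype(float)
pcp = np.mean(y == y_pred)                # percentage correctly predicted

def loglik(p):
    # Bernoulli log-likelihood at fitted probabilities p
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

p_null = np.full(n, y.mean())             # benchmark: constant predicted probability
pseudo_r2 = 1 - loglik(p_hat) / loglik(p_null)
```

As the slide warns, a cross-tabulation of `y` against `y_pred` per category is usually more informative than the single PCP number.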
Example: coronary heart disease data

Application

A. van Soest (1995) Structural models of family labor supply, Journal of Human Resources 30(1), 63–88.

• model of the labor supply of couples forming households
• labor supply of man and woman discretized (25–36 choices)
• imperfectly predictable wages and hours restrictions implemented
• estimation via simulated maximum likelihood
Distributional assumptions and heteroscedasticity

Latent linear model with heteroscedasticity:

  y_i* = x_i^⊤ β + ε_i

• conditional mean E(ε_i | x_i) = 0
• conditional variance var(ε_i | x_i) = var(ε_i | x_i0) ≠ const. σ²
  ◦ generally an unknown function of x_i0 = x_i without the intercept
  ◦ a parametric form can be assumed to facilitate estimation, e.g.,
    var(ε_i | x_i) = exp(α + x_i0^⊤ γ)
• consequences:
  ◦ linear-regression model: ordinary LS is consistent
  ◦ binary-choice model (ε_i ∼ N(0, exp(x_i0^⊤ γ)) with α = 0): "homoscedastic" maximum likelihood is inconsistent if γ ≠ 0
Simulated linear regression

[Figure: simulated linear-regression data, y plotted against x ∈ [0, 5].]
Simulated probit regression

[Figure: simulated binary responses and fitted probabilities (0 to 1) against x ∈ [0, 5].]
Heteroscedasticity – estimation

Heteroscedastic probit (ε_i ∼ N(0, exp(x_i0^⊤ γ))):

  P(y_i = 1 | x_i) = Φ{x_i^⊤ β / exp(x_i0^⊤ γ/2)}

• x_i0 does not contain an intercept
• x_i^⊤ β and x_i0^⊤ γ could contain different variables
• proof as for the standard probit:

  P(y_i = 1 | x_i) = P(x_i^⊤ β_0 + ε_i > 0 | x_i) = P(x_i^⊤ β_0 > −ε_i | x_i)
                   = P(x_i^⊤ β_0 / exp(x_i0^⊤ γ/2) > −ε_i / exp(x_i0^⊤ γ/2) | x_i)
                   = Φ{x_i^⊤ β_0 / exp(x_i0^⊤ γ/2)}

• more flexible than the standard probit (recall how to estimate!)
• complicated marginal effects (interpretation!)

  p_j(x) = ∂P(y_i = 1 | x_i = x)/∂x_ij
         = φ{x^⊤ β exp(−x_0^⊤ γ/2)} · exp(−x_0^⊤ γ/2) [β_j − (γ_j/2)(x^⊤ β)]
Probit and heteroscedasticity: coronary heart disease data

Semiparametric estimation

Maximum likelihood estimator
• can be asymptotically normal and efficient
• requires strict distributional assumptions
• for example, it is inconsistent
  ◦ and highly sensitive to heteroscedasticity
  ◦ but rather insensitive to misspecification of a symmetric unimodal distribution function

Semiparametric estimation
• methods of estimation that do not rely on parametric assumptions about the shape of the error-term distribution
Maximum score estimation

Maximum score estimator (MSE) by Manski (1985)

  β̂_n^MSE = arg max_β n^{-1} Σ_{i=1}^n [ y_i I(x_i^⊤ β ≥ 0) + (1 − y_i) I(x_i^⊤ β < 0) ]
          = arg min_β n^{-1} Σ_{i=1}^n | y_i − I(x_i^⊤ β > 0) |

• weak distributional assumptions (med(ε_i | x_i) = 0): applicable under any F and unobserved heteroscedasticity
• identification up to a scale as in probit; estimation of med(y_i | x_i) = med I(x_i^⊤ β + ε_i > 0) = I(x_i^⊤ β > 0)
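Because the score function is a step function of β, it is typically maximized by search rather than gradient methods. A grid-search sketch for one coefficient under the scale normalization (slope on x fixed to 1); the heteroscedastic design with med(ε_i | x_i) = 0 and all values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
b_true = -0.5
# heteroscedastic but conditionally median-zero errors
eps = rng.normal(size=n) * (1 + 0.5 * np.abs(x))
y = (x + b_true + eps > 0).astype(float)

def score(b):
    # fraction of observations whose sign is correctly matched
    pred = (x + b >= 0)
    return np.mean(y * pred + (1 - y) * (~pred))

grid = np.linspace(-2.0, 2.0, 2001)
b_hat = grid[np.argmax([score(b) for b in grid])]
```

The flat, piecewise-constant objective is also why the maximum score estimator converges at the slow cube-root rate rather than √n.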
Single-index model

Single-index model:

  E(Y_i | X_i = x) = g(x^⊤ β)

• covers linear models, binary-choice models, ...
• restricts the form of heteroscedasticity

Identification conditions, assuming unknown g: R → R
• g is differentiable and not constant on the support of X_i^⊤ β
• X_i has continuously distributed components and its support is not contained in any proper linear subspace of R^p
• no intercept and β_1 = 1 (location and scale normalization):
  g*(x^⊤ β) = g(γ + δ · x^⊤ β) if g*(t) = g(γ + δt)
• coefficient values of discrete variables cannot divide the support of X_i^⊤ β into disjoint subsets (otherwise, g must not be periodic)
Semiparametric LS

Semiparametric least squares: Ichimura (1993)

Nonlinear least squares for E(y_i | x_i) = g(x_i^⊤ β) when g is known:

  min_{β∈B} Σ_{i=1}^n {y_i − g(x_i^⊤ β)}²

Semiparametric least squares when g is unknown:
• estimate the regression function g(x_i^⊤ β) = E(Y_i | X_i^⊤ β = x_i^⊤ β) = E(Y_i | x_i^⊤ β) by ĝ_n(x_i^⊤ β)
• minimize the sum of squared residuals to get β̂_n from

  min_{β∈B} Σ_{i=1}^n {y_i − ĝ_n(x_i^⊤ β)}²

  and then estimate ĝ_n(x_i^⊤ β̂_n)
Klein and Spady

Klein and Spady (1993): estimate F and maximize the likelihood based on the estimated distribution function F̂_n

• binary response: F(X_i^⊤ β) = P(Y_i = 1 | X_i^⊤ β) = E(Y_i | X_i^⊤ β)
• parametric log-likelihood function:

  ln L_n(β) = Σ_{i=1}^n [ y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)} ]

• estimate F(x_i^⊤ β) = P(Y_i = 1 | x_i^⊤ β) = E(Y_i | x_i^⊤ β) by F̂_n(x_i^⊤ β) and maximize with respect to β

  Σ_{i=1}^n [ y_i ln F̂_n(x_i^⊤ β) + (1 − y_i) ln{1 − F̂_n(x_i^⊤ β)} ]

  and then estimate F̂_n(x_i^⊤ β̂_n)
Implementation: Binary-choice model

# assume the data are
# Y is n x 1 vector for the dependent variable
# X is n x p matrix for the explanatory variables
# KS denotes the Klein-Spady objective function (defined on the preceding slides)
z <- optim(double(p-1), KS, x=X, y=Y, method="BFGS")
print(c("Parameter estimates:", z$par))
Implementation: Klein and Spady
Average derivative

Average derivative estimation (Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)

• denoting m(x) = E(Y_i | X_i = x) and f the density of X_i

  m′(x) = ∂g(x^⊤ β)/∂x = g′(x^⊤ β) β ⇒ E{m′(X_i)} = γβ

  E{m′(X_i)} = ∫ m′(x) f(x) dx = −∫ m(x) f′(x) dx
             = −∫ m(x) [f′(x)/f(x)] f(x) dx = −E{Y_i f′(X_i)/f(X_i)}

• estimate f and f′ by a kernel density estimator

  (γβ)^ = −n^{-1} Σ_{i=1}^n y_i f̂′_n(x_i)/f̂_n(x_i)
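A sketch of this estimator for a scalar regressor, plugging a Gaussian kernel density estimate and its derivative into −n^{-1} Σ y_i f̂′(x_i)/f̂(x_i); the single-index model with g = tanh, the bandwidth rule, and the trimming threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
y = np.tanh(x) + 0.1 * rng.normal(size=n)     # E(Y|X=x) = g(x * beta), beta = 1

h = 1.06 * x.std() * n ** (-1 / 5)            # rule-of-thumb bandwidth

u = (x[:, None] - x[None, :]) / h             # (t - x_j)/h at each sample point t = x_i
k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
f_hat = k.mean(axis=1) / h                    # fhat(x_i)
fp_hat = (-u * k).mean(axis=1) / h**2         # fhat'(x_i)

keep = f_hat > 0.05                           # trim low-density points for stability
ade = -np.mean(y[keep] * fp_hat[keep] / f_hat[keep])
```

The trimming step guards against dividing by near-zero density estimates in the tails, a standard practical device for this class of estimators.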
Outlook

What are the benefits of semiparametric procedures?

• estimation under less restrictive assumptions
• the ability to compute probabilities without distributional assumptions: estimate

  F(X_i^⊤ β) = P(Y_i = 1 | X_i^⊤ β) = E(Y_i | X_i^⊤ β)

• the ability to compute marginal effects: estimate

  F′(X_i^⊤ β) β = ∂P(Y_i = 1 | X_i^⊤ β)/∂X_i = ∂E(Y_i | X_i^⊤ β)/∂X_i
Outlook

probit chd age                 |  z <- glm(chd ~ age,
predict ind, xb                |    family=binomial(link="probit"))
predict prob, pr               |  r <- locpoly(z$fitted.values, chd)
lpoly chd ind, ci              |  lines(r)
  addplot((line prob ind, sort))

[Figure: local smoother of evidence of coronary heart disease against the linear prediction, with 95% CI and the fitted Pr(chd); kernel = epanechnikov, degree = 0, bandwidth = .5, pwidth = .76.]
Application

Gerfin (1996) Parametric and semiparametric estimation of the binary-response models of labor market participation. Journal of Applied Econometrics 11, 321–339.

• labor force participation of Swiss and German women
• parametric and semiparametric estimators compared
Nonparametric density estimation

• Introduction
• Motivation
• Histogram
• Local histogram
• Kernel estimator
• Related methods
• Kernel and bandwidth
• Bias and variance
• Bandwidth choice
• Plug-in methods
• Asymptotics
• Confidence intervals
• Confidence bands
• Testing
• Multivariate density
Introduction

• Parametric regression:
  ◦ regression estimation
  ◦ curse of dimensionality

Econometrics Slide 61
Motivation

A probability density function can

• capture and demonstrate stylized facts
  (e.g., development of income distribution)
• describe an unknown distribution
  (e.g., of an estimation procedure in finite samples)
• help in parametric inference
  (e.g., asymptotic variance of LAD depends on f(0))
• serve in conditional moment estimation:
  E(Y|X = x) = ∫ y f(x, y)/f(x) dy in regression
• serve in conditional distribution function estimation:
  P(Y ≤ t|X = x) = E[I(Y ≤ t)|X = x]
• provide the derivative of the density function
  by differentiating a density estimator

Econometrics Slide 62
Motivation

Parametric approach

• assume a form parametrized by a number of parameters

  f(x|µ, σ) = (1/(√(2π) σ)) exp{−(1/2) [(x − µ)/σ]²}

• estimate µ and σ
• set f̂(x) = f(x|µ̂, σ̂)

Nonparametric approach

• do not assume a specific form or parameters
• impose smoothness of the density function
• estimate a general density function
• example: histogram

Econometrics Slide 63
Net income example

Net income in the U.K. from 1969 to 1983:
nonparametric and parametric density estimates

[Figure: kernel density and log-normal density estimates of U.K. net income, year by year (1969–1983)]

Econometrics Slide 64
Histogram

Estimate density f (observations x1, . . . , xn ∼ F)

[Figure: histogram of the observations]

Econometrics Slide 65
Histogram – properties

Mathematical explanation

• probability of “falling” into interval Ij = [x0 + (j − 1)h, x0 + jh]

  P(X ∈ Ij) = ∫_Ij f(x) dx ≈ f{x0 + (j − 1/2)h} h

• estimate of the density

  f̂{x0 + (j − 1/2)h} ≈ (1/h) P(X ∈ Ij) ≈ (1/(hn)) Σ_{i=1}^n I(xi ∈ Ij)

Properties

• step function
• bias ∼ h: f̂{x0 + (j − 1/2)h} used for all x ∈ Ij
• variance ∼ 1/(nh): all data in Ij used
• dependence on origin x0 and on bin width h

Econometrics Slide 66
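The bin-counting estimate above can be sketched in Python (an illustrative translation, not part of the course materials; `hist_density` is a hypothetical helper):

```python
import numpy as np

def hist_density(x, data, x0, h):
    """Histogram estimate at x: the share of observations in the bin
    [x0 + (j-1)h, x0 + jh) containing x, divided by the bin width h."""
    data = np.asarray(data, dtype=float)
    j = np.floor((x - x0) / h)                 # bin index of x
    lo, hi = x0 + j * h, x0 + (j + 1) * h      # bin edges
    return np.sum((data >= lo) & (data < hi)) / (len(data) * h)

# three of four points fall in [0, 1), so the estimate there is 3/(4*1)
print(hist_density(0.5, [0.1, 0.2, 0.3, 1.5], x0=0.0, h=1.0))  # 0.75
```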
Histogram: simulated example

100 histograms for 500 observations simulated from N(0, 1)

[Figure: four panels of 100 overlaid histogram estimates; bin widths shown include h = 1.0 and h = 2.0]

Econometrics Slide 67
Local histogram

• use an interval around any given x: (x − h/2, x + h/2)
• estimate

  f̂h(x) = (1/(nh)) Σ_{i=1}^n I(x − h/2 ≤ xi ≤ x + h/2)
         = (1/(nh)) Σ_{i=1}^n I(−1/2 ≤ (xi − x)/h ≤ 1/2)

Properties (strictly speaking, we should write hn and f̂_{hn,n}):

  f(x) = F′(x) = lim_{h→0} [F(x + h/2) − F(x − h/2)]/h
       = lim_{h→0} P(x − h/2 ≤ Xi ≤ x + h/2)/h

Econometrics Slide 68
Kernel estimator

(Local) histogram is not smooth ⇒ replace the indicator by a smooth kernel function K:

  f̂h(x) = (1/(nh)) Σ_{i=1}^n K((xi − x)/h)

Econometrics Slide 69
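The resulting kernel density estimator can be sketched in Python (an illustrative translation using the Epanechnikov kernel, not the slides' Stata/R code):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) for |t| <= 1, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def kde(x, data, h):
    """Kernel density estimate f_h(x) = (nh)^{-1} sum_i K((x_i - x)/h)."""
    data = np.asarray(data, dtype=float)
    return epanechnikov((data - x) / h).sum() / (len(data) * h)
```

With a single observation at x and h = 1, the estimate at x is simply K(0) = 0.75.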
Related methods

• Derivative estimation:

  f̂h^(s)(x) = (−1)^s/(nh^{s+1}) Σ_{i=1}^n K^(s){(xi − x)/h}

• Variable bandwidth: each point xi has its own bandwidth h_{in}

• Kth nearest neighbor:

  f̂k(x) = (1/(n dk(x))) Σ_{i=1}^n K((xi − x)/dk(x)),

  where dk(x) = distance of x and its kth nearest neighbor

• Series estimation: express a continuous density as

  f̂J(x) = Σ_{j=1}^J aj gj(x)

  for some orthogonal functions gj(x)
  (e.g., gj(x) = Hj(x) or φ(x)x^j)

Econometrics Slide 70
Kernel functions

Examples of various kernel functions

[Table: common kernel functions, e.g., uniform, triangle, Epanechnikov, and quartic]

Econometrics Slide 71
Kernel functions – graphs

Plots of several kernel functions with support [−1, 1]

[Figure: uniform, Epanechnikov, triangle, and quartic kernels]

Econometrics Slide 72
Kernel choice

Estimated density of stock returns with different kernels
(Pagan and Schwert, 1990; monthly US data 1834–1925)

[Figure: uniform, Epanechnikov, triangle, and quartic kernels, each with h = 0.015]

Econometrics Slide 73
Bandwidth choice

Estimated density of stock returns with different bandwidths
(Pagan and Schwert, 1990; monthly US data 1834–1925)

[Figure: four panels, Epanechnikov kernel with bandwidths h = 0.005, 0.01, and 0.025]

Econometrics Slide 74
Bandwidth choice – simulations

100 density estimates for 500 observations simulated from N(0, 1)

[Figure: four panels of 100 overlaid kernel density estimates; bandwidths shown include h = 0.80 and h = 3.20]

Econometrics Slide 75
Density estimation – assumptions

Kernel estimator of density f

  f̂h(x) = (1/(nh)) Σ_{i=1}^n K((xi − x)/h) = (1/(nh)) Σ_{i=1}^n wni(x)

• observations x1, . . . , xn
• kernel K is symmetric around zero and
  ◦ ∫ K(t) dt = 1
  ◦ ∫ t²K(t) dt = µ2 ≠ 0
  ◦ ∫ K²(t) dt = ‖K‖² < ∞
• h = hn → 0 as n → ∞
• nhn → ∞ as n → ∞

Econometrics Slide 76
Exact bias and variance

• Bias (substitution t = (u − x)/h)

  E[f̂h(x) − f(x)] = E[(1/h) K((xi − x)/h)] − f(x)
                   = ∫ (1/h) K((u − x)/h) f(u) du − f(x)
                   = ∫ K(t){f(x + th) − f(x)} dt

• Variance (var Z = E(Z²) − [E(Z)]²)

  var[f̂h(x)] = (1/n) var[(1/h) K((xi − x)/h)]
             = (1/(nh)) ∫ K²(t) f(x + th) dt − (1/n) [∫ K(t) f(x + th) dt]²

Econometrics Slide 77
Asymptotic bias and variance

Using the Taylor expansion

  f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·

• bias up to O(h²)

  E[f̂h(x) − f(x)] = ∫ K(t){f(x + th) − f(x)} dt
                   = ∫ K(t){th f′(x) + (1/2)(th)² f″(x)} dt
                   = h f′(x) ∫ K(t) t dt + (h² f″(x)/2) ∫ K(t) t² dt

  bias[f̂h(x)] = (h²/2) f″(x) ∫ t²K(t) dt = (h²/2) f″(x) µ2(K)

Econometrics Slide 78
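The bias formula above can be checked numerically for a standard normal density and a Gaussian kernel, where µ2(K) = 1 and f″(x) = (x² − 1)φ(x) — a sketch under these assumptions, not part of the slides:

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def exact_bias(h, x=0.0):
    """E f_h(x) - f(x) for a Gaussian kernel and N(0,1) data, via a
    Riemann sum of (1/h) K((u - x)/h) f(u) over a fine grid."""
    grid = np.linspace(-10, 10, 20001)
    du = grid[1] - grid[0]
    ef = np.sum(phi((grid - x) / h) / h * phi(grid)) * du
    return ef - phi(x)

def asymptotic_bias(h, x=0.0):
    """(h^2/2) f''(x) mu_2(K), with mu_2 = 1 for the Gaussian kernel."""
    return 0.5 * h**2 * (x**2 - 1) * phi(x)
```

For small h the two agree closely; at the mode (x = 0) the leading bias is negative, i.e., the peak is smoothed downward.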
Example – bias vs. density

Density = mixture of N(0, 1) (weight 0.3) and t1 − 3 (weight 0.7)

[Figure: density (dashed) and bias effect (solid)]

Econometrics Slide 79
Asymptotic bias and variance

Using the Taylor expansion

  f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·

• bias is up to O(h²)

  bias[f̂h(x)] = (h²/2) f″(x) ∫ t²K(t) dt = (h²/2) µ2(K) f″(x)

• variance is up to O(1/(nh))

  var[f̂h(x)] = (1/(nh)) f(x) ∫ K²(t) dt = (1/(nh)) ‖K‖² f(x)

Econometrics Slide 80

Example – bias and variance

Density = mixture of N(0, 1) (weight 0.3) and N(−3, 1) (weight 0.7)

[Figure: squared bias (solid), variance (dashed), and MSE (thick) as functions of the bandwidth]

Econometrics Slide 81
Bandwidth choice

Bias–variance trade-off (see simulation)

• MSE pointwise only

Econometrics Slide 82

Bandwidth choice

Optimal bandwidth h = minimal error

  hopt = arg min_h AMISE(f̂h)

• optimal bandwidth

  hopt = { ‖K‖² / (‖f″‖² µ2²(K) n) }^{1/5} ∼ n^{−1/5}

• optimal error AMISE ∼ n^{−4/5} (histogram: n^{−2/3})
• ‖f″‖² unknown

Kernel choice by minimizing AMISE

Econometrics Slide 83
Plug-in methods

Plug-in methods = assume normality

• Silverman’s rule of thumb: f = φ{(x − µ)/σ}/σ
  ⇒ hROT = 1.06 σ̂ n^{−1/5} for the Gaussian kernel

• Park and Marron plug-in estimator:

  ◦ estimate f″(x) by kernel density estimation

    f̂″_{hROT}(x) = (1/(n hROT³)) Σ_{i=1}^n K″((xi − x)/hROT),

  ◦ use the bias correction

    est. ‖f″‖² = ‖f̂″‖² − (1/(n hROT⁵)) ‖K″‖²

Econometrics Slide 84
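Silverman's rule of thumb above is a one-liner; a minimal Python sketch (illustrative only, `silverman_bandwidth` is a hypothetical helper):

```python
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth h_ROT = 1.06 * sigma_hat * n^(-1/5)
    for the Gaussian kernel, using the normal reference density."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
```

Holding the spread fixed, the bandwidth shrinks at the n^(−1/5) rate as the sample grows.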
Example: Car weights

Beware: Stata example for the car weight data

  kdensity weight, kernel(epanechnikov) generate(x epan)
  kdensity weight, kernel(parzen) generate(x2 parzen)
  line epan parzen x, sort ytitle(Density) legend(cols(1))

[Figure: Epanechnikov and Parzen density estimates of car weight (lbs.)]

Econometrics Slide 85
Example: Car weights

Beware: Stata example for the car weight data

  kdens weight, kernel(epanechnikov) generate(epan x) bw(sjpi)
  kdens weight, kernel(parzen) generate(parzen x) bw(sjpi)
  line epan parzen x, sort ytitle(Density) legend(cols(1))

[Figure: Epanechnikov and Parzen density estimates of car weight (lbs.) with the sjpi plug-in bandwidth]

Econometrics Slide 86
Example: Car weights

  kdensity weight, nograph generate(x fx)
  kdensity weight if foreign==0, nograph generate(fx0) at(x)
  kdensity weight if foreign==1, nograph generate(fx1) at(x)
  line fx0 fx1 x, sort ytitle(Density)

[Figure: density estimates of car weight (lbs.) for domestic and foreign cars]

Econometrics Slide 87
Asymptotics – assumptions

Provided that kernel K and density f satisfy additionally

Econometrics Slide 88
Asymptotics – consistency and normality

Kernel density estimator is (hn → 0 and nhn → ∞)

• pointwise consistent: MSE[f̂h] → 0 and f̂h →_P f

• uniformly consistent under some regularity conditions and
  nhn² → ∞: sup_{x∈R} |f̂h(x) − f(x)| →_P 0

• asymptotically normal (pointwise):

  √(nh) {f̂h(x) − E f̂h(x)} → N(0, f(x) ‖K‖²)

  (the same applies to √(nh)(f̂h − f) if √(nh) h² → 0,
  which does not hold for hopt, but for undersmoothing)

  √(nh) {f̂h(x) − f(x)} → N((c²/2) f″(x) µ2(K), f(x) ‖K‖²)

• optimal rate of convergence: hopt ∼ n^{−1/5} ⇒ √(nh) ∼ n^{2/5}

Econometrics Slide 89
Confidence intervals

Asymptotic normality ⇒ pointwise confidence intervals

• asymptotic confidence interval

  f̂h(x) ± (Φ^{−1}(1 − α/2)/√(nh)) {f̂h(x) ∫ K²(x) dx}^{1/2}

  under undersmoothing (h ∼ n^{−1/5−δ}, δ > 0)

• finite samples
  ◦ undersmooth (see above)
  ◦ estimate bias (difficult in small samples)

  f̂h(x) − (h²/2) f̂″h(x) µ2(K) ± (Φ^{−1}(1 − α/2)/√(nh)) {f̂h(x) ∫ K²(x) dx}^{1/2}

Econometrics Slide 90
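The pointwise interval above can be computed directly for the Gaussian kernel, where ∫K²(x)dx = 1/(2√π) — an illustrative sketch (ignoring the bias term, i.e., assuming undersmoothing), not the slides' code:

```python
import numpy as np

def kde_gauss(x, data, h):
    """Gaussian-kernel density estimate f_h(x)."""
    data = np.asarray(data, dtype=float)
    t = (data - x) / h
    return np.exp(-t**2 / 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def kde_ci(x, data, h, z=1.96):
    """f_h(x) +- z (nh)^(-1/2) {f_h(x) ||K||^2}^(1/2),
    with ||K||^2 = 1/(2 sqrt(pi)) for the Gaussian kernel."""
    f = kde_gauss(x, data, h)
    half = z * np.sqrt(f / (2 * np.sqrt(np.pi)) / (len(data) * h))
    return f - half, f + half
```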
Confidence bands

• confidence intervals – pointwise
• confidence bands – an interval or R wide

  ◦ available under restrictive assumptions
    (undersmoothing, f on interval (0, 1))

  f̂(x) ± [f̂(x) ‖K‖²/(nh)]^{1/2} { z/{2(1/5 + δ) log(n)}^{1/2} + dn }

  with coverage probability 1 − α = exp[−2 exp(−z)] and
  dn = {2(1/5 + δ) log(n)}^{1/2} [1 + log{‖K′‖²/(2π‖K‖²)}]

• confidence bands typically wider than confidence intervals

Econometrics Slide 91
Example: CPS 1985

Income distribution in the USA (CPS 1985)
Test statistic: T = 57.209 > 2.32 = Φ^{−1}(0.99)

[Figure: density estimate of income with confidence bands]

Econometrics Slide 92
Example: CPS 1985

Income distribution in the USA (CPS 1985)

  kdens wagelog, ci normal bw(oversmooth)

[Figure: density estimate of log wages with pointwise confidence intervals and a normal reference density]

Econometrics Slide 93
Testing

Testing H0: f = g vs. H1: f ≠ g for a known g(x, θ)

Econometrics Slide 94
Example: CPS 1985

Income distribution in the USA (CPS 1985)
Test statistic: T = 57.209 > 2.32 = Φ^{−1}(0.99)

[Figure: density estimate of income with confidence bands]

Econometrics Slide 95
Multivariate density estimation

• most general (H ∈ R^{d×d})

  f̂H(x) = (1/(n det(H))) Σ_{i=1}^n K{H^{−1}(x − xi)}

Econometrics Slide 96
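The bandwidth-matrix estimator above can be sketched with a standard multivariate Gaussian kernel (illustrative only; `mv_kde` is a hypothetical helper):

```python
import numpy as np

def mv_kde(x, data, H):
    """f_H(x) = (n det H)^{-1} sum_i K{H^{-1}(x - x_i)} with a
    standard multivariate Gaussian kernel K."""
    x = np.asarray(x, dtype=float)
    data = np.atleast_2d(np.asarray(data, dtype=float))
    d = x.shape[0]
    u = (x - data) @ np.linalg.inv(H).T          # rows: H^{-1}(x - x_i)
    K = np.exp(-0.5 * (u**2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
    return K.sum() / (len(data) * abs(np.linalg.det(H)))
```

With H = hI_d this reduces to the product-kernel estimator with a common bandwidth h in every coordinate.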
Multivariate density estimation – properties

For a symmetric kernel with second moments and norm
(∫ K(x) dx = 1, ∫ xK(x) dx = 0,
µ2(K) = ∫ xx⊤K(x) dx, ‖K‖² = ∫ K²(x) dx)

• bias (H(f) = Hessian matrix of f) [result for H = hId]

  bias[f̂H(x)] ≈ (1/2) µ2(K) tr(H⊤ H(f) H) = (h²/2) µ2(K) tr(H(f))

• variance [result for H = hId]

  var[f̂H(x)] ≈ (1/(n det(H))) ‖K‖² f(x) = (1/(n h^d)) ‖K‖² f(x)

Econometrics Slide 97
Joint density of income and age in east Germany, 1991

[Figure: age–income density estimate; age 25–59, income 880–3426]

Econometrics Slide 98
Nonparametric regression estimation

Econometrics Slide 99
Conditional moments

Estimation of conditional moments

• regression
  ◦ dependent variable Y (e.g., earnings)
  ◦ explanatory variables X (e.g., age, education)

  yi = m(xi) + εi
  E(Y|X) = m(X)
  ln Earnings = m(Age, Education) + ε

• conditional variance

  E[{Yi − E(Yi|Xi)}²|Xi] = E(Yi²|Xi) − [E(Yi|Xi)]²

• conditional probability

  P(Yi = 1|Xi) = 1·P(Yi = 1|Xi) + 0·P(Yi = 0|Xi) = E(Yi|Xi)

Econometrics Slide 100
Univariate regression

Estimation idea with explanatory variable

• discrete (bandwidth 1 ≫ h)

  Ên(Yi|Xi = x) = Σ_{i=1}^n I(xi = x) yi / Σ_{j=1}^n I(xj = x)
                = Σ_{i=1}^n I(x − h < xi < x + h) yi / Σ_{j=1}^n I(x − h < xj < x + h)
                = Σ_{i=1}^n I(|x − xi|/h < 1) yi / Σ_{j=1}^n I(|x − xj|/h < 1)

• continuous (kernel K, bandwidth h)

  Ên(yi|x) = Σ_{i=1}^n K{(x − xi)/h} yi / Σ_{j=1}^n K{(x − xj)/h}

Econometrics Slide 101
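The continuous-case estimator above is the Nadaraya-Watson estimator; a minimal Python sketch with a Gaussian kernel (illustrative, not the slides' code):

```python
import numpy as np

def nadaraya_watson(x, xs, ys, h):
    """m_hat(x) = sum_i K{(x - x_i)/h} y_i / sum_j K{(x - x_j)/h}."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)   # Gaussian weights (constants cancel)
    return (w * ys).sum() / w.sum()
```

At a point equidistant from two observations the weights are equal, so the estimate is their simple average.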
Nonparametric regression

Regression model y = E(y|x) + ε = m(x) + ε

  m(x) = E(y|x) = ∫ y f(y|x) dy = ∫ y f(y, x)/fx(x) dy

• joint density f(y, x), x ∈ R^p

  f̂(y, x) = (1/(n h′ h^p)) Σ_{i=1}^n Ky((yi − y)/h′) Kx((xi − x)/h)

• marginal density fx(x)

  f̂x(x) = (1/(n h^p)) Σ_{i=1}^n Kx((xi − x)/h)

  m̂(x) = ∫ y f̂(y, x)/f̂x(x) dy
        = [Σ_{i=1}^n ∫ y (1/h′) Ky((yi − y)/h′) dy · Kx((xi − x)/h)] / Σ_{i=1}^n Kx((xi − x)/h)

Econometrics Slide 102
Nonparametric regression

Smooth regression function m(x)

• general nonparametric estimator m̂h(x) = Σ_{i=1}^n wni(x) yi
  with weights wni(x) = wn(xi, x)

Econometrics Slide 103
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973

[Figure: food expenditures against net income with a nonparametric regression fit]

Econometrics Slide 104
Example NW – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, addplot((line prob age, sort))

[Figure: local polynomial smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 105
Various estimators

General nonparametric estimator m̂h(x) = Σ_{i=1}^n wni(x) yi

• Nadaraya-Watson:
  wni(x) = K{(xi − x)/h} / Σ_{j=1}^n K{(xj − x)/h}

• Variable bandwidth estimation:
  wni(x) = K{(xi − x)/hi} / Σ_{j=1}^n K{(xj − x)/hj}

• The kth nearest neighbor estimator:
  wni(x) = Iki/k = I(xi = kth nearest to x)/k or
  wni(x) = Iki wk

• Known density fx(x):
  wni(x) = K{(xi − x)/h}/[(nh^p) fx(x)]

• Fixed design (Gasser-Müller estimator):
  wni(x) = ∫_{si−1}^{si} K{(t − x)/h}/h dt ≈ (si − si−1) K{(x − ξi)/h}/h

Econometrics Slide 106
Local linear regression

• Nadaraya-Watson minimizes (b̂0(x) = m̂(x), verify)

  Σ_{i=1}^n {yi − b0(x)}² K{(xi − x)/h}

• Local linear regression – minimize

  Σ_{i=1}^n {yi − b0(x) − b1(x)(xi − x)}² K{(xi − x)/h}

  ◦ at given x, b0(x), b1(x) regression constants
  ◦ weighted least squares regression (around x):

    b̂0,h(x) = ȳh − b̂1,h(x)(x̄h − x)
    b̂1,h(x) = Σ_{i=1}^n (yi − ȳh)(xi − x̄h) K{(xi − x)/h} / Σ_{j=1}^n (xj − x̄h)² K{(xj − x)/h}

    for v̄h = Σ_{i=1}^n vi K{(xi − x)/h} / Σ_{j=1}^n K{(xj − x)/h}

Econometrics Slide 107
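The weighted least squares problem above can be solved directly via its normal equations; a Python sketch with a Gaussian kernel (illustrative; `local_linear` is a hypothetical helper):

```python
import numpy as np

def local_linear(x, xs, ys, h):
    """Minimize sum_i {y_i - b0 - b1 (x_i - x)}^2 K((x_i - x)/h)
    and return b0_hat = m_hat(x).  Gaussian kernel."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    w = np.exp(-0.5 * ((xs - x) / h) ** 2)            # kernel weights
    X = np.column_stack([np.ones_like(xs), xs - x])   # regressors [1, x_i - x]
    WX = X * w[:, None]
    b = np.linalg.solve(X.T @ WX, WX.T @ ys)          # (X'WX)^{-1} X'Wy
    return b[0]
```

Since the local model nests any linear function, the estimator reproduces linear m exactly, which illustrates the "no bias for linear functions" property noted later.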
Local polynomial regression

• Motivation – Taylor expansion

  m(xi) ≈ m(x) + (∂m/∂x)(x)(xi − x) + . . . + (1/p!)(∂^p m/∂x^p)(x)(xi − x)^p

• Local polynomial regression – minimize

  Σ_{i=1}^n {yi − b0(x) − b1(x)(xi − x) − . . . − bp(x)(xi − x)^p}² K((xi − x)/h)

• Weighted average of yi (m̂(x) = Σ_{i=1}^n wni(x) yi)
  with K = diag(K{(xi − x)/h})

• Estimates of derivatives m̂h^(j)(x) = b̂j(x) · j!

Econometrics Slide 108
Simulated example

Nadaraya-Watson and local linear regression
(m(x) = x + 5 sin(2x), n = 400, h = 0.8)

[Figure: local constant (red dashed) and local linear (blue solid) regression fits]

Econometrics Slide 109
Example NW – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, addplot((line prob age, sort))

[Figure: Nadaraya-Watson smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 110
Example LLR – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, degree(1) addplot((line prob age, sort))

[Figure: local linear smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 111
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973

[Figure: food expenditures against net income (0–3) with a nonparametric regression fit]

Econometrics Slide 112
Nonparametric regression – assumptions

Finite-sample properties

• method comparison
• bandwidth choice

Assumptions for yi = m(xi) + εi

• (xi, yi) i.i.d. sample from (x, y) ∼ f
• εi i.i.d. with zero mean and independent of xi
• m and f twice continuously differentiable
• kernel K symmetric with
  ◦ ∫ K(x) dx = 1, ∫ xK(x) dx = 0
  ◦ ∫ x²K(x) dx = µ2(K) < ∞
  ◦ ∫ K²(x) dx = ‖K‖² < ∞

Econometrics Slide 113
Nadaraya-Watson estimator

• Bias (fx(x) > 0)

  bias[m̂h(x)] = (h²/2) {m″(x) + 2 m′(x) fx′(x)/fx(x)} µ2(K) + O(1/(nh)) + o(h²)

• Variance (σ²(x) = var(εi|xi))

  var[m̂h(x)] = (1/(nh)) (σ²(x)/fx(x)) ‖K‖² + o(1/(nh))

• Comparison with density estimators
  ◦ bias proportional to curvature (m″(x))
  ◦ extra bias term (m′(x) fx′(x)/fx(x))

Econometrics Slide 114
Local linear regression

• Bias

      bias[m̂h(x)] = (h²/2) µ2(K) m′′(x) + o(h²)

• Variance (σ²(x) = var(εi|xi))

      var[m̂h(x)] = (1/(nh)) (σ²(x)/fx(x)) ‖K‖² + o(1/(nh))

• Comparison with Nadaraya-Watson estimator
  ◦ bias independent of fx (design)
  ◦ no bias for linear functions m
  ◦ very similar to bias and variance of density estimator

• General combination with a parametric estimator m(x, β): minimize

      Σ_{i=1}^{n} {yi − m(x, β)}² K{(xi − x)/h}

  ◦ reduced bias if m(x, β) is close to m(x)
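The local linear estimator fits a weighted least-squares line around each evaluation point; its intercept is the estimate of m(x0). A Python sketch (Gaussian kernel and simulated data are illustrative) — with a truly linear m there is no smoothing bias, even at the boundary:

```python
import numpy as np

def loclin(x0, x, y, h):
    """Local linear estimate of m(x0): weighted LS of y on (1, x - x0);
    the fitted intercept estimates m(x0)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                               # weight each observation
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 400)     # m linear: intercept 1, slope 2

m0 = loclin(0.0, x, y, h=0.2)                   # boundary point, m(0) = 1
m_mid = loclin(0.5, x, y, h=0.2)                # interior point, m(0.5) = 2
```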
Simulated example

Nadaraya-Watson and local linear regression
(m(x) = x + 5 sin(2x), n = 1000, h = 0.8)

[Figure: local constant (red dashed) and local linear (blue solid) regression fits, Y against X]
Bandwidth choice

Optimal bandwidth selection

• minimize mean integrated squared error

      MISE(m̂) = E ∫ [m̂(x) − m(x)]² dx ≈ c1/(nh) + c2 h⁴

  ⇒ hopt = (c1/4c2)^{1/5} n^{−1/5}

• plug-in estimator – not used, complicated
• alternative: minimize mean average squared error

      MASE(m̂) = E[ (1/n) Σ_{i=1}^{n} {m̂(xi) − m(xi)}² ]

  ◦ advantage: if yi − m(xi) and m̂(xi) are uncorrelated,
      (1/n) Σ_{i=1}^{n} {yi − m̂(xi)}² = (1/n) Σ_{i=1}^{n} {m(xi) + εi − m̂(xi)}² ≈ σ² + MASE(m̂)
Cross validation

Mean average squared error

• yi − m(xi) and m̂(xi) are correlated
• solution: omit the ith observation from the sample to estimate m(xi) by m̂h,−i(xi), which is uncorrelated with yi − m(xi)

Leave-one-out cross validation

      hCV = arg min_h Σ_{i=1}^{n} {yi − m̂h,−i(xi)}²

• m̂h,−i(xi) = leave-one-out estimate based on observations 1, …, i − 1, i + 1, …, n
• hCV → hopt very slowly (∼ n^{−1/10})
• often used
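With a linear smoother, the leave-one-out fits are obtained by simply zeroing the diagonal of the kernel weight matrix, so the CV criterion is cheap to evaluate on a bandwidth grid. An illustrative Python sketch using a Nadaraya-Watson smoother (kernel, grid, and data are arbitrary choices):

```python
import numpy as np

def cv_score(x, y, h):
    """Leave-one-out CV criterion: sum_i {y_i - m_hat_{h,-i}(x_i)}^2."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)            # omit observation i when predicting y_i
    m_loo = (K @ y) / K.sum(axis=1)     # leave-one-out NW estimates
    return np.sum((y - m_loo) ** 2)

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 300)
y = np.sin(2 * x) + rng.normal(0, 0.3, 300)

grid = np.linspace(0.05, 1.0, 20)
h_cv = grid[np.argmin([cv_score(x, y, h) for h in grid])]
```

For this wiggly regression function the selected bandwidth is small; oversmoothing (large h) inflates the criterion through bias.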
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973:
bandwidth selection by cross validation

[Figure: nonparametric regression fit of food share on net income (quartic kernel); interactive bandwidth/binwidth controls omitted]
Asymptotics – consistency

Provided that kernel K and density f additionally satisfy …
Asymptotics – asymptotic distribution

The Nadaraya-Watson estimator is (hn → 0 and nhn → ∞) …

• local linear estimator – analogous
Confidence intervals

Asymptotic normality ⇒ pointwise confidence intervals

• asymptotic confidence interval

      m̂(x) ± Φ⁻¹(1 − α/2) (nh)^{−1/2} [ (σ̂²(x)/f̂(x)) ∫ K²(x)dx ]^{1/2}
Testing

Testing H0 : m = g vs. H1 : m ≠ g for a known g(x, θ) …

• many other possibilities
Example: CPS 1985

Income-experience profile (USA, CPS 1985)
(education fixed at 12 years)

[Figure: LS (solid) and NW (dashed) fits of log income against experience at educ = 12]
Example: coronary heart disease

probit chd age
predict prob, pr
lpoly chd age, ci degree(1) addplot((line prob age, sort))

[Figure: local linear smooth of chd on age (in years) with confidence band and the probit prediction overlaid]
Multivariate regression

Regression function E(y|x) = E(y|x1, …, xd):

      hopt ∼ n^{−1/(4+d)} ⇒ AMSE ∼ n^{−4/(4+d)}, (nh^d)^{−1/2} ∼ n^{−2/(4+d)}
Example: CPS 1985

Income as a function of education and experience
and Mincer's equation (USA, CPS 1985)
(education = 2 to 18 years, experience = 0 to 55 years)

[Figure: surface plots of Wage = m(Education, Experience) and of the parametric fit Wage = a + b*Educ + c*Exp + d*Exp^2]
Semiparametrics

• Semiparametric LS
• Klein and Spady
• Example
• Semiparametric regression
• Average derivative
• PLM estimation
• Heteroscedasticity
• Application
Introduction

Semiparametric least squares: Ichimura (1993)

Nonlinear least squares for E(Yi|Xi) = g(Xi⊤β): g is known

      min_{β∈B} Σ_{i=1}^{n} {Yi − g(Xi⊤β)}²

Semiparametric least squares: g is unknown

• estimate g(Xi⊤β) = E(Yi|Xi⊤β) = E[g(Xi⊤β)|Xi⊤β] using a leave-one-out estimator ĝn(Xi⊤β) = Ê−i,n(Yi|Xi⊤β) =

      Σ_{j≠i} Yj K{(Xj − Xi)⊤β/hn} / Σ_{j≠i} K{(Xj − Xi)⊤β/hn}
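Ichimura's estimator can be sketched by profiling the least-squares criterion over the index coefficients, with g replaced at each candidate β by the leave-one-out Nadaraya-Watson fit of Y on the index. A Python illustration (scale is fixed by normalizing the first coefficient to 1; the tanh link, bandwidth, and grid are arbitrary assumptions, and the grid search stands in for a proper optimizer):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(0, 1, (n, 2))
beta_true = np.array([1.0, 2.0])                 # first coefficient normalized to 1
Y = np.tanh(X @ beta_true) + rng.normal(0, 0.2, n)  # g = tanh, unknown to the estimator

def sls_objective(b, X, Y, h=0.3):
    """SLS criterion: SSR with g replaced by leave-one-out NW on the index X'beta."""
    v = X @ np.array([1.0, b])                   # candidate index
    K = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)                     # leave-one-out
    g_hat = (K @ Y) / K.sum(axis=1)
    return np.sum((Y - g_hat) ** 2)

grid = np.linspace(0.5, 3.5, 61)
b_hat = grid[np.argmin([sls_objective(b, X, Y) for b in grid])]
```

At a wrong index coefficient the conditional mean of Y given the index cannot absorb all systematic variation, so the criterion rises.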
• Σ = E[E(ε²i|Xi) · {g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• V = E[{g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• semiparametrically efficient
• under heteroscedasticity, weighting can be employed
Klein and Spady

Klein and Spady (1993): estimate F and maximize the likelihood based on the estimated distribution function F̂n

      Σ_{i=1}^{n} ξi [Yi ln F̂−i,n(Xi⊤β) + (1 − Yi) ln{1 − F̂−i,n(Xi⊤β)}]
Klein and Spady (1993)'s estimator under assumptions

• data are iid, β ∈ B compact
• P(β) = P(Yi = 1|Xi⊤β) = P(Yi = 1|Xi) ∈ (a, b), where 0 < a and b < 1
• P(Yi = 1|Xi⊤β = t) continuously differentiable in t
• n^{−1/6} < hn < n^{−1/8} and a higher-order kernel is used

is

• consistent and asymptotically normal

      √n(β̂n − β) →d N( 0, E[ (∂P(β)/∂β)(∂P(β)/∂β⊤) / (P(β)[1 − P(β)]) ]^{−1} )

• semiparametrically efficient
• (parametrically) efficient if E(Xi|Xi⊤β) = c0 + c1(Xi⊤β)
Example

Married women labor force participation (Mroz, 1987):

• binary decision = choice to work (1) or to stay at home (0)
• explanatory variables
  ◦ non-wife household income
  ◦ age
  ◦ education
  ◦ labor market experience and its square
  ◦ number of children (below and above 6)
Married women labor force participation (Mroz, 1987) – estimates:

Stata: regress, probit, sml

               Linear        Probit          KS
--------------------------------------------------
inlf     | Coef.    SE  | Coef.    SE  | Coef.   SE
---------+-------------+-------------+------------
nwifeinc | -.011   .004 | -.014   .006 | -.015  .001
educ     |  .141   .027 |  .151   .029 |  .129  .011
exper    |  .382   .019 |  .142   .022 |  .243   ---
expersq  | -.008   .000 | -.002   .001 | -.005  .000
age      | -.061   .008 | -.060   .010 | -.058  .002
kidslt6  | -1      .126 | -1      .137 | -1     .025
kidsge6  |  .050   .049 |  .041   .050 |  .052  .009
_cons    | 2.237   .588 |  .311   .585 |  ---    ---
---------+-------------+-------------+------------
Example: labor force participation

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci addplot((line prob pindex, sort))

[Figure: local polynomial smooth of the labor force indicator (Y = 1 if in labor force, 1975) against the linear prediction, with the probit probability overlaid]
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci degree(1) addplot((line prob pindex, sort))

[Figure: local linear (degree 1) smooth of the labor force indicator against the linear prediction, with the probit probability overlaid]
Average derivative

(Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)

      m′(x) = ∂g(x⊤β)/∂x = g′(x⊤β)β ⇒ E m′(Xi) = γβ

• integration by parts (f denotes the density of Xi)

      E m′(Xi) = ∫ m′(x)f(x)dx = − ∫ m(x)f′(x)dx = −E[Yi f′(Xi)/f(Xi)]

• estimate f and f′ by a kernel density estimator and set, for an ↓ 0, the estimator of γβ to

      −(1/n) Σ_{i=1}^{n} Yi f̂′n(Xi)/f̂n(Xi) · I[f̂n(Xi) > an]
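For a scalar regressor the construction can be illustrated directly: estimate f and f′ by a kernel density estimator and average −Yi f̂′(Xi)/f̂(Xi) over the non-trimmed points. A Python sketch (bandwidth and trimming bound are ad hoc; with m(x) = 2x and standard normal X the target E m′(Xi) is 2, and the kernel smoothing shrinks the estimate slightly toward zero):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(0, 1, n)
Y = 2.0 * X + rng.normal(0, 0.5, n)           # m(x) = 2x, so E m'(X) = 2

h, a_n = 0.3, 0.01                            # bandwidth and trimming bound
d = (X[:, None] - X[None, :]) / h
K = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
f_hat = K.mean(axis=1) / h                    # density estimate at X_i
f_der = (-d * K).mean(axis=1) / h**2          # its derivative at X_i

trim = f_hat > a_n                            # I[f_hat(X_i) > a_n]
delta_hat = -np.sum(trim * Y * f_der / f_hat) / n
```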
Semiparametric regression

Partially linear models (Robinson, 1988, Econometrica 56, 931–954)

• linear regression with one variable Zi entering nonparametrically

      Yi = Xi⊤β + g(Zi) + εi,   E(εi|Xi, Zi) = 0,   g : R → R

Extensions (cf. Tobit and sample selection models)

• partially linear single-index model (Xia, Tong, and Li, 1999)

      Yi = Xi⊤β + g(Zi⊤γ) + εi

• generalized partially linear models (Carroll et al., 1997)

      E(Yi|Xi) = F(Xi⊤β + g(Zi))
      ln[ P(Yi = 1|Xi) / P(Yi = 0|Xi) ] = Xi⊤β + g(Zi)
Other applications – partially linear models

• E(Yi|Zi) = E(Xi|Zi)⊤β + g(Zi) + E(εi|Zi) and

      Yi − E(Yi|Zi) = {Xi − E(Xi|Zi)}⊤β + {εi − E(εi|Zi)}

Estimation (involves p + 1 nonparametric regressions)

• estimate µyi = µy(Zi) = E(Yi|Zi) and µxi = µx(Zi) = E(Xi|Zi)
• estimate β by the least squares estimator β̂n

      β̂n = [ Σ_{i=1}^{n} (Xi − µ̂xi)(Xi − µ̂xi)⊤ ]^{−1} [ Σ_{i=1}^{n} (Xi − µ̂xi)(Yi − µ̂yi) ]
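Robinson's double-residual idea — regress the nonparametric residual of Y on the nonparametric residual of X — in a scalar-X Python sketch (Nadaraya-Watson first stages; the design, β = 1.5, g = sin, and the bandwidth are illustrative):

```python
import numpy as np

def nw_fit(z, target, h):
    """Nadaraya-Watson estimates of E(target | Z = z_i) at the sample points."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return (K @ target) / K.sum(axis=1)

rng = np.random.default_rng(5)
n = 500
Z = rng.uniform(-2, 2, n)
X = Z ** 2 + rng.normal(0, 1, n)                   # X correlated with Z
Y = 1.5 * X + np.sin(Z) + rng.normal(0, 0.3, n)    # beta = 1.5, g = sin

mu_y = nw_fit(Z, Y, h=0.2)                         # estimate E(Y|Z)
mu_x = nw_fit(Z, X, h=0.2)                         # estimate E(X|Z)
ex, ey = X - mu_x, Y - mu_y                        # double residuals
beta_hat = np.sum(ex * ey) / np.sum(ex * ex)       # LS on the residuals
```

Because the same linear smoother is applied to Y and X, the smoothing biases largely cancel in the residual-on-residual regression.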
Heteroscedasticity

Generalized LS (GLS) solves Σ_{i=1}^{n} σ^{−2}(Xi) Xi (Yi − Xi⊤β) = 0:

      β̂nGLS = ( Σ_{i=1}^{n} XiXi⊤/σ²(Xi) )^{−1} ( Σ_{i=1}^{n} XiYi/σ²(Xi) )

Heteroscedasticity of

• known form: σ²(Xi) = exp(Xi⊤γ) and estimate γ̂n
• unknown form: nonparametric estimate σ̂²n(Xi) of σ²(Xi)
  ◦ Robinson (1987), Econometrica 55(4), 875–891: compute ei = yi − xi⊤β̂OLS and nonparametrically estimate σ²(x) = E(ε²i|Xi) using residuals e²i
  ◦ alternative – fully nonparametric estimation: compute ẽi = Yi − Ê(Yi|Xi) and nonparametrically estimate σ²(Xi) = E(ε²i|Xi) using residuals ẽ²i
Application

Lehrer and Kordas (2013) Matching using semiparametric propensity scores. Empirical Economics 44, 13–45.
Discrete choice models

• Introduction
• Ordered models
• Example
• Specification tests
• Semiparametrics
• Application
• Multinomial models
• Latent model
• Multinomial logit
• Latent model
• Conditional logit
• Multinomial probit
• Hierarchy
• Semiparametrics
Introduction

Data with a discrete response with more than two values

• multiple discrete responses yi = 0, 1, …, J
  ◦ ordered response (values not completely arbitrary)
    (e.g., credit rating, preference, health plan choice, …)

Ordered models

Discrete response yi = 0, …, J, where responses are ordered
(ratings, preferences for food, no/part-time/full-time job)
Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj |xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
      ⋮
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• log-likelihood contributions (for MLE, FOC, variance, …)

      l(wi, β) = I(yi = 0) · ln F(α1 − xi⊤β)
               + I(yi = 1) · ln[F(α2 − xi⊤β) − F(α1 − xi⊤β)]
               + …
               + I(yi = J) · ln[1 − F(αJ − xi⊤β)]
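These probabilities and log-likelihood contributions translate directly into code. A Python sketch with a probit link (the erf-based normal cdf and the simulated design are illustrative; padding the cutpoints with ±∞ implements the first and last categories). In a large simulated sample, the log-likelihood at the true slope should exceed its value at a distorted slope:

```python
import numpy as np
from math import erf

def Phi(t):
    """Standard normal cdf; math.erf also handles +/- infinity."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(t, dtype=float) / np.sqrt(2.0)))

def ordered_loglik(y, x, beta, alpha):
    """sum_i ln[ F(alpha_{y_i+1} - x_i beta) - F(alpha_{y_i} - x_i beta) ]."""
    cuts = np.concatenate([[-np.inf], alpha, [np.inf]])  # alpha_0 = -inf, alpha_{J+1} = inf
    xb = x * beta
    p = Phi(cuts[y + 1] - xb) - Phi(cuts[y] - xb)        # P(y_i = j | x_i)
    return np.sum(np.log(p))

rng = np.random.default_rng(6)
n = 2000
x = rng.normal(0, 1, n)
alpha_true = np.array([-0.5, 1.0])                       # two cutpoints, three categories
ystar = 0.8 * x + rng.normal(0, 1, n)                    # latent index
y = (ystar > alpha_true[0]).astype(int) + (ystar > alpha_true[1]).astype(int)

ll_true = ordered_loglik(y, x, 0.8, alpha_true)
ll_off = ordered_loglik(y, x, 0.2, alpha_true)           # distorted slope
```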
Example

Probit fit of yi = I(0.5 + xi + εi > −0.5) + I(0.5 + xi + εi > 1)
for εi ∼ N(0, 1) and n = 1000

[Figure: y and the fitted probabilities Pr(y==0), Pr(y==1), Pr(y==2) against the linear prediction (cutpoints excluded)]
Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj |xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
      ⋮
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• marginal effects (note the signs of the effects for the middle categories!)

      ∂P(yi = 0|xi)/∂xik = −βk f(α1 − xi⊤β)
      ∂P(yi = 1|xi)/∂xik = βk [f(α1 − xi⊤β) − f(α2 − xi⊤β)]
      ⋮
Example

Pension-plan decision of adults
(mostly bonds = 0, mixed = 1, mostly stocks = 2)

• profit-sharing plan
• age
• education
• gender
• race
• marital status
---------------------------------------------------
          |           Delta-method
          |    dy/dx   Std. Err.      z     P>|z|
----------+----------------------------------------
0 prftshr | -.1766166   .070157    -2.52    0.012
  age     |  .0154528   .0070159    2.20    0.028
----------+----------------------------------------
1 prftshr |  .0097728   .0133563    0.73    0.464
  age     | -.0008551   .0011694   -0.73    0.465
----------+----------------------------------------
2 prftshr |  .1668438   .0662334    2.52    0.012
  age     | -.0145977   .0066674   -2.19    0.029
---------------------------------------------------
Example: asset allocation

oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p1, outcome(#1)
gen y1 = (y == 0)
lpoly y1 ind, ci degree(1) addplot((line p1 ind, sort))

[Figure: local polynomial smooth of I(y = 0) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=0); kernel = epanechnikov, degree = 1, bandwidth = .37]
oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p2, outcome(#2)
gen y2 = (y == 1)
lpoly y2 ind, ci degree(1) addplot((line p2 ind, sort))

[Figure: local polynomial smooth of I(y = 1) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=1); kernel = epanechnikov, degree = 1, bandwidth = .19]
oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p3, outcome(#3)
gen y3 = (y == 2)
lpoly y3 ind, ci degree(1) addplot((line p3 ind, sort))

[Figure: local polynomial smooth of I(y = 2) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=2); kernel = epanechnikov, degree = 1, bandwidth = .19]
Data plot with cut-off points

[Figure: observed y (0, 1, 2) against the linear prediction (cutpoints excluded)]
Specification tests

Possible problems with the model specification

• parallel regression assumption
  ◦ ordered choice model with constant slopes (probit): P(yi ≤ j|xi) = F(αj+1 − xi⊤β)
  ◦ ordered choice model with varying slopes (probit): P(yi ≤ j|xi) = F(αj+1 − xi⊤βj)

If the single-index structure P(yi = j|xi) = P(yi = j|xi⊤β) applies,

      E(yi|xi) = Σ_{j=0}^{J} j P(yi = j|xi⊤β) = g(xi⊤β),
      P(yi = 1|xi⊤β = t) = F(t) = 1 − F(−t)
Application

Bresnahan and Reiss (1991) Entry and competition in concentrated markets. The Journal of Political Economy 99, 977–1009.

• study the number of firms in a market given its size and competition
• analyze 202 geographically isolated markets (dentists, plumbers, electricians etc. in county seat cities)
Multinomial models

Data with a discrete response with more than two values

• multiple discrete responses yi = 0, 1, …, J
  ◦ unordered (nominal) response
    (e.g., mode of transportation, choice of industry for investment, brand choice, …)
  ◦ ordered response (values not completely arbitrary)
    (e.g., credit rating, preference, health plan choice, …)
• explanatory variables xi
• motivated by latent (utility-maximization) models
Latent model

Deriving the multinomial logit model from a latent utility maximization (McFadden, 1974), where each choice j ≥ 1 has its own coefficient βj

      y*ij = xij⊤βj + εij,   j = 0, …, J

• utility maximization yi = arg max_{j=0,…,J} y*ij implies

      P(yi = j|xi0, …, xiJ) = P(y*ij > y*ik for all k ≠ j|xi0, …, xiJ)

• assuming the type I extreme value (Gumbel) distribution εij ∼ F(t) = exp(−exp(−t)), it follows

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices
  (exp(… + xi⊤β)/Σ exp(… + xi⊤β) = exp(…)/Σ exp(…))
Multinomial logit model

Multinomial choice model

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

Multinomial logit model:

      P(yi = j|xi) = exp(xi⊤βj) / (1 + Σ_{l=1}^{J} exp(xi⊤βl)),   j = 1, …, J
      P(yi = 0|xi) = 1 / (1 + Σ_{l=1}^{J} exp(xi⊤βl))

• contributions to the log-likelihood function ln Ln(β) = Σ_{i=1}^{n} li(β):

      li(β) = Σ_{j=0}^{J} I(yi = j) log P(yi = j|xi)

• note: reduction to the binary logit (P(j|j, k) = P(j)/P(j, k))

      P(yi = j|yi ∈ {j, k}, xi) = exp(xi⊤βj) / [exp(xi⊤βj) + exp(xi⊤βk)]
                                = {1 + exp[−xi⊤(βj − βk)]}^{−1} = Λ{xi⊤(βj − βk)}
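Computationally, these probabilities are a softmax with the base-category index normalized to zero, and the log-likelihood just picks out the log-probability of each observed category. A Python sketch (coefficients and design simulated for illustration; subtracting the row maximum before exponentiating is a standard numerical safeguard):

```python
import numpy as np

def mnl_probs(X, B):
    """P(y = j | x) for a multinomial logit; B has one column of coefficients
    per non-base category j = 1..J, category 0's index is normalized to 0."""
    eta = np.column_stack([np.zeros(len(X)), X @ B])   # index 0 for the base outcome
    e = np.exp(eta - eta.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def mnl_loglik(y, X, B):
    p = mnl_probs(X, B)
    return np.sum(np.log(p[np.arange(len(y)), y]))     # sum_i log P(y_i | x_i)

rng = np.random.default_rng(7)
n = 500
X = rng.normal(0, 1, (n, 2))
B_true = np.array([[1.0, -0.5],
                   [0.5,  1.0]])                       # columns: beta_1, beta_2
p = mnl_probs(X, B_true)
y = np.array([rng.choice(3, p=row) for row in p])      # draw choices from the model

ll_true = mnl_loglik(y, X, B_true)
ll_zero = mnl_loglik(y, X, np.zeros_like(B_true))      # all probabilities 1/3
```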
Multinomial logit (here j ∈ {1, …, J}):

      pj(xi) = P(yi = j|xi) = exp(xi⊤βj) / (1 + Σ_{l=1}^{J} exp(xi⊤βl))

• partial effects

      ∂P(yi = j|xi)/∂xik = P(yi = j|xi) [ βjk − Σ_{l=1}^{J} βlk exp(xi⊤βl) / (1 + Σ_{l=1}^{J} exp(xi⊤βl)) ]

• (simpler) interpretation of partial effects via

      P(yi = j|xi)/P(yi = 0|xi) = exp(xi⊤βj) ⇒ ∂[pj(xi)/p0(xi)]/∂xik = exp(xi⊤βj)βjk
Example

Employment and schooling decisions of young men
(school = 1, home = 2, work = 3)

• work experience
• race
Example: school and employment decision

Multinomial logistic regression
------------------------------------------------------
     status |    Coef.   Std. Err.      z     P>|z|
------------+-----------------------------------------
1           | (base outcome)
------------+-----------------------------------------
2   educ    | -.6736313   .0698999    -9.64   0.000
    exper   | -.1062149   .173282     -0.61   0.540
    expersq | -.0125152   .0252291    -0.50   0.620
    black   |  .8130166   .3027231     2.69   0.007
    _cons   |  10.27787   1.133336     9.07   0.000
------------+-----------------------------------------
3   educ    | -.3146573   .0651096    -4.83   0.000
    exper   |  .8487367   .1569856     5.41   0.000
    expersq | -.0773003   .0229217    -3.37   0.001
    black   |  .3113612   .2815339     1.11   0.269
    _cons   |  5.543798   1.086409     5.10   0.000
------------------------------------------------------
Latent model

Deriving the multinomial logit model from a latent utility maximization (McFadden, 1974), where each choice j ≥ 1 has its own coefficient βj

      y*ij = xij⊤βj + εij,   j = 0, …, J

• utility maximization yi = arg max_{j=0,…,J} y*ij implies

      P(yi = j|xi0, …, xiJ) = P(y*ij > y*ik for all k ≠ j|xi0, …, xiJ)

• assuming the type I extreme value (Gumbel) distribution εij ∼ F(t) = exp(−exp(−t)), it follows

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices
  (exp(… + xi⊤β)/Σ exp(… + xi⊤β) = exp(…)/Σ exp(…))
Conditional logit model

Multinomial choice model

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

Consider a multinomial response model using characteristics xij of individual choices (varying with choices j) and one common β.

Multinomial logit model (e.g., choice of occupation):

• individual characteristics used
• characteristics of alternative choices unimportant and omitted

Conditional logit (e.g., choice of transport):

      pj(xi)/pl(xi) = exp(xij⊤β)/exp(xil⊤β) = exp[(xij − xil)⊤β]

Multiple-choice problem: J choices with utilities

      y*ij = xij⊤β + εij,   j = 0, …, J
Example

Employment and schooling decisions of young men
(school = 1, home = 2, work = 3)

• work experience
• race
Example: school and employment decision

Multinomial probit regression
------------------------------------------------------
     status |    Coef.   Std. Err.      z     P>|z|
------------+-----------------------------------------
1           | (base outcome)
------------+-----------------------------------------
2   educ    | -.4410793   .0413589   -10.66   0.000
    exper   | -.1137917   .1114018    -1.02   0.307
    expersq | -.0043746   .0155293    -0.28   0.778
    black   |  .6047029   .1908654     3.17   0.002
    _cons   |  6.706148   .6554217    10.23   0.000
------------+-----------------------------------------
3   educ    | -.162221    .0385226    -4.21   0.000
    exper   |  .6721982   .1027514     6.54   0.000
    expersq | -.0592846   .0142553    -4.16   0.000
    black   |  .2244359   .1795638     1.25   0.211
    _cons   |  2.985019   .6285942     4.75   0.000
------------------------------------------------------
Hierarchy

Nested logit model (yi = 0, 1, …, J): …

Estimation of the nested logit model:

• normalization α1 = 1 required
• other restrictions often imposed (e.g., α1 = … = αS or ρ1 = … = ρS)
• 1 − ρs represents the correlation of unobservables within group s
• limited-information likelihood
  ◦ estimate λs = ρs^{−1}β by conditional logit for each group of responses Gs, s = 1, …, S
  ◦ maximize the multinomial-choice likelihood for the group Gs
• full-information likelihood
  ◦ maximize the joint likelihood based on P(yi = j|yi ∈ Gs, xi) · P(yi ∈ Gs|xi)
• conditional logit: α1 = … = αS = ρ1 = … = ρS = 1 (test)
Semiparametric alternatives

Straightforward generalizations of the methods for binary responses; for example,

• Maximum score estimator for multinomial choice (Fox, 2007)
  ◦ choice-specific characteristics xij
  ◦ consider choices yi = k and yi = l

      β̂nMSE = arg max_β (1/n) Σ_{i=1}^{n} [ I(yi = k)I(xik⊤β > xil⊤β) + I(yi = l)I(xik⊤β < xil⊤β) ]

  ◦ many choices

      β̂nMSE = arg max_β (1/n) Σ_{k=1}^{J} Σ_{l=1}^{J} Σ_{i=1}^{n} [ I(yi = k)I(xik⊤β > xil⊤β) + I(yi = l)I(xik⊤β < xil⊤β) ]
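For two alternatives the maximum score criterion simply counts how often the index x⊤β ranks the chosen alternative above the other. A grid-search Python sketch with the first coefficient normalized to 1 (data, normalization, and grid are illustrative; the estimator converges slowly, so only a rough location of the maximizer is expected):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
# choice-specific characteristics for two alternatives k and l
x_k = rng.normal(0, 1, (n, 2))
x_l = rng.normal(0, 1, (n, 2))
beta_true = np.array([1.0, 2.0])          # first coefficient normalized to 1
u_k = x_k @ beta_true + rng.logistic(0, 1, n)
u_l = x_l @ beta_true + rng.logistic(0, 1, n)
y = np.where(u_k > u_l, 0, 1)             # chosen alternative (k -> 0, l -> 1)

def score(b):
    """Fraction of observations whose choice matches the index ranking."""
    idx_k = x_k @ np.array([1.0, b])
    idx_l = x_l @ np.array([1.0, b])
    return np.mean((y == 0) * (idx_k > idx_l) + (y == 1) * (idx_k < idx_l))

grid = np.linspace(0.0, 4.0, 81)
b_hat = grid[np.argmax([score(b) for b in grid])]
```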
Models for censored and truncated data

• Introduction
• Truncation
• Tobit model
• MLE
• Tobit – interpretation
• Specification
• Alternatives
• Two-part Tobit
• Application
• Sample selection
• Two-step estimation
• Example
Introduction

Censored data

• censored responses = some values are not observable; just a lower or upper bound is known
  (example: duration, income due to bracketing, social contributions, taxation rules, toxicity measurements)
• corner solution responses = response distribution partially discrete due to censoring of latent response values
  (example: alcohol or charitable spending)
• estimation similar in both cases, although underlying reasons and models differ
• interpretation differs: for a variable yi with values observed only above a (e.g., a = 0), we can typically be interested in
  ◦ both models: P(yi > a|xi) or P(yi ≤ a|xi)
  ◦ censored model: E(yi|xi)
  ◦ corner-solution response: E(yi|xi, yi > a)
Linear models?

yᵢ* = xᵢ⊤β + εᵢ censored at a from below/above

• yᵢ = max{yᵢ*, a} ⇔ yᵢ − a = max{yᵢ* − a, 0}
• yᵢ = min{yᵢ*, a} ⇔ −yᵢ = max{−yᵢ*, −a}
• assume yᵢ = max{yᵢ*, 0} without loss of generality

Can we use the linear model E(yᵢ|xᵢ) = xᵢ⊤β?

• E(yᵢ|xᵢ) is not linear in xᵢ unless the range of xᵢ is very limited
  (under censoring from below, E(yᵢ|xᵢ) > E(yᵢ*|xᵢ) = xᵢ⊤β)
• heteroscedasticity due to var(yᵢ|xᵢ) (see the next slide)
• predictions are not always positive
• P(yᵢ = 0|xᵢ) is not predictable
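These points are easy to verify by simulation. The sketch below (Python rather than the course's Stata; the design mirrors the figure on the next slide) fits OLS to data censored at zero and shows the slope attenuated relative to the true coefficient of 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-4, 4, size=n)
y_star = 0.5 + x + rng.normal(size=n)   # latent response, true slope = 1
y = np.maximum(y_star, 0.0)             # censoring from below at 0

# OLS slope on latent vs. censored data (np.polyfit returns [slope, intercept])
slope_latent = np.polyfit(x, y_star, 1)[0]
slope_cens = np.polyfit(x, y, 1)[0]
# slope_latent is near 1; slope_cens is biased toward zero, because
# E(y|x) = Φ(0.5+x)(0.5+x) + φ(0.5+x) is nonlinear and flat for small x
```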
[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ and yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Censored; True line, Linear prediction)
Censored data

yᵢ = max{0, xᵢ⊤β + εᵢ}

• corner solution: xᵢ⊤β + εᵢ is the unconstrained optimal choice
• censoring: xᵢ⊤β + εᵢ represents the latent variable yᵢ*

Truncated data – “corner/censored” values are not observed

yᵢ = xᵢ⊤β + εᵢ

• both yᵢ and xᵢ are observed only for yᵢ > 0
  (example: truncated income data)
• other thresholds and truncation from above are possible (yᵢ ≶ a)
[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ observable only for yᵢ* > 0, with εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Truncated; True line, Linear prediction)
Tobit type I model

yᵢ = max{0, xᵢ⊤β + εᵢ}

[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ and yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Censored; True line, Linear prediction)
[Figure] Error distribution from yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and xᵢ = 1: εᵢ = (yᵢ|xᵢ = 1) − 0.5 − 1.0 (legend: Original, Censored)
Error-term distribution

• εᵢ = yᵢ − xᵢ⊤β is “observable” only for yᵢ ≥ 0 ⇔ εᵢ ≥ −xᵢ⊤β
• εᵢ is censored from below at −xᵢ⊤β:
  ◦ positive probability at t = −xᵢ⊤β equal to
    P(εᵢ ≤ −xᵢ⊤β|xᵢ) = Φσ(−xᵢ⊤β) = Φ(−xᵢ⊤β/σ)
  ◦ continuously distributed at t > −xᵢ⊤β with density φσ(t) = φ(t/σ)/σ

Tobit type I model yᵢ = max{0, xᵢ⊤β + εᵢ} – distribution properties

• uncensored yᵢ is continuously distributed with density φσ(yᵢ − xᵢ⊤β) = φ{(yᵢ − xᵢ⊤β)/σ}/σ
Econometrics Slide 187
Tobit model – estimation

Maximum likelihood estimation (yᵢ* ∼ F_y)
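The likelihood combines the two pieces from the previous slide: a probit-type term Φ(−xᵢ⊤β/σ) for the censored observations and the scaled normal density for the uncensored ones. A minimal numerical sketch (Python, not the course's Stata; names and the simulated design are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + sigma_true * rng.normal(size=n), 0.0)

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-parametrization keeps sigma > 0
    xb = X @ beta
    cens = y <= 0
    # censored: log P(y=0|x) = log Φ(-x'β/σ); uncensored: log[φ((y-x'β)/σ)/σ]
    ll = np.where(cens,
                  norm.logcdf(-xb / sigma),
                  norm.logpdf((y - xb) / sigma) - np.log(sigma))
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
```

Using `norm.logcdf`/`norm.logpdf` rather than logs of the cdf/pdf avoids underflow for observations deep in the censored region.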
Example: female labor supply

Annual labor supply in hours for married women (Mroz, 1987)

Reduced-form equation using

• education
• labor market experience
• age
• number of children
Tobit – interpretation

yᵢ = max{0, xᵢ⊤β + εᵢ} – basic properties

• E(yᵢ|xᵢ) ≥ max{0, E(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}
  (Jensen’s inequality; see the figure on slide 196)
• med(yᵢ|xᵢ) = max{0, med(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}

Further, all objects of interest are related by

E(yᵢ|xᵢ) = P(yᵢ = 0|xᵢ) · 0 + P(yᵢ > 0|xᵢ) · E(yᵢ|xᵢ, yᵢ > 0)
         = P(yᵢ > 0|xᵢ) · E(yᵢ|xᵢ, yᵢ > 0)

• note: the probability P(yᵢ > 0|xᵢ) can be expressed as in probit
  (the identification assumption of probit is var(εᵢ) = 1)
Probability of no censoring:

P(yᵢ > 0|xᵢ) = Φ(xᵢ⊤β/σ)

Expectation conditional on not being censored:

E(yᵢ|xᵢ, yᵢ > 0) = xᵢ⊤β + σ · φ(xᵢ⊤β/σ)/Φ(xᵢ⊤β/σ)
Marginal effects (estimated at x̄ or as average partial effects)

• marginal effects for P(yᵢ > 0|xᵢ):

  ∂P(yᵢ > 0|xᵢ)/∂xᵢₖ = ∂Φ(xᵢ⊤β/σ)/∂xᵢₖ = φ(xᵢ⊤β/σ) · βₖ/σ
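The formula is easy to apply and to check against a numerical derivative. A sketch (Python; the parameter values are hypothetical, not the Mroz estimates):

```python
import numpy as np
from scipy.stats import norm

# hypothetical Tobit estimates
beta = np.array([0.5, 1.0, -0.3])
sigma = 1.2
x_bar = np.array([1.0, 0.2, 0.4])          # evaluation point, includes the constant

xb = x_bar @ beta
me = norm.pdf(xb / sigma) * beta / sigma   # ∂P(y>0|x)/∂x_k = φ(x'β/σ) β_k/σ

# check the k=1 effect against a forward-difference derivative of Φ(x'β/σ)
h = 1e-6
x_hi = x_bar.copy(); x_hi[1] += h
num = (norm.cdf(x_hi @ beta / sigma) - norm.cdf(xb / sigma)) / h
```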
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -.0024212   .0012202    -1.98    0.047
educ     |   .022153   .0058285     3.80    0.000
exper    |  .0361402   .0043438     8.32    0.000
expersq  | -.0005121   .0001444    -3.55    0.000
age      | -.0149448   .0019298    -7.74    0.000
kidslt6  | -.2455841   .0282462    -8.69    0.000
kidsge6  |  -.004455   .0106216    -0.42    0.675
---------------------------------------------------
Econometrics Slide 195
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -3.968784   2.007582    -1.98    0.048
educ     |  36.31225   9.703038     3.74    0.000
exper    |  59.23938   7.833684     7.56    0.000
expersq  | -.8393732   .2423184    -3.46    0.001
age      | -24.49691   3.362492    -7.29    0.000
kidslt6  | -402.5507   50.74877    -7.93    0.000
kidsge6  | -7.302468   17.40427    -0.42    0.675
---------------------------------------------------
Econometrics Slide 196
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -5.188622    2.62141    -1.98    0.048
educ     |  47.47311    12.6214     3.76    0.000
exper    |  77.44708   9.997656     7.75    0.000
expersq  | -1.097361   .3155947    -3.48    0.001
age      | -32.02624   4.292112    -7.46    0.000
kidslt6  | -526.2779   64.70622    -8.13    0.000
kidsge6  |  -9.54694   22.75225    -0.42    0.675
---------------------------------------------------
Econometrics Slide 197
Specification testing and extensions

Extensions

• doubly censored/two-limit data

Specification testing

• heteroscedasticity and non-normality
  (similar to probit: extend the specification or use a Hausman test;
  see the censored least absolute deviations estimator)
• two-part specification: what if the decision P(yᵢ > 0|xᵢ) is driven by
  different factors than the average amount E(yᵢ|xᵢ)?
  (examples: spending on a particular charity, expats’ labour supply)
predict indt, xb
gen rest = hours - indt
gen normd = normalden(rest / 1122.022) / 1122.022
kdens rest if indt>0 & rest>0, ci bw(sjpi) ll(0)
addplot((line normd rest if indt>0 & rest>0, sort))

[Figure] Kernel density estimate (with 95% CI) of the positive residuals rest, with the implied normal density normd overlaid
Alternatives

• Symmetrically trimmed least squares (Powell, 1986)
  ◦ trimming works under censoring, but inefficiently

  β̂^(STLS) = arg min_{β∈B} Σ_{i=1}^n {yᵢ − max(xᵢ⊤β, yᵢ/2)}²

• Symmetrically censored least squares (Powell, 1986)
  ◦ under conditional symmetry of ε|x

  β̂^(SCLS) = arg min_{β∈B} Σ_{i=1}^n [ {yᵢ − max(xᵢ⊤β, yᵢ/2)}²
              + I(yᵢ > 2xᵢ⊤β) · {(yᵢ/2)² − max(0, xᵢ⊤β)²} ]
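Both criteria can be minimized numerically. The sketch below (illustrative Python on simulated data, using a derivative-free search as a simple, if crude, minimizer) applies the SCLS criterion to data censored at zero with symmetric errors:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# symmetric (here normal) errors; censoring from below at zero
y = np.maximum(X @ beta_true + rng.normal(size=n), 0.0)

def scls_obj(b):
    """Symmetrically censored least-squares criterion (Powell, 1986)."""
    xb = X @ b
    term1 = (y - np.maximum(xb, y / 2.0)) ** 2
    term2 = (y > 2 * xb) * ((y / 2.0) ** 2 - np.maximum(xb, 0.0) ** 2)
    return np.sum(term1 + term2)

res = minimize(scls_obj, x0=np.array([0.1, 0.1]), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6})
beta_scls = res.x
```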
Censored least absolute deviation (CLAD) method (Powell, 1984)

• med(yᵢ|xᵢ) = max{0, med(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}
• assume med(εᵢ|xᵢ) = 0 and minimize

  Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

• √n-consistent and asymptotically normal estimator
• uses only observations with xᵢ⊤β > 0;
  full-rank assumption for E(xᵢxᵢ⊤ | xᵢ⊤β > 0)
  (just as for STLS/SCLS)
• “only” med(yᵢ|xᵢ) identified
  (STLS/SCLS identify “only” E(yᵢ|xᵢ))
• poor performance in small or heavily censored samples
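A sketch of the CLAD criterion on simulated data (illustrative Python; heteroscedastic errors are used on purpose, since CLAD only needs the conditional median restriction). The criterion is nonsmooth and nonconvex, so a derivative-free search from a rough starting value is used here:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# med(eps|x) = 0 holds despite heteroscedasticity
eps = (1.0 + 0.3 * np.abs(X[:, 1])) * rng.normal(size=n)
y = np.maximum(X @ beta_true + eps, 0.0)

def clad_obj(b):
    """CLAD criterion: sum of |y - max(0, x'b)|."""
    return np.sum(np.abs(y - np.maximum(0.0, X @ b)))

# start away from the flat region where x'b <= 0 for all observations
res = minimize(clad_obj, x0=np.array([0.1, 0.5]), method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-6})
beta_clad = res.x
```

The flat region of the criterion (any β with xᵢ⊤β ≤ 0 for all i gives the same value) is exactly why the slide's full-rank condition on the subsample with xᵢ⊤β > 0 is needed.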
Recall the Tobit type I model

yᵢ = max{0, xᵢ⊤β + εᵢ}
Two-part Tobit model (also hurdle model):
model the following two decisions separately

• assume independence of the two decisions for now:
  yᵢ = sᵢqᵢ, where sᵢ = I(yᵢ > 0), and

  F_q(qᵢ|xᵢ, sᵢ) = F_q(qᵢ|xᵢ)
Application

Melenberg and van Soest (1996) Modelling of vacation expenditures. Journal of Applied Econometrics 11, 59–76.
Truncated normal hurdle model (Cragg, 1971)

• decision – model by probit:
  P(sᵢ = 1|xᵢ) = P(yᵢ > 0|xᵢ) = Φ(xᵢ⊤γ)
• amount – model using yᵢ = qᵢ = xᵢ⊤β + εᵢ with εᵢ ∼ N(0, σ²), truncated at 0:

  f(yᵢ|xᵢ) = f(yᵢ|xᵢ, yᵢ > 0) · P(yᵢ > 0|xᵢ)
  f(yᵢ|xᵢ, yᵢ > 0) = [φ{(yᵢ − xᵢ⊤β)/σ}/σ] / Φ(xᵢ⊤β/σ)

• likelihood contribution for the full-information MLE:

  l(yᵢ, xᵢ, β) = I(yᵢ = 0) · ln[1 − Φ(xᵢ⊤γ)]
               + I(yᵢ > 0) · ln[Φ(xᵢ⊤γ)]
               + I(yᵢ > 0) · ln[φ{(yᵢ − xᵢ⊤β)/σ}/σ]
               − I(yᵢ > 0) · ln[Φ(xᵢ⊤β/σ)]
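A useful consistency check: setting γ = β/σ makes the hurdle likelihood collapse to the Tobit type I likelihood, since the ln Φ(xᵢ⊤γ) term then cancels the truncation correction. A Python sketch with hypothetical numbers:

```python
import numpy as np
from scipy.stats import norm

def hurdle_ll(y, xb_probit, xb_amount, sigma):
    """Cragg truncated-normal hurdle log-likelihood contribution."""
    if y == 0:
        return norm.logcdf(-xb_probit)                  # ln[1 - Φ(x'γ)]
    return (norm.logcdf(xb_probit)                      # ln Φ(x'γ)
            + norm.logpdf((y - xb_amount) / sigma) - np.log(sigma)
            - norm.logcdf(xb_amount / sigma))           # truncation correction

def tobit_ll(y, xb, sigma):
    """Tobit type I log-likelihood contribution."""
    if y == 0:
        return norm.logcdf(-xb / sigma)
    return norm.logpdf((y - xb) / sigma) - np.log(sigma)

# with γ = β/σ the hurdle model collapses to Tobit
xb, sigma = 0.7, 1.3
gap = max(abs(hurdle_ll(y, xb / sigma, xb, sigma) - tobit_ll(y, xb, sigma))
          for y in (0.0, 0.4, 2.1))
```

This is exactly why the Tobit restriction γ = β/σ is testable against the hurdle model.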
Alternatives

yᵢ = max{0, xᵢ⊤β + εᵢ}

• single-index models applicable (semiparametric LS)
• conditional median assumption med(εᵢ|xᵢ) = 0
• least absolute deviation regression (LAD; biased)

  min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − xᵢ⊤β|

• censored least absolute deviation regression (CLAD)

  min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

  skewed in small samples, no two-part model
yᵢ = max{0, xᵢ⊤β + εᵢ}

min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − xᵢ⊤β|   vs.   min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

• observation: the criteria of QR and CQR are equivalent for xᵢ⊤β > 0
• observation: med(yᵢ|xᵢ) = xᵢ⊤β if P(yᵢ > 0|xᵢ) > 0.5
  ⇒ estimate p(xᵢ) = P(yᵢ > 0|xᵢ) = E[I(yᵢ > 0)|xᵢ] nonparametrically (by the Nadaraya-Watson estimator)
• observation: med(yᵢ|xᵢ) = xᵢ⊤β if med(yᵢ|xᵢ) = xᵢ⊤β > 0
  ⇒ estimate q(xᵢ) = med(yᵢ|xᵢ) “= a(x)” nonparametrically (local median regression)

  min_{a(x), b(x)} Σ_{i=1}^n |yᵢ − a(x) − b(x)(xᵢ − x)| K{Hₙ⁻¹(xᵢ − x)}
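The first nonparametric step — estimating p(x) = P(yᵢ > 0|xᵢ) by Nadaraya-Watson — is a one-liner. A sketch on simulated Tobit data (illustrative Python; design, Gaussian kernel, and bandwidth are assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 20000
x = rng.uniform(-3, 3, size=n)
y = np.maximum(0.5 + x + rng.normal(size=n), 0.0)   # Tobit data, sigma = 1
d = (y > 0).astype(float)                           # I(y_i > 0)

def nw(x0, h=0.3):
    """Nadaraya-Watson estimate of E[I(y>0) | x = x0] with a Gaussian kernel."""
    w = norm.pdf((x - x0) / h)
    return np.sum(w * d) / np.sum(w)

p_hat = nw(0.0)
p_true = norm.cdf(0.5)    # P(y>0|x=0) = Φ((0.5 + 0)/1)
```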
Two-step estimation of censored regression model

yᵢ = max{0, xᵢ⊤β + εᵢ}
sml inlf nwifeinc educ expersq age kidslt6 kidsge6, offset(exper)
predict indsml, xb
lpoly inlf indsml, generate(indlp plp)

[Figure] Local polynomial smooth of inlf (Y=1 if in labor force, 1975) against the linear prediction; kernel = epanechnikov, degree = 0, bandwidth = 5.32
Tobit type II – incidental truncation

y₁ᵢ = x₁ᵢ⊤β₁ + ε₁ᵢ
y₂ᵢ = I(x₂ᵢ⊤β₂ + ε₂ᵢ > 0)
Since in the model

  y₁ᵢ = x₁ᵢ⊤β₁ + ε₁ᵢ
  y₂ᵢ = I(x₂ᵢ⊤β₂ + ε₂ᵢ > 0),

only data with y₂ᵢ = 1 are observed and ε₂ᵢ ∼ N(0, 1), then

E(y₁ᵢ|x₁ᵢ, x₂ᵢ, y₂ᵢ = 1)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, y₂ᵢ = 1)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, x₂ᵢ⊤β₂ + ε₂ᵢ > 0)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, ε₂ᵢ > −x₂ᵢ⊤β₂)
  = x₁ᵢ⊤β₁ + ρ φ(x₂ᵢ⊤β₂)/Φ(x₂ᵢ⊤β₂) = x₁ᵢ⊤β₁ + ρ λ(−x₂ᵢ⊤β₂),

where λ(−t) = φ(t)/Φ(t) is the inverse Mills ratio.
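The truncated-mean formula behind the inverse Mills ratio can be checked by Monte Carlo (illustrative Python; the index value t is hypothetical):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 2_000_000
eps2 = rng.normal(size=n)      # standard normal selection error

t = 0.4                        # hypothetical value of x2'beta2
sel = eps2 > -t                # selected observations: eps2 > -x2'beta2
emp = eps2[sel].mean()         # empirical E(eps2 | eps2 > -t)
theory = norm.pdf(t) / norm.cdf(t)   # inverse Mills ratio λ(-t) = φ(t)/Φ(t)
```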
Two-step estimation

Heckman (1976): two-step procedure similar to Tobit type II

• estimate the binary-choice model (probit) using all data:

  P(y₂ᵢ = 1|x₂ᵢ) = Φ(x₂ᵢ⊤β₂)

• regress y₁ᵢ on x₁ᵢ and the estimated inverse Mills ratio λ(−x₂ᵢ⊤β̂₂) using the selected sample (y₂ᵢ = 1)
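A sketch of the two-step procedure on simulated data (illustrative Python; variable names, the exclusion restriction, and the design are assumptions, not the course's Stata implementation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 20000
x1 = np.column_stack([np.ones(n), rng.normal(size=n)])
x2 = np.column_stack([x1, rng.normal(size=n)])   # extra regressor: exclusion restriction
b1, b2, rho = np.array([1.0, 0.5]), np.array([0.2, 0.5, 1.0]), 0.6

# correlated errors; eps2 has unit variance (probit normalization), var(eps1) = 1
e2 = rng.normal(size=n)
e1 = rho * e2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
y2 = (x2 @ b2 + e2 > 0).astype(float)
y1 = x1 @ b1 + e1                                # observed only when y2 = 1

# step 1: probit MLE for the selection equation
def probit_nll(g):
    xb = x2 @ g
    return -np.sum(np.where(y2 == 1, norm.logcdf(xb), norm.logcdf(-xb)))
g_hat = minimize(probit_nll, np.zeros(3), method="BFGS").x

# step 2: OLS of y1 on [x1, inverse Mills ratio] in the selected sample
lam = norm.pdf(x2 @ g_hat) / norm.cdf(x2 @ g_hat)
sel = y2 == 1
Z = np.column_stack([x1[sel], lam[sel]])
coef = np.linalg.lstsq(Z, y1[sel], rcond=None)[0]
# coef[:2] estimates beta1; coef[2] estimates rho*sigma1 (= 0.6 here, as sigma1 = 1)
```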
Maximum likelihood estimation

log Lₙ(β, σ₁₁, σ₁₂) = Σ_{y₂ᵢ=0} log P(y₂ᵢ = 0|x₂ᵢ)
                    + Σ_{y₂ᵢ=1} log[P(y₂ᵢ = 1|x₂ᵢ) · f(y₁ᵢ|y₂ᵢ = 1, x₁ᵢ, x₂ᵢ)]

• MLE is possible using the Bayes rule

  f(y₁ᵢ|y₂ᵢ = 1, …) = P(y₂ᵢ = 1|y₁ᵢ, …) · f(y₁ᵢ|…) / P(y₂ᵢ = 1|…),

which implies

  P(y₂ᵢ = 1|…) · f(y₁ᵢ|y₂ᵢ = 1, …) = P(y₂ᵢ = 1|y₁ᵢ, …) · f(y₁ᵢ|…)
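The equality of the two factorizations can be verified numerically: integrating P(y₂ᵢ = 1|y₁ᵢ, ·) · f(y₁ᵢ|·) over y₁ᵢ must recover the probit probability Φ(x₂ᵢ⊤β₂). A sketch with hypothetical parameter values, using the conditional normal distribution of ε₂ᵢ given ε₁ᵢ:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# hypothetical parameter values: sd of eps1, cov(eps1, eps2); var(eps2) = 1
s11, s12 = 1.5, 0.8
xb1, xb2 = 0.4, 0.3                  # x1'beta1 and x2'beta2
sc = np.sqrt(1 - s12**2 / s11**2)    # conditional sd of eps2 given eps1

def f_y1(y1):                        # marginal density of y1
    return norm.pdf((y1 - xb1) / s11) / s11

def p_sel_given_y1(y1):              # P(y2 = 1 | y1, x)
    return norm.cdf((xb2 + s12 / s11**2 * (y1 - xb1)) / sc)

# ∫ P(y2=1|y1) f(y1) dy1 = P(y2=1|x) = Φ(x2'β2)
lhs = quad(lambda y1: p_sel_given_y1(y1) * f_y1(y1), -np.inf, np.inf)[0]
rhs = norm.cdf(xb2)
```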
Maximum likelihood estimation

log Lₙ(β, σ₁₁, σ₁₂) = Σ_{y₂ᵢ=0} log P(y₂ᵢ = 0|x₂ᵢ)
                    + Σ_{y₂ᵢ=1} log[P(y₂ᵢ = 1|y₁ᵢ, x₁ᵢ, x₂ᵢ) · f(y₁ᵢ|x₁ᵢ, x₂ᵢ)]

• the conditional distribution of y₂ᵢ = x₂ᵢ⊤β₂ + ε₂ᵢ given ε₁ᵢ = y₁ᵢ − x₁ᵢ⊤β₁
  follows from the joint normality of (ε₁ᵢ, ε₂ᵢ):

  ε₂ᵢ | ε₁ᵢ = y₁ᵢ − x₁ᵢ⊤β₁ ∼ N( µ₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁ − µ₁), σ₂₂ − σ₁₂σ₁₁⁻²σ₂₁ ),

  where σ₂₂ = 1 and

  (ε₁ᵢ, ε₂ᵢ)⊤ ∼ N( (µ₁, µ₂)⊤, [σ₁₁ σ₁₂; σ₂₁ σ₂₂] ) = N( 0, [σ₁₁ σ₁₂; σ₂₁ 1] )
Maximum likelihood estimation

  f(y₁ᵢ|x₁ᵢ, x₂ᵢ) = φ{(y₁ᵢ − x₁ᵢ⊤β₁)/σ₁₁}/σ₁₁
  P(y₂ᵢ = 0|x₁ᵢ, x₂ᵢ) = 1 − Φ(x₂ᵢ⊤β₂)
  P(y₂ᵢ = 1|y₁ᵢ, x₁ᵢ, x₂ᵢ) = Φ( [x₂ᵢ⊤β₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁)] / (1 − σ₁₂²σ₁₁⁻²)^{1/2} )

• log-likelihood contribution (denoting σ_c² = 1 − σ₁₂²σ₁₁⁻²):

  lᵢ(β, σ₁₁, σ₁₂) = (1 − y₂ᵢ) log{1 − Φ(x₂ᵢ⊤β₂)}
                  + y₂ᵢ log Φ( [x₂ᵢ⊤β₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁)] / σ_c )
                  + y₂ᵢ [ log φ{(y₁ᵢ − x₁ᵢ⊤β₁)/σ₁₁} − log σ₁₁ ]

Econometrics Slide 218
Example: female wage equation

Married women labor force participation (Mroz, 1987):

• wage offer = observed only for those who choose to work
• explanatory variables for the wage equation
  ◦ education
  ◦ labor market experience
• additional explanatory variables for the participation equation
  ◦ non-wife income
  ◦ age
  ◦ number of children
Application

Buchinsky (1998) The dynamics of changes in the female wage distribution in the USA: a quantile regression approach. Journal of Applied Econometrics 13, 1–30.
Maximum likelihood

• applicable in many nonlinear models
• distributional assumptions necessary

Examples of MLE

• count data
• partially discrete, partially continuous responses:
  Tobit, two-part Tobit, Tobit type II (sample selection)
• models with random censoring

Applications

• extensions needed (non-constant thresholds, random censoring, random coefficients, sample selection probit, endogeneity, ...)
Nonparametric estimation

• very flexible, minimal assumptions
• not easily applicable directly in models with several/many explanatory variables

Choices such as

• structural versus reduced-form analysis
• parametric versus semiparametric estimation
• approach to semiparametric estimation
• software package and numerical tools
The end