Microeconometrics
Pavel Čížek
(P.Cizek@uvt.nl)
Fall 2016
Econometrics Slide 1
Introduction

• Instructors:
  ◦ Martin Salm (Room K 642, Email: M.Salm@uvt.nl)
  ◦ Pavel Čížek (Room K 641, Email: P.Cizek@uvt.nl)
• Microeconometrics
  ◦ linear and nonlinear regression models
  ◦ estimation techniques for single-equation models
• Main book:
  ◦ A. C. Cameron and P. K. Trivedi (2005) Microeconometrics: Methods and Applications, Cambridge University Press.
Course structure

Econometric models
• linear models and causality
• duration models
• nonlinear models for discrete or limited responses

Estimation
• linear models and their estimation
• maximum likelihood and generalized method of moments
• non- and semiparametric estimation
Outline

Topics in the second half of the course
• parameter estimation of reduced-form models
  ◦ including binary-choice models
• nonparametric and semiparametric estimation
Reduced-form models

Estimation methodology is designed for reduced-form models with given statistical properties; for example,

  y_i = x_i^⊤ β + ε_i
Structural models

Models can have not only statistical but also economic structure describing how economic behavior, institutions, and laws affect the relationship between the variables y_i and x_i; for example,

• the ith firm's production y_i, labor input l_i, and capital k_i can be related by the (deterministic) Cobb-Douglas production function:

  y_i = A_i l_i^α k_i^β

  ◦ α and β are interpreted as elements of the production function
  ◦ the economic validity can be studied: are firms operating efficiently under state ownership or under regulators?
Structural models
• relate to economic theory
• facilitate interpretation

Reduced-form models

Identification
• For a given structural model, is there only one reduced-form model?
• Does a given reduced-form model correspond to multiple structural models?
• Do all considered structural models render the same values of (some) parameters in the reduced-form model?
  (e.g., consider variation in A_i [in]dependent of i, l_i, or k_i)
Estimation

• Method of moments
• GMM
• Maximum likelihood
• General MLE
• Comparison
• Quasi-MLE
• Quantile regression
• Asymptotics
Method of moments – linear regression

Random sample (x_1, y_1), ..., (x_n, y_n) following the model

  y_i = x_i^⊤ β + ε_i

• first conditional moment of ε_i: E(ε_i | x_i) = 0
• unconditional moment equation:

  x_i E(ε_i | x_i) = 0 ⇒ E{E(x_i ε_i | x_i)} = E(x_i ε_i) = 0

• population equation: E(x_i ε_i) = E{x_i (y_i − x_i^⊤ β)} = 0
• sample analog equation (to be solved):

  n^{-1} Σ_{i=1}^n x_i (y_i − x_i^⊤ β) = n^{-1} Σ_{i=1}^n x_i y_i − n^{-1} Σ_{i=1}^n x_i x_i^⊤ β = 0

• solution

  β̂_n = [n^{-1} Σ_{i=1}^n x_i x_i^⊤]^{-1} [n^{-1} Σ_{i=1}^n x_i y_i]
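The closed-form solution above can be checked numerically. A minimal sketch: the simulated data, seed, and all variable names below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Simulate y_i = x_i' beta + eps_i and solve the sample moment equation
# n^{-1} sum_i x_i (y_i - x_i' beta) = 0 for beta.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# beta_hat = [n^{-1} sum x_i x_i']^{-1} [n^{-1} sum x_i y_i]
beta_hat = np.linalg.solve(X.T @ X / n, X.T @ y / n)
```

With exactly as many moment conditions as parameters, this reproduces the ordinary least squares estimator.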
Generalized method of moments

Generalized method of moments (GMM): define
• data w_i (e.g., w_i = (y_i, x_i)^⊤ = (y_i, k_i, l_i)^⊤) of sample size n
• parameters of interest θ ∈ Θ ⊆ R^p; its true value θ_0 solves the
• moment conditions g(w_i, θ): R^p → R^k such that

  E{g(w_i, θ_0)} = 0

  (e.g., g(w_i, θ) = x_i (y_i − x_i^⊤ θ) for E[x_i (y_i − x_i^⊤ θ)] = 0)

The GMM estimator minimizes with respect to θ

  Q_n(θ) = [n^{-1} Σ_{i=1}^n g(w_i, θ)]^⊤ W_n [n^{-1} Σ_{i=1}^n g(w_i, θ)]
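A sketch of minimizing Q_n(θ) for the linear-regression moment condition g(w_i, θ) = x_i (y_i − x_i^⊤ θ) with W_n = I; the simulated data and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.5, -1.0])
y = X @ theta_true + rng.normal(size=n)

def Qn(theta):
    g_bar = X.T @ (y - X @ theta) / n   # (1/n) sum_i g(w_i, theta)
    W = np.eye(len(g_bar))              # identity weighting matrix W_n
    return g_bar @ W @ g_bar

theta_hat = minimize(Qn, x0=np.zeros(2), method="BFGS").x
```

Because the model is exactly identified (k = p), the minimized objective is essentially zero and the GMM estimate coincides with least squares; an overidentified model would leave Q_n(θ̂) strictly positive.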
Maximum likelihood estimation

Linear regression model: y_i = x_i^⊤ β_0 + ε_i, where ε_i ∼ N(0, σ²) iid

• the distribution of y_i is known conditionally on x_i:

  y_i = x_i^⊤ β_0 + ε_i ⇒ y_i | x_i ∼ N(x_i^⊤ β_0, σ²)

• the likelihood contribution is the value of the density φ of y_i | x_i
• the likelihood is the conditional density of {y_i}_{i=1}^n given {x_i}_{i=1}^n

  L_n(β, σ²) = f(y_1, ..., y_n | x_1, ..., x_n; β, σ²) = Π_{i=1}^n φ(y_i | x_i; β, σ²)

  ln L_n(β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)
Maximum likelihood estimation

• the normal density function of y_i | x_i ∼ N(x_i^⊤ β, σ²)

  φ(y_i | x_i; β, σ²) = (2πσ²)^{-1/2} exp{ −(y_i − x_i^⊤ β)² / (2σ²) }

• the log-likelihood function for y_i | x_i ∼ N(x_i^⊤ β, σ²)

  ln L_n(β, σ²) = Σ_{i=1}^n l(y_i | x_i; β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)
                = −(1/2) Σ_{i=1}^n [ ln(2π) + ln(σ²) + (y_i − x_i^⊤ β)²/σ² ]
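The normal log-likelihood above can be maximized directly. A minimal sketch, assuming simulated data; parametrizing ln σ² keeps the variance positive during optimization (all names and values are illustrative).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, -0.5]), 1.5
y = X @ beta_true + sigma_true * rng.normal(size=n)

def neg_loglik(params):
    beta, log_s2 = params[:2], params[2]  # work with ln(sigma^2) so sigma^2 > 0
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    # minus the log-likelihood from the slide above
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + resid**2 / s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, s2_hat = res.x[:2], np.exp(res.x[2])
```

At the maximum, β̂ coincides with least squares and σ̂² equals the mean squared residual, as the first-order conditions imply.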
Maximum likelihood estimation

Maximum likelihood estimation (MLE): define
• data w_i = (y_i, x_i)^⊤ (e.g., (y_i, x_i)^⊤ = (y_i, k_i, l_i)^⊤) of size n
• assume the true conditional density f(y_i | x_i; θ_0) is known up to some parameters of interest θ ∈ Θ ⊆ R^p
  (e.g., θ = (β, σ²)^⊤ based on the model parameters β and the parameters of the density of the errors ε_i)
• identification: the true parameter θ_0 maximizes E[ln f(y_i | x_i; θ)]
• log-likelihood for observation i: l(w_i, θ) = ln f(y_i | x_i; θ)
• log-likelihood function: ln L_n(θ) = n^{-1} Σ_{i=1}^n l(w_i, θ)
• maximum likelihood estimate

  θ̂_n = arg max_θ ln L_n(θ) = arg max_θ n^{-1} Σ_{i=1}^n l(w_i, θ)
Comparison

Assuming the correct moment equations, GMM is
• consistent and asymptotically normal
• linear regression: E(ε_i | x_i) = 0

Assuming the correct parametric distribution, MLE is
• consistent, asymptotically normal, and can be asymptotically efficient with asymptotic variance I^{-1}(θ), where
  I(θ) = E[−∂² ln f(y_i | x_i; θ)/∂θ∂θ^⊤]
• linear regression: ε_i ∼ N(0, σ²)
Quasi-maximum likelihood estimation

Linear regression model: y_i = x_i^⊤ β_0 + ε_i, where ε_i ∼ N(0, σ²) iid

• the distribution of y_i is known conditionally on x_i:

  y_i = x_i^⊤ β_0 + ε_i ⇒ y_i | x_i ∼ N(x_i^⊤ β_0, σ²)

• the likelihood is the conditional density of {y_i}_{i=1}^n given {x_i}_{i=1}^n

  ln L_n(β, σ²) = Σ_{i=1}^n ln φ(y_i | x_i; β, σ²)

• the quasi-MLE estimator solves the first-order conditions

  ∂ ln L_n(β, σ²)/∂(β^⊤, σ²)^⊤ = Σ_{i=1}^n ∂ ln φ(y_i | x_i; β, σ²)/∂(β^⊤, σ²)^⊤ = 0
Quasi-maximum likelihood estimation

• the normal density function of y_i | x_i ∼ N(x_i^⊤ β, σ²)

  φ(y_i | x_i; β, σ²) = (2πσ²)^{-1/2} exp{ −(y_i − x_i^⊤ β)² / (2σ²) }

• the log-likelihood function for y_i | x_i ∼ N(x_i^⊤ β, σ²)

  ln L_n(β, σ²) = −(1/2) Σ_{i=1}^n [ ln(2π) + ln(σ²) + (y_i − x_i^⊤ β)²/σ² ]

• the first-order conditions

  ∂ ln L_n(β, σ²)/∂β = (1/σ²) Σ_{i=1}^n (y_i − x_i^⊤ β) x_i = 0
Quasi-maximum likelihood estimation

• the double-exponential (Laplace) density function of y_i | x_i ∼ DExp(x_i^⊤ β, 1)

  φ(y_i | x_i; β) = (1/2) exp{ −|y_i − x_i^⊤ β| }

• the log-likelihood function for y_i | x_i ∼ DExp(x_i^⊤ β, 1)

  ln L_n(β) = Σ_{i=1}^n [ ln(1/2) − |y_i − x_i^⊤ β| ]

• the first-order conditions

  ∂ ln L_n(β)/∂β = Σ_{i=1}^n 2 [ I(y_i − x_i^⊤ β ≥ 0) − 1/2 ] x_i = 0
Quantile regression

The least absolute deviation estimator minimizes Σ_{i=1}^n |y_i − x_i^⊤ β| and is consistent if med(ε_i | x_i) = 0
⇒ it identifies the conditional median med(y_i | x_i) = x_i^⊤ β

• the quantile regression estimator minimizes

  Σ_{i=1}^n |τ − I(y_i − x_i^⊤ β ≤ 0)| · |y_i − x_i^⊤ β|
Quantile regression

Quantile regression (QR) in the linear regression model
• QR is based on the assumption Q_τ(ε_i | x_i) = 0
• QR identifies the conditional quantile Q_τ(y_i | x_i) = x_i^⊤ β
• the QR estimator minimizes

  Σ_{i=1}^n ρ_τ(y_i − x_i^⊤ β)

  where the check function ρ_τ(z) = [τ − I(z < 0)] · z
• case of τ = 1/2: median regression or least absolute deviation estimation, as ρ_τ(z) = |z|/2
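The check-function objective can be minimized directly with a derivative-free optimizer, since ρ_τ is not differentiable at zero. A minimal sketch for τ = 1/2; the simulated design, starting value, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 800
X = np.column_stack([np.ones(n), rng.uniform(0, 2, size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.standard_t(df=3, size=n)  # symmetric errors, median zero

def check_loss(beta, tau):
    u = y - X @ beta
    return np.sum((tau - (u < 0)) * u)            # rho_tau(u) = [tau - I(u < 0)] u

tau = 0.5
x0 = np.linalg.lstsq(X, y, rcond=None)[0]         # least squares starting value
beta_qr = minimize(check_loss, x0=x0, args=(tau,), method="Nelder-Mead").x
```

Changing `tau` traces out other conditional quantiles; dedicated linear-programming solvers are used in practice for larger problems.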
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for normal and log-normal data, shown with and without estimated conditional quantile curves.]
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for normal and heteroscedastic data, with estimated conditional quantile lines.]
Quantile regression: Engel curve

Coefficients: [table not recoverable]
Quantile regression: Engel curve

[Figure: food expenditure against household income, with the mean (LSE) fit and the median (LAE) fit.]
Quantile regression

Assume the linear model y_i = x_i^⊤ β_0(τ) + ε_i with Q_τ(ε_i | x_i) = 0 and
• the data form a random sample (y_i, x_i)_{i=1}^n
• the conditional distribution functions F_i(y_i | x_i) are absolutely continuous with continuous densities f_i(y_i | x_i) uniformly bounded away from 0 and ∞ at Q_τ(y_i | x_i), i = 1, ..., n
• the matrices D_0 = E(x_i x_i^⊤) and D_1(τ) = E{f_i(Q_τ(y_i | x_i)) x_i x_i^⊤} are positive definite

Then the quantile regression estimator β̂_n^QR(τ) is consistent and

  √n (β̂_n^QR(τ) − β_0(τ)) →_d N(0, τ(1 − τ) D_1(τ)^{-1} D_0 D_1(τ)^{-1})
Quantile regression

Buchinsky (1998) Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research. The Journal of Human Resources 33(1), 88–126.

• properties of the quantile regression estimator
• computation of the quantile regression estimator
• inference and tests based on quantile regression (tests of homoscedasticity, symmetry, ...)
• application to Current Population Survey data (1973–1993)
• censored quantile regression (discussed later)
Binary choice models

• Probit and logit
• MLE
• Marginal effects
• Measures of fit
• Application
• Heteroscedasticity
• Simulation
• Semiparametrics: maximum score, single index, semiparametric LS, Klein and Spady, implementation, average derivative, outlook
Introduction to binary choice models

Binary choice = binary response: a single discrete decision that can be characterized by the values 0 and 1
• traditionally, y = 1 = "yes, success" and y = 0 = "no, failure"
• examples: labor force participation, university education, foreign direct investment, public vs. private transport, ...

Typically derived from a structural model for the latent variable y*
• y* represents monetary utility, profit, ...
• example: seller = price − purchasing value of an object, buyer = (monetary) utility from the object − price
  (nontrivial in the cases of education, job, ...)
Introduction to binary choice models

Typically derived from a structural model for the latent variable y*
• the regression model characterizes the expectation (for individual i)

  E(y_i | x_i) = P(y_i = 1 | x_i) · 1 + P(y_i = 0 | x_i) · 0 = P(y_i = 1 | x_i)
               = P(y_i* = U_ia − U_ib ≥ 0 | x_i)
               = P(x_i^⊤ β + ε_i ≥ 0 | x_i) = P(x_i^⊤ β + ε_i ≥ 0 | x_i^⊤ β)
Introduction to binary choice models

What can be identified in
• choice a: utility U_a = w^⊤ δ_a + z_a^⊤ γ_a + ε_a?
• choice b: utility U_b = w^⊤ δ_b + z_b^⊤ γ_b + ε_b?

The reduced-form model E(y_i | x_i) = P(x_i^⊤ β + ε_i ≥ 0 | x_i^⊤ β) corresponds to the difference in utilities U_a − U_b:

  y* = w^⊤ (δ_a − δ_b) + z_a^⊤ γ_a − z_b^⊤ γ_b + ε_a − ε_b = x^⊤ β + ε

• δ_a and δ_b cannot be identified separately, only their difference δ_a − δ_b can be identified
• what about identification of γ_a and γ_b (often assumed γ_a = γ_b)?
  ◦ if z_a and z_b contain different variables/quantities
  ◦ if z_a and z_b contain common variables/quantities
  ◦ do we have a choice?
Probit and logit

Suppose that the latent utility y_i* follows the linear model

  y_i* = x_i^⊤ β_0 + ε_i,   ε_i ∼ F

• the observed response is binary (decision, choice, success, ...):
  y_i = I(y_i* > 0) = I(x_i^⊤ β + ε_i > 0)
• ε_i is symmetrically distributed and has zero mean: Eε_i = 0
• identification by normalization: σ² = var ε_i = 1, for example
  (y_i = I(x_i^⊤ β + ε_i > 0) = I(x_i^⊤ β/σ + ε_i/σ > 0) = y_i^σ)
• regression function if ε_i ∼ F:

  E(y_i | x_i) = P(y_i = 1 | x_i) = 1 − F(−x_i^⊤ β) = F(x_i^⊤ β)
Probit and logit

• F is completely specified (does not depend on parameters)
• probit = F is the standard normal distribution

  F(t) ≡ Φ(t) = ∫_{−∞}^t φ(s) ds

  (σ normalized to 1)
• logit = F is the (standard) logistic distribution with location parameter 0 and scale parameter 1

  F(t) ≡ Λ(t) = exp(t)/(1 + exp(t)) = 1/(1 + exp(−t))

  (σ normalized to π/√3 ≈ 1.814)
Maximum likelihood estimation

Parametric approach: E(y_i | x_i) = F(x_i^⊤ β), F known

• identification requires also E(x_i x_i^⊤) to be non-singular:
  P(x_i^⊤ β ≠ x_i^⊤ β_0) > 0 implies P[F(x_i^⊤ β) ≠ F(x_i^⊤ β_0)] > 0
  if F is strictly monotonic and completely specified
• likelihood contribution:

  L(β | y_i, x_i) = P(y_i = 1 | x_i)^{y_i} P(y_i = 0 | x_i)^{1−y_i}
                  = F(x_i^⊤ β)^{y_i} {1 − F(x_i^⊤ β)}^{1−y_i}

• log-likelihood contribution:

  l(y_i, x_i; β) = y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)}

• log-likelihood function:

  ln L_n(β) = Σ_{i=1}^n [ y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)} ]
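The binary-choice log-likelihood above can be maximized numerically; a minimal probit sketch with F = Φ, assuming simulated data (all names and values are illustrative).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# latent-variable model: y_i = I(x_i' beta + eps_i > 0), eps_i ~ N(0, 1)
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(beta):
    p = norm.cdf(X @ beta)                 # F(x_i' beta) with F = Phi
    p = np.clip(p, 1e-10, 1 - 1e-10)       # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
```

Replacing `norm.cdf` with the logistic distribution function gives the logit estimator with the same code structure.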
Probit Φ(x_i^⊤ β̂_n) and logit Λ(x_i^⊤ β̂_n): coronary heart disease

probit chd age; predict probit, p
--------------------------------------------------
z <- glm(chd ~ age, family=binomial(link="probit"))
z$fitted.values

[Figure: fitted probability of coronary heart disease against age (10–70 years).]
Probit and logit: coronary heart disease data

Interpretation – marginal effects

  p_j(x) = ∂P(y_i = 1 | x_i = x)/∂x_ij = ∂F(x^⊤ β)/∂x_ij = f(x^⊤ β) β_j

(p_j(x) is auxiliary/temporary notation)

• p_j(x) depends on f = F′, but the ratios p_j(x)/p_k(x) do not
• probit: p_j(x) = φ(x^⊤ β) β_j (= 0.399 β_j at x^⊤ β = 0)
• logit: p_j(x) = λ(x^⊤ β) β_j (= 0.25 β_j at x^⊤ β = 0)

Marginal effects are
• reported at the average Σ_{i=1}^n x_i / n, or as
• average marginal effects Σ_{i=1}^n f(x_i^⊤ β̂_n) β̂_nj / n
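Both reporting conventions above are one-liners once estimates are available. A minimal probit sketch; the coefficient values and simulated regressors below are hypothetical, not estimates from the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_hat = np.array([0.2, 0.8])   # hypothetical probit estimates

# marginal effect of regressor 1 evaluated at the average regressor vector
me_at_avg = norm.pdf(X.mean(axis=0) @ beta_hat) * beta_hat[1]

# average marginal effect: mean of phi(x_i' beta) * beta_j over the sample
ame = norm.pdf(X @ beta_hat).mean() * beta_hat[1]
```

Since φ peaks at zero, both quantities are bounded by 0.399 β̂_j, matching the slide's probit benchmark.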
Interpretation – marginal effects: coronary heart disease
Measures of fit

How to measure fit in binary-choice models?

• percentage correctly predicted (PCP) = Σ_{i=1}^n I[y_i = ŷ_i]/n, where ŷ_i = I{F(x_i^⊤ b_n) > 0.5}
  ◦ misleading if one response is rarely observed
  ◦ the threshold 0.5 is not suitable if P(y_i = 1 | x_i) is always low/high
• pseudo-R² = 1 − ln L_n(β̂_n)/ln L_n((1, 0, ..., 0)^⊤)
• other measures exist, but interpretation is more important:
  ◦ marginal effects at the average, p_j(x̄)
  ◦ average marginal effects Σ_{i=1}^n p_j(x_i)/n
  ◦ correct predictions per category (y_i = 1 and y_i = 0)
  ◦ cross-tabulation of y_i versus ŷ_i
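PCP and a likelihood-based pseudo-R² can be sketched as follows; the simulated data and the coefficient values used for the fitted probabilities are illustrative assumptions (in practice they would be the probit/logit estimates).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = (0.3 + 1.2 * x + rng.normal(size=n) > 0).astype(float)

# fitted probabilities at hypothetical coefficient values
p_hat = norm.cdf(0.3 + 1.2 * x)
y_pred = (p_hat > 0.5).astype(float)
pcp = np.mean(y == y_pred)                # percentage correctly predicted

def loglik(p):
    # Bernoulli log-likelihood at fitted probabilities p
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

p_null = np.full(n, y.mean())             # benchmark: constant predicted probability
pseudo_r2 = 1 - loglik(p_hat) / loglik(p_null)
```

As the slide warns, a cross-tabulation of `y` against `y_pred` per category is usually more informative than the single PCP number.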
Example: coronary heart disease data

Application

A. van Soest (1995) Structural models of family labor supply, Journal of Human Resources 30(1), 63–88.

• model of the labor supply of couples forming households
• labor supply of man and woman discretized (25–36 choices)
• imperfectly predictable wages and hours restrictions implemented
• estimation via simulated maximum likelihood
Distributional assumptions and heteroscedasticity

Latent linear model with heteroscedasticity:

  y_i* = x_i^⊤ β + ε_i

• conditional mean E(ε_i | x_i) = 0
• conditional variance var(ε_i | x_i) = var(ε_i | x_i0) ≠ const. σ²
  ◦ generally an unknown function of x_i0 = x_i without the intercept
  ◦ a parametric form can be assumed to facilitate estimation, e.g.,
    var(ε_i | x_i) = exp(α + x_i0^⊤ γ)
• consequences:
  ◦ linear-regression model: ordinary LS is consistent
  ◦ binary-choice model (ε_i ∼ N(0, exp(x_i0^⊤ γ)) with α = 0): "homoscedastic" maximum likelihood is inconsistent if γ ≠ 0
Simulated linear regression

[Figure: simulated linear-regression data, y plotted against x ∈ [0, 5].]
Simulated probit regression

[Figure: simulated binary responses and fitted probabilities (0 to 1) against x ∈ [0, 5].]
Heteroscedasticity – estimation

Heteroscedastic probit (ε_i ∼ N(0, exp(x_i0^⊤ γ))):

  P(y_i = 1 | x_i) = Φ{x_i^⊤ β / exp(x_i0^⊤ γ/2)}

• x_i0 does not contain an intercept
• x_i^⊤ β and x_i0^⊤ γ could contain different variables
• proof as for the standard probit:

  P(y_i = 1 | x_i) = P(x_i^⊤ β_0 + ε_i > 0 | x_i) = P(x_i^⊤ β_0 > −ε_i | x_i)
                   = P(x_i^⊤ β_0 / exp(x_i0^⊤ γ/2) > −ε_i / exp(x_i0^⊤ γ/2) | x_i)
                   = Φ{x_i^⊤ β_0 / exp(x_i0^⊤ γ/2)}

• more flexible than the standard probit (recall how to estimate!)
• complicated marginal effects (interpretation!)

  p_j(x) = ∂P(y_i = 1 | x_i = x)/∂x_ij
         = φ{x^⊤ β exp(−x_0^⊤ γ/2)} · exp(−x_0^⊤ γ/2) [β_j − (γ_j/2)(x^⊤ β)]
Probit and heteroscedasticity: coronary heart disease data

Semiparametric estimation

Maximum likelihood estimator
• can be asymptotically normal and efficient
• requires strict distributional assumptions
• for example, it is inconsistent
  ◦ and highly sensitive to heteroscedasticity
  ◦ but rather insensitive to misspecification of a symmetric unimodal distribution function

Semiparametric estimation
• methods of estimation that do not rely on parametric assumptions about the shape of the error-term distribution
Maximum score estimation

Maximum score estimator (MSE) by Manski (1985)

  β̂_n^MSE = arg max_β n^{-1} Σ_{i=1}^n [ y_i I(x_i^⊤ β ≥ 0) + (1 − y_i) I(x_i^⊤ β < 0) ]
          = arg min_β n^{-1} Σ_{i=1}^n | y_i − I(x_i^⊤ β > 0) |

• weak distributional assumptions (med(ε_i | x_i) = 0): applicable under any F and unobserved heteroscedasticity
• identification up to a scale as in probit; estimation of med(y_i | x_i) = med I(x_i^⊤ β + ε_i > 0) = I(x_i^⊤ β > 0)
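Because the score function is a step function of β, it is typically maximized by search rather than gradient methods. A grid-search sketch for one coefficient under the scale normalization (slope on x fixed to 1); the heteroscedastic design with med(ε_i | x_i) = 0 and all values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
b_true = -0.5
# heteroscedastic but conditionally median-zero errors
eps = rng.normal(size=n) * (1 + 0.5 * np.abs(x))
y = (x + b_true + eps > 0).astype(float)

def score(b):
    # fraction of observations whose sign is correctly matched
    pred = (x + b >= 0)
    return np.mean(y * pred + (1 - y) * (~pred))

grid = np.linspace(-2.0, 2.0, 2001)
b_hat = grid[np.argmax([score(b) for b in grid])]
```

The flat, piecewise-constant objective is also why the maximum score estimator converges at the slow cube-root rate rather than √n.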
Single-index model

Single-index model:

  E(Y_i | X_i = x) = g(x^⊤ β)

• covers linear models, binary-choice models, ...
• restricts the form of heteroscedasticity

Identification conditions, assuming unknown g: R → R
• g is differentiable and not constant on the support of X_i^⊤ β
• X_i has continuously distributed components and its support is not contained in any proper linear subspace of R^p
• no intercept and β_1 = 1 (location and scale normalization):
  g*(x^⊤ β) = g(γ + δ · x^⊤ β) if g*(t) = g(γ + δt)
• coefficient values of discrete variables cannot divide the support of X_i^⊤ β into disjoint subsets (otherwise, g must not be periodic)
Semiparametric LS

Semiparametric least squares: Ichimura (1993)

Nonlinear least squares for E(y_i | x_i) = g(x_i^⊤ β) when g is known:

  min_{β∈B} Σ_{i=1}^n {y_i − g(x_i^⊤ β)}²

Semiparametric least squares when g is unknown:
• estimate the regression function g(x_i^⊤ β) = E(Y_i | X_i^⊤ β = x_i^⊤ β) = E(Y_i | x_i^⊤ β) by ĝ_n(x_i^⊤ β)
• minimize the sum of squared residuals to get β̂_n from

  min_{β∈B} Σ_{i=1}^n {y_i − ĝ_n(x_i^⊤ β)}²

  and then estimate ĝ_n(x_i^⊤ β̂_n)
Klein and Spady

Klein and Spady (1993): estimate F and maximize the likelihood based on the estimated distribution function F̂_n

• binary response: F(X_i^⊤ β) = P(Y_i = 1 | X_i^⊤ β) = E(Y_i | X_i^⊤ β)
• parametric log-likelihood function:

  ln L_n(β) = Σ_{i=1}^n [ y_i ln F(x_i^⊤ β) + (1 − y_i) ln{1 − F(x_i^⊤ β)} ]

• estimate F(x_i^⊤ β) = P(Y_i = 1 | x_i^⊤ β) = E(Y_i | x_i^⊤ β) by F̂_n(x_i^⊤ β) and maximize with respect to β

  Σ_{i=1}^n [ y_i ln F̂_n(x_i^⊤ β) + (1 − y_i) ln{1 − F̂_n(x_i^⊤ β)} ]

  and then estimate F̂_n(x_i^⊤ β̂_n)
Implementation: Binary-choice model

# assume the data are
# Y is n x 1 vector for the dependent variable
# X is n x p matrix for the explanatory variables
# KS denotes the Klein-Spady objective function (defined on the preceding slides)
z <- optim(double(p-1), KS, x=X, y=Y, method="BFGS")
print(c("Parameter estimates:", z$par))
Implementation: Klein and Spady
Average derivative

Average derivative estimation (Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)

• denoting m(x) = E(Y_i | X_i = x) and f the density of X_i

  m′(x) = ∂g(x^⊤ β)/∂x = g′(x^⊤ β) β ⇒ E{m′(X_i)} = γβ

  E{m′(X_i)} = ∫ m′(x) f(x) dx = −∫ m(x) f′(x) dx
             = −∫ m(x) [f′(x)/f(x)] f(x) dx = −E{Y_i f′(X_i)/f(X_i)}

• estimate f and f′ by a kernel density estimator

  (γβ)^ = −n^{-1} Σ_{i=1}^n y_i f̂′_n(x_i)/f̂_n(x_i)
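A sketch of this estimator for a scalar regressor, plugging a Gaussian kernel density estimate and its derivative into −n^{-1} Σ y_i f̂′(x_i)/f̂(x_i); the single-index model with g = tanh, the bandwidth rule, and the trimming threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
y = np.tanh(x) + 0.1 * rng.normal(size=n)     # E(Y|X=x) = g(x * beta), beta = 1

h = 1.06 * x.std() * n ** (-1 / 5)            # rule-of-thumb bandwidth

u = (x[:, None] - x[None, :]) / h             # (t - x_j)/h at each sample point t = x_i
k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
f_hat = k.mean(axis=1) / h                    # fhat(x_i)
fp_hat = (-u * k).mean(axis=1) / h**2         # fhat'(x_i)

keep = f_hat > 0.05                           # trim low-density points for stability
ade = -np.mean(y[keep] * fp_hat[keep] / f_hat[keep])
```

The trimming step guards against dividing by near-zero density estimates in the tails, a standard practical device for this class of estimators.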
Outlook

What are the benefits of semiparametric procedures?

• estimation under less restrictive assumptions
• the ability to compute probabilities without distributional assumptions: estimate

  F(X_i^⊤ β) = P(Y_i = 1 | X_i^⊤ β) = E(Y_i | X_i^⊤ β)

• the ability to compute marginal effects: estimate

  F′(X_i^⊤ β) β = ∂P(Y_i = 1 | X_i^⊤ β)/∂X_i = ∂E(Y_i | X_i^⊤ β)/∂X_i
Outlook

probit chd age                 |  z <- glm(chd ~ age,
predict ind, xb                |    family=binomial(link="probit"))
predict prob, pr               |  r <- locpoly(z$fitted.values, chd)
lpoly chd ind, ci              |  lines(r)
  addplot((line prob ind, sort))

[Figure: local smoother of evidence of coronary heart disease against the linear prediction, with 95% CI and the fitted Pr(chd); kernel = epanechnikov, degree = 0, bandwidth = .5, pwidth = .76.]
Application

Gerfin (1996) Parametric and semiparametric estimation of the binary-response models of labor market participation. Journal of Applied Econometrics 11, 321–339.

• labor force participation of Swiss and German women
• parametric and semiparametric estimators compared
Nonparametric density estimation

• Introduction
• Motivation
• Histogram
• Local histogram
• Kernel estimator
• Related methods
• Kernel and bandwidth
• Bias and variance
• Bandwidth choice
• Plug-in methods
• Asymptotics
• Confidence intervals
• Confidence bands
• Testing
• Multivariate density
Introduction

• Parametric regression:
  ◦ regression estimation
  ◦ curse of dimensionality

Econometrics Slide 61
Motivation

A probability density function can

• capture and demonstrate stylized facts
  (e.g., development of income distribution)
• describe an unknown distribution
  (e.g., of an estimation procedure in finite samples)
• help in parametric inference
  (e.g., asymptotic variance of LAD depends on f(0))
• serve in conditional moment estimation:
  E(Y|X = x) = ∫ y f(x, y)/f(x) dy in regression
• serve in conditional distribution function estimation:
  P(Y ≤ t|X = x) = E[I(Y ≤ t)|X = x]
• provide the derivative of the density function
  by differentiating a density estimator

Econometrics Slide 62
Motivation

Parametric approach

• assume a form parametrized by a number of parameters

  f(x|µ, σ) = (1/(√(2π) σ)) exp{−(1/2) [(x − µ)/σ]²}

• estimate µ and σ
• set f̂(x) = f(x|µ̂, σ̂)

Nonparametric approach

• do not assume a specific form or parameters
• impose smoothness of the density function
• estimate a general density function
• example: histogram

Econometrics Slide 63
Net income example

Net income in the U.K. from 1969 to 1983:
nonparametric and parametric density estimates

[Figure: kernel density and log-normal density estimates of U.K. net income, year by year (1969–1983)]

Econometrics Slide 64
Histogram

Estimate density f (observations x1, . . . , xn ∼ F)

[Figure: histogram of the observations]

Econometrics Slide 65
Histogram – properties

Mathematical explanation

• probability of “falling” into interval Ij = [x0 + (j − 1)h, x0 + jh]

  P(X ∈ Ij) = ∫_Ij f(x) dx ≈ f{x0 + (j − 1/2)h} h

• estimate of the density

  f̂{x0 + (j − 1/2)h} ≈ (1/h) P(X ∈ Ij) ≈ (1/(hn)) Σ_{i=1}^n I(xi ∈ Ij)

Properties

• step function
• bias ∼ h: f̂{x0 + (j − 1/2)h} used for all x ∈ Ij
• variance ∼ 1/(nh): all data in Ij used
• dependence on origin x0 and on bin width h

Econometrics Slide 66
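The bin-counting estimate above can be sketched in Python (an illustrative translation, not part of the course materials; `hist_density` is a hypothetical helper):

```python
import numpy as np

def hist_density(x, data, x0, h):
    """Histogram estimate at x: the share of observations in the bin
    [x0 + (j-1)h, x0 + jh) containing x, divided by the bin width h."""
    data = np.asarray(data, dtype=float)
    j = np.floor((x - x0) / h)                 # bin index of x
    lo, hi = x0 + j * h, x0 + (j + 1) * h      # bin edges
    return np.sum((data >= lo) & (data < hi)) / (len(data) * h)

# three of four points fall in [0, 1), so the estimate there is 3/(4*1)
print(hist_density(0.5, [0.1, 0.2, 0.3, 1.5], x0=0.0, h=1.0))  # 0.75
```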
Histogram: simulated example

100 histograms for 500 observations simulated from N(0, 1)

[Figure: four panels of 100 overlaid histogram estimates; bin widths shown include h = 1.0 and h = 2.0]

Econometrics Slide 67
Local histogram

• use an interval around any given x: (x − h/2, x + h/2)
• estimate

  f̂h(x) = (1/(nh)) Σ_{i=1}^n I(x − h/2 ≤ xi ≤ x + h/2)
         = (1/(nh)) Σ_{i=1}^n I(−1/2 ≤ (xi − x)/h ≤ 1/2)

Properties (strictly speaking, we should write hn and f̂_{hn,n}):

  f(x) = F′(x) = lim_{h→0} [F(x + h/2) − F(x − h/2)]/h
       = lim_{h→0} P(x − h/2 ≤ Xi ≤ x + h/2)/h

Econometrics Slide 68
Kernel estimator

(Local) histogram is not smooth ⇒ replace the indicator by a smooth kernel function K:

  f̂h(x) = (1/(nh)) Σ_{i=1}^n K((xi − x)/h)

Econometrics Slide 69
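The resulting kernel density estimator can be sketched in Python (an illustrative translation using the Epanechnikov kernel, not the slides' Stata/R code):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 (1 - t^2) for |t| <= 1, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def kde(x, data, h):
    """Kernel density estimate f_h(x) = (nh)^{-1} sum_i K((x_i - x)/h)."""
    data = np.asarray(data, dtype=float)
    return epanechnikov((data - x) / h).sum() / (len(data) * h)
```

With a single observation at x and h = 1, the estimate at x is simply K(0) = 0.75.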
Related methods

• Derivative estimation:

  f̂h^(s)(x) = (−1)^s/(nh^{s+1}) Σ_{i=1}^n K^(s){(xi − x)/h}

• Variable bandwidth: each point xi has its own bandwidth h_{in}

• Kth nearest neighbor:

  f̂k(x) = (1/(n dk(x))) Σ_{i=1}^n K((xi − x)/dk(x)),

  where dk(x) = distance of x and its kth nearest neighbor

• Series estimation: express a continuous density as

  f̂J(x) = Σ_{j=1}^J aj gj(x)

  for some orthogonal functions gj(x)
  (e.g., gj(x) = Hj(x) or φ(x)x^j)

Econometrics Slide 70
Kernel functions

Examples of various kernel functions

[Table: common kernel functions, e.g., uniform, triangle, Epanechnikov, and quartic]

Econometrics Slide 71
Kernel functions – graphs

Plots of several kernel functions with support [−1, 1]

[Figure: uniform, Epanechnikov, triangle, and quartic kernels]

Econometrics Slide 72
Kernel choice

Estimated density of stock returns with different kernels
(Pagan and Schwert, 1990; monthly US data 1834–1925)

[Figure: uniform, Epanechnikov, triangle, and quartic kernels, each with h = 0.015]

Econometrics Slide 73
Bandwidth choice

Estimated density of stock returns with different bandwidths
(Pagan and Schwert, 1990; monthly US data 1834–1925)

[Figure: four panels, Epanechnikov kernel with bandwidths h = 0.005, 0.01, and 0.025]

Econometrics Slide 74
Bandwidth choice – simulations

100 density estimates for 500 observations simulated from N(0, 1)

[Figure: four panels of 100 overlaid kernel density estimates; bandwidths shown include h = 0.80 and h = 3.20]

Econometrics Slide 75
Density estimation – assumptions

Kernel estimator of density f

  f̂h(x) = (1/(nh)) Σ_{i=1}^n K((xi − x)/h) = (1/(nh)) Σ_{i=1}^n wni(x)

• observations x1, . . . , xn
• kernel K is symmetric around zero and
  ◦ ∫ K(t) dt = 1
  ◦ ∫ t²K(t) dt = µ2 ≠ 0
  ◦ ∫ K²(t) dt = ‖K‖² < ∞
• h = hn → 0 as n → ∞
• nhn → ∞ as n → ∞

Econometrics Slide 76
Exact bias and variance

• Bias (substitution t = (u − x)/h)

  E[f̂h(x) − f(x)] = E[(1/h) K((xi − x)/h)] − f(x)
                   = ∫ (1/h) K((u − x)/h) f(u) du − f(x)
                   = ∫ K(t){f(x + th) − f(x)} dt

• Variance (var Z = E(Z²) − [E(Z)]²)

  var[f̂h(x)] = (1/n) var[(1/h) K((xi − x)/h)]
             = (1/(nh)) ∫ K²(t) f(x + th) dt − (1/n) [∫ K(t) f(x + th) dt]²

Econometrics Slide 77
Asymptotic bias and variance

Using the Taylor expansion

  f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·

• bias up to O(h²)

  E[f̂h(x) − f(x)] = ∫ K(t){f(x + th) − f(x)} dt
                   = ∫ K(t){th f′(x) + (1/2)(th)² f″(x)} dt
                   = h f′(x) ∫ K(t) t dt + (h² f″(x)/2) ∫ K(t) t² dt

  bias[f̂h(x)] = (h²/2) f″(x) ∫ t²K(t) dt = (h²/2) f″(x) µ2(K)

Econometrics Slide 78
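The bias formula above can be checked numerically for a standard normal density and a Gaussian kernel, where µ2(K) = 1 and f″(x) = (x² − 1)φ(x) — a sketch under these assumptions, not part of the slides:

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def exact_bias(h, x=0.0):
    """E f_h(x) - f(x) for a Gaussian kernel and N(0,1) data, via a
    Riemann sum of (1/h) K((u - x)/h) f(u) over a fine grid."""
    grid = np.linspace(-10, 10, 20001)
    du = grid[1] - grid[0]
    ef = np.sum(phi((grid - x) / h) / h * phi(grid)) * du
    return ef - phi(x)

def asymptotic_bias(h, x=0.0):
    """(h^2/2) f''(x) mu_2(K), with mu_2 = 1 for the Gaussian kernel."""
    return 0.5 * h**2 * (x**2 - 1) * phi(x)
```

For small h the two agree closely; at the mode (x = 0) the leading bias is negative, i.e., the peak is smoothed downward.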
Example – bias vs. density

Density = mixture of N(0, 1) (weight 0.3) and t1 − 3 (weight 0.7)

[Figure: density (dashed) and bias effect (solid)]

Econometrics Slide 79
Asymptotic bias and variance

Using the Taylor expansion

  f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·

• bias is up to O(h²)

  bias[f̂h(x)] = (h²/2) f″(x) ∫ t²K(t) dt = (h²/2) µ2(K) f″(x)

• variance is up to O(1/(nh))

  var[f̂h(x)] = (1/(nh)) f(x) ∫ K²(t) dt = (1/(nh)) ‖K‖² f(x)

Econometrics Slide 80

Example – bias and variance

Density = mixture of N(0, 1) (weight 0.3) and N(−3, 1) (weight 0.7)

[Figure: squared bias (solid), variance (dashed), and MSE (thick) as functions of the bandwidth]

Econometrics Slide 81
Bandwidth choice

Bias–variance trade-off (see simulation)

• MSE pointwise only

Econometrics Slide 82

Bandwidth choice

Optimal bandwidth h = minimal error

  hopt = arg min_h AMISE(f̂h)

• optimal bandwidth

  hopt = { ‖K‖² / (‖f″‖² µ2²(K) n) }^{1/5} ∼ n^{−1/5}

• optimal error AMISE ∼ n^{−4/5} (histogram: n^{−2/3})
• ‖f″‖² unknown

Kernel choice by minimizing AMISE

Econometrics Slide 83
Plug-in methods

Plug-in methods = assume normality

• Silverman’s rule of thumb: f = φ{(x − µ)/σ}/σ
  ⇒ hROT = 1.06 σ̂ n^{−1/5} for the Gaussian kernel

• Park and Marron plug-in estimator:

  ◦ estimate f″(x) by kernel density estimation

    f̂″_{hROT}(x) = (1/(n hROT³)) Σ_{i=1}^n K″((xi − x)/hROT),

  ◦ use the bias correction

    est. ‖f″‖² = ‖f̂″‖² − (1/(n hROT⁵)) ‖K″‖²

Econometrics Slide 84
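Silverman's rule of thumb above is a one-liner; a minimal Python sketch (illustrative only, `silverman_bandwidth` is a hypothetical helper):

```python
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth h_ROT = 1.06 * sigma_hat * n^(-1/5)
    for the Gaussian kernel, using the normal reference density."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
```

Holding the spread fixed, the bandwidth shrinks at the n^(−1/5) rate as the sample grows.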
Example: Car weights

Beware: Stata example for the car weight data

  kdensity weight, kernel(epanechnikov) generate(x epan)
  kdensity weight, kernel(parzen) generate(x2 parzen)
  line epan parzen x, sort ytitle(Density) legend(cols(1))

[Figure: Epanechnikov and Parzen density estimates of car weight (lbs.)]

Econometrics Slide 85
Example: Car weights

Beware: Stata example for the car weight data

  kdens weight, kernel(epanechnikov) generate(epan x) bw(sjpi)
  kdens weight, kernel(parzen) generate(parzen x) bw(sjpi)
  line epan parzen x, sort ytitle(Density) legend(cols(1))

[Figure: Epanechnikov and Parzen density estimates of car weight (lbs.) with the sjpi plug-in bandwidth]

Econometrics Slide 86
Example: Car weights

  kdensity weight, nograph generate(x fx)
  kdensity weight if foreign==0, nograph generate(fx0) at(x)
  kdensity weight if foreign==1, nograph generate(fx1) at(x)
  line fx0 fx1 x, sort ytitle(Density)

[Figure: density estimates of car weight (lbs.) for domestic and foreign cars]

Econometrics Slide 87
Asymptotics – assumptions

Provided that kernel K and density f satisfy additionally

Econometrics Slide 88
Asymptotics – consistency and normality

Kernel density estimator is (hn → 0 and nhn → ∞)

• pointwise consistent: MSE[f̂h] → 0 and f̂h →_P f

• uniformly consistent under some regularity conditions and
  nhn² → ∞: sup_{x∈R} |f̂h(x) − f(x)| →_P 0

• asymptotically normal (pointwise):

  √(nh) {f̂h(x) − E f̂h(x)} → N(0, f(x) ‖K‖²)

  (the same applies to √(nh)(f̂h − f) if √(nh) h² → 0,
  which does not hold for hopt, but for undersmoothing)

  √(nh) {f̂h(x) − f(x)} → N((c²/2) f″(x) µ2(K), f(x) ‖K‖²)

• optimal rate of convergence: hopt ∼ n^{−1/5} ⇒ √(nh) ∼ n^{2/5}

Econometrics Slide 89
Confidence intervals

Asymptotic normality ⇒ pointwise confidence intervals

• asymptotic confidence interval

  f̂h(x) ± (Φ^{−1}(1 − α/2)/√(nh)) {f̂h(x) ∫ K²(x) dx}^{1/2}

  under undersmoothing (h ∼ n^{−1/5−δ}, δ > 0)

• finite samples
  ◦ undersmooth (see above)
  ◦ estimate bias (difficult in small samples)

  f̂h(x) − (h²/2) f̂″h(x) µ2(K) ± (Φ^{−1}(1 − α/2)/√(nh)) {f̂h(x) ∫ K²(x) dx}^{1/2}

Econometrics Slide 90
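The pointwise interval above can be computed directly for the Gaussian kernel, where ∫K²(x)dx = 1/(2√π) — an illustrative sketch (ignoring the bias term, i.e., assuming undersmoothing), not the slides' code:

```python
import numpy as np

def kde_gauss(x, data, h):
    """Gaussian-kernel density estimate f_h(x)."""
    data = np.asarray(data, dtype=float)
    t = (data - x) / h
    return np.exp(-t**2 / 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def kde_ci(x, data, h, z=1.96):
    """f_h(x) +- z (nh)^(-1/2) {f_h(x) ||K||^2}^(1/2),
    with ||K||^2 = 1/(2 sqrt(pi)) for the Gaussian kernel."""
    f = kde_gauss(x, data, h)
    half = z * np.sqrt(f / (2 * np.sqrt(np.pi)) / (len(data) * h))
    return f - half, f + half
```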
Confidence bands

• confidence intervals – pointwise
• confidence bands – an interval or R wide

  ◦ available under restrictive assumptions
    (undersmoothing, f on interval (0, 1))

  f̂(x) ± [f̂(x) ‖K‖²/(nh)]^{1/2} { z/{2(1/5 + δ) log(n)}^{1/2} + dn }

  with coverage probability 1 − α = exp[−2 exp(−z)] and
  dn = {2(1/5 + δ) log(n)}^{1/2} [1 + log{‖K′‖²/(2π‖K‖²)}]

• confidence bands typically wider than confidence intervals

Econometrics Slide 91
Example: CPS 1985

Income distribution in the USA (CPS 1985)
Test statistic: T = 57.209 > 2.32 = Φ^{−1}(0.99)

[Figure: density estimate of income with confidence bands]

Econometrics Slide 92
Example: CPS 1985

Income distribution in the USA (CPS 1985)

  kdens wagelog, ci normal bw(oversmooth)

[Figure: density estimate of log wages with pointwise confidence intervals and a normal reference density]

Econometrics Slide 93
Testing

Testing H0: f = g vs. H1: f ≠ g for a known g(x, θ)

Econometrics Slide 94
Example: CPS 1985

Income distribution in the USA (CPS 1985)
Test statistic: T = 57.209 > 2.32 = Φ^{−1}(0.99)

[Figure: density estimate of income with confidence bands]

Econometrics Slide 95
Multivariate density estimation

• most general (H ∈ R^{d×d})

  f̂H(x) = (1/(n det(H))) Σ_{i=1}^n K{H^{−1}(x − xi)}

Econometrics Slide 96
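The bandwidth-matrix estimator above can be sketched with a standard multivariate Gaussian kernel (illustrative only; `mv_kde` is a hypothetical helper):

```python
import numpy as np

def mv_kde(x, data, H):
    """f_H(x) = (n det H)^{-1} sum_i K{H^{-1}(x - x_i)} with a
    standard multivariate Gaussian kernel K."""
    x = np.asarray(x, dtype=float)
    data = np.atleast_2d(np.asarray(data, dtype=float))
    d = x.shape[0]
    u = (x - data) @ np.linalg.inv(H).T          # rows: H^{-1}(x - x_i)
    K = np.exp(-0.5 * (u**2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
    return K.sum() / (len(data) * abs(np.linalg.det(H)))
```

With H = hI_d this reduces to the product-kernel estimator with a common bandwidth h in every coordinate.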
Multivariate density estimation – properties

For a symmetric kernel with second moments and norm
(∫ K(x) dx = 1, ∫ xK(x) dx = 0,
µ2(K) = ∫ xx⊤K(x) dx, ‖K‖² = ∫ K²(x) dx)

• bias (H(f) = Hessian matrix of f) [result for H = hId]

  bias[f̂H(x)] ≈ (1/2) µ2(K) tr(H⊤ H(f) H) = (h²/2) µ2(K) tr(H(f))

• variance [result for H = hId]

  var[f̂H(x)] ≈ (1/(n det(H))) ‖K‖² f(x) = (1/(n h^d)) ‖K‖² f(x)

Econometrics Slide 97
Joint density of income and age in east Germany, 1991

[Figure: age–income density estimate; age 25–59, income 880–3426]

Econometrics Slide 98
Nonparametric regression estimation

Econometrics Slide 99
Conditional moments

Estimation of conditional moments

• regression
  ◦ dependent variable Y (e.g., earnings)
  ◦ explanatory variables X (e.g., age, education)

  yi = m(xi) + εi
  E(Y|X) = m(X)
  ln Earnings = m(Age, Education) + ε

• conditional variance

  E[{Yi − E(Yi|Xi)}²|Xi] = E(Yi²|Xi) − [E(Yi|Xi)]²

• conditional probability

  P(Yi = 1|Xi) = 1·P(Yi = 1|Xi) + 0·P(Yi = 0|Xi) = E(Yi|Xi)

Econometrics Slide 100
Univariate regression

Estimation idea with explanatory variable

• discrete (bandwidth 1 ≫ h)

  Ên(Yi|Xi = x) = Σ_{i=1}^n I(xi = x) yi / Σ_{j=1}^n I(xj = x)
                = Σ_{i=1}^n I(x − h < xi < x + h) yi / Σ_{j=1}^n I(x − h < xj < x + h)
                = Σ_{i=1}^n I(|x − xi|/h < 1) yi / Σ_{j=1}^n I(|x − xj|/h < 1)

• continuous (kernel K, bandwidth h)

  Ên(yi|x) = Σ_{i=1}^n K{(x − xi)/h} yi / Σ_{j=1}^n K{(x − xj)/h}

Econometrics Slide 101
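The continuous-case estimator above is the Nadaraya-Watson estimator; a minimal Python sketch with a Gaussian kernel (illustrative, not the slides' code):

```python
import numpy as np

def nadaraya_watson(x, xs, ys, h):
    """m_hat(x) = sum_i K{(x - x_i)/h} y_i / sum_j K{(x - x_j)/h}."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)   # Gaussian weights (constants cancel)
    return (w * ys).sum() / w.sum()
```

At a point equidistant from two observations the weights are equal, so the estimate is their simple average.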
Nonparametric regression

Regression model y = E(y|x) + ε = m(x) + ε

  m(x) = E(y|x) = ∫ y f(y|x) dy = ∫ y f(y, x)/fx(x) dy

• joint density f(y, x), x ∈ R^p

  f̂(y, x) = (1/(n h′ h^p)) Σ_{i=1}^n Ky((yi − y)/h′) Kx((xi − x)/h)

• marginal density fx(x)

  f̂x(x) = (1/(n h^p)) Σ_{i=1}^n Kx((xi − x)/h)

  m̂(x) = ∫ y f̂(y, x)/f̂x(x) dy
        = [Σ_{i=1}^n ∫ y (1/h′) Ky((yi − y)/h′) dy · Kx((xi − x)/h)] / Σ_{i=1}^n Kx((xi − x)/h)

Econometrics Slide 102
Nonparametric regression

Smooth regression function m(x)

• general nonparametric estimator m̂h(x) = Σ_{i=1}^n wni(x) yi
  with weights wni(x) = wn(xi, x)

Econometrics Slide 103
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973

[Figure: food expenditures against net income with a nonparametric regression fit]

Econometrics Slide 104
Example NW – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, addplot((line prob age, sort))

[Figure: local polynomial smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 105
Various estimators

General nonparametric estimator m̂h(x) = Σ_{i=1}^n wni(x) yi

• Nadaraya-Watson:
  wni(x) = K{(xi − x)/h} / Σ_{j=1}^n K{(xj − x)/h}

• Variable bandwidth estimation:
  wni(x) = K{(xi − x)/hi} / Σ_{j=1}^n K{(xj − x)/hj}

• The kth nearest neighbor estimator:
  wni(x) = Iki/k = I(xi = kth nearest to x)/k or
  wni(x) = Iki wk

• Known density fx(x):
  wni(x) = K{(xi − x)/h}/[(nh^p) fx(x)]

• Fixed design (Gasser-Müller estimator):
  wni(x) = ∫_{si−1}^{si} K{(t − x)/h}/h dt ≈ (si − si−1) K{(x − ξi)/h}/h

Econometrics Slide 106
Local linear regression

• Nadaraya-Watson minimizes (b̂0(x) = m̂(x), verify)

  Σ_{i=1}^n {yi − b0(x)}² K{(xi − x)/h}

• Local linear regression – minimize

  Σ_{i=1}^n {yi − b0(x) − b1(x)(xi − x)}² K{(xi − x)/h}

  ◦ at given x, b0(x), b1(x) regression constants
  ◦ weighted least squares regression (around x):

    b̂0,h(x) = ȳh − b̂1,h(x)(x̄h − x)
    b̂1,h(x) = Σ_{i=1}^n (yi − ȳh)(xi − x̄h) K{(xi − x)/h} / Σ_{j=1}^n (xj − x̄h)² K{(xj − x)/h}

    for v̄h = Σ_{i=1}^n vi K{(xi − x)/h} / Σ_{j=1}^n K{(xj − x)/h}

Econometrics Slide 107
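The weighted least squares problem above can be solved directly via its normal equations; a Python sketch with a Gaussian kernel (illustrative; `local_linear` is a hypothetical helper):

```python
import numpy as np

def local_linear(x, xs, ys, h):
    """Minimize sum_i {y_i - b0 - b1 (x_i - x)}^2 K((x_i - x)/h)
    and return b0_hat = m_hat(x).  Gaussian kernel."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    w = np.exp(-0.5 * ((xs - x) / h) ** 2)            # kernel weights
    X = np.column_stack([np.ones_like(xs), xs - x])   # regressors [1, x_i - x]
    WX = X * w[:, None]
    b = np.linalg.solve(X.T @ WX, WX.T @ ys)          # (X'WX)^{-1} X'Wy
    return b[0]
```

Since the local model nests any linear function, the estimator reproduces linear m exactly, which illustrates the "no bias for linear functions" property noted later.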
Local polynomial regression

• Motivation – Taylor expansion

  m(xi) ≈ m(x) + (∂m/∂x)(x)(xi − x) + . . . + (1/p!)(∂^p m/∂x^p)(x)(xi − x)^p

• Local polynomial regression – minimize

  Σ_{i=1}^n {yi − b0(x) − b1(x)(xi − x) − . . . − bp(x)(xi − x)^p}² K((xi − x)/h)

• Weighted average of yi (m̂(x) = Σ_{i=1}^n wni(x) yi)
  with K = diag(K{(xi − x)/h})

• Estimates of derivatives m̂h^(j)(x) = b̂j(x) · j!

Econometrics Slide 108
Simulated example

Nadaraya-Watson and local linear regression
(m(x) = x + 5 sin(2x), n = 400, h = 0.8)

[Figure: local constant (red dashed) and local linear (blue solid) regression fits]

Econometrics Slide 109
Example NW – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, addplot((line prob age, sort))

[Figure: Nadaraya-Watson smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 110
Example LLR – coronary heart disease data

  probit chd age
  predict prob, pr
  lpoly chd age, degree(1) addplot((line prob age, sort))

[Figure: local linear smooth of chd on age (in years) with the fitted probit probabilities]

Econometrics Slide 111
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973

[Figure: food expenditures against net income (0–3) with a nonparametric regression fit]

Econometrics Slide 112
Nonparametric regression – assumptions

Finite-sample properties

• method comparison
• bandwidth choice

Assumptions for yi = m(xi) + εi

• (xi, yi) i.i.d. sample from (x, y) ∼ f
• εi i.i.d. with zero mean and independent of xi
• m and f twice continuously differentiable
• kernel K symmetric with
  ◦ ∫ K(x) dx = 1, ∫ xK(x) dx = 0
  ◦ ∫ x²K(x) dx = µ2(K) < ∞
  ◦ ∫ K²(x) dx = ‖K‖² < ∞

Econometrics Slide 113
Nadaraya-Watson estimator

• Bias (fx(x) > 0)

  bias[m̂h(x)] = (h²/2) {m″(x) + 2 m′(x) fx′(x)/fx(x)} µ2(K) + O(1/(nh)) + o(h²)

• Variance (σ²(x) = var(εi|xi))

  var[m̂h(x)] = (1/(nh)) (σ²(x)/fx(x)) ‖K‖² + o(1/(nh))

• Comparison with density estimators
  ◦ bias proportional to curvature (m″(x))
  ◦ extra bias term (m′(x) fx′(x)/fx(x))

Econometrics Slide 114
Local linear regression

• Bias

      bias[m̂h(x)] = (h²/2) µ2(K) m′′(x) + o(h²)

• Variance (σ²(x) = var(εi|xi))

      var[m̂h(x)] = (1/(nh)) (σ²(x)/fx(x)) ‖K‖² + o(1/(nh))

• Comparison with Nadaraya-Watson estimator
  ◦ bias independent of fx (design)
  ◦ no bias for linear functions m
  ◦ very similar to bias and variance of density estimator

• General combination with a parametric estimator m(x, β): minimize

      Σ_{i=1}^{n} {yi − m(x, β)}² K{(xi − x)/h}

  ◦ reduced bias if m(x, β) is close to m(x)
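The local linear estimator fits a weighted least-squares line around each evaluation point; its intercept is the estimate of m(x0). A Python sketch (Gaussian kernel and simulated data are illustrative) — with a truly linear m there is no smoothing bias, even at the boundary:

```python
import numpy as np

def loclin(x0, x, y, h):
    """Local linear estimate of m(x0): weighted LS of y on (1, x - x0);
    the fitted intercept estimates m(x0)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                               # weight each observation
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 400)     # m linear: intercept 1, slope 2

m0 = loclin(0.0, x, y, h=0.2)                   # boundary point, m(0) = 1
m_mid = loclin(0.5, x, y, h=0.2)                # interior point, m(0.5) = 2
```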
Simulated example

Nadaraya-Watson and local linear regression
(m(x) = x + 5 sin(2x), n = 1000, h = 0.8)

[Figure: local constant (red dashed) and local linear (blue solid) regression fits, Y against X]
Bandwidth choice

Optimal bandwidth selection

• minimize mean integrated squared error

      MISE(m̂) = E ∫ [m̂(x) − m(x)]² dx ≈ c1/(nh) + c2 h⁴

  ⇒ hopt = (c1/4c2)^{1/5} n^{−1/5}

• plug-in estimator – not used, complicated
• alternative: minimize mean average squared error

      MASE(m̂) = E[ (1/n) Σ_{i=1}^{n} {m̂(xi) − m(xi)}² ]

  ◦ advantage: if yi − m(xi) and m̂(xi) are uncorrelated,
      (1/n) Σ_{i=1}^{n} {yi − m̂(xi)}² = (1/n) Σ_{i=1}^{n} {m(xi) + εi − m̂(xi)}² ≈ σ² + MASE(m̂)
Cross validation

Mean average squared error

• yi − m(xi) and m̂(xi) are correlated
• solution: omit the ith observation from the sample to estimate m(xi) by m̂h,−i(xi), which is uncorrelated with yi − m(xi)

Leave-one-out cross validation

      hCV = arg min_h Σ_{i=1}^{n} {yi − m̂h,−i(xi)}²

• m̂h,−i(xi) = leave-one-out estimate based on observations 1, …, i − 1, i + 1, …, n
• hCV → hopt very slowly (∼ n^{−1/10})
• often used
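With a linear smoother, the leave-one-out fits are obtained by simply zeroing the diagonal of the kernel weight matrix, so the CV criterion is cheap to evaluate on a bandwidth grid. An illustrative Python sketch using a Nadaraya-Watson smoother (kernel, grid, and data are arbitrary choices):

```python
import numpy as np

def cv_score(x, y, h):
    """Leave-one-out CV criterion: sum_i {y_i - m_hat_{h,-i}(x_i)}^2."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)            # omit observation i when predicting y_i
    m_loo = (K @ y) / K.sum(axis=1)     # leave-one-out NW estimates
    return np.sum((y - m_loo) ** 2)

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 300)
y = np.sin(2 * x) + rng.normal(0, 0.3, 300)

grid = np.linspace(0.05, 1.0, 20)
h_cv = grid[np.argmin([cv_score(x, y, h) for h in grid])]
```

For this wiggly regression function the selected bandwidth is small; oversmoothing (large h) inflates the criterion through bias.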
Example – Engel curve

Food expenditures vs. net income in the U.K., 1973:
bandwidth selection by cross validation

[Figure: nonparametric regression fit of food share on net income (quartic kernel); interactive bandwidth/binwidth controls omitted]
Asymptotics – consistency

Provided that kernel K and density f additionally satisfy …
Asymptotics – asymptotic distribution

The Nadaraya-Watson estimator is (hn → 0 and nhn → ∞) …

• local linear estimator – analogous
Confidence intervals

Asymptotic normality ⇒ pointwise confidence intervals

• asymptotic confidence interval

      m̂(x) ± Φ⁻¹(1 − α/2) (nh)^{−1/2} [ (σ̂²(x)/f̂(x)) ∫ K²(x)dx ]^{1/2}
Testing

Testing H0 : m = g vs. H1 : m ≠ g for a known g(x, θ) …

• many other possibilities
Example: CPS 1985

Income-experience profile (USA, CPS 1985)
(education fixed at 12 years)

[Figure: LS (solid) and NW (dashed) fits of log income against experience at educ = 12]
Example: coronary heart disease

probit chd age
predict prob, pr
lpoly chd age, ci degree(1) addplot((line prob age, sort))

[Figure: local linear smooth of chd on age (in years) with confidence band and the probit prediction overlaid]
Multivariate regression

Regression function E(y|x) = E(y|x1, …, xd):

      hopt ∼ n^{−1/(4+d)} ⇒ AMSE ∼ n^{−4/(4+d)}, (nh^d)^{−1/2} ∼ n^{−2/(4+d)}
Example: CPS 1985

Income as a function of education and experience
and Mincer's equation (USA, CPS 1985)
(education = 2 to 18 years, experience = 0 to 55 years)

[Figure: surface plots of Wage = m(Education, Experience) and of the parametric fit Wage = a + b*Educ + c*Exp + d*Exp^2]
Semiparametrics

• Semiparametric LS
• Klein and Spady
• Example
• Semiparametric regression
• Average derivative
• PLM estimation
• Heteroscedasticity
• Application
Introduction

Semiparametric least squares: Ichimura (1993)

Nonlinear least squares for E(Yi|Xi) = g(Xi⊤β): g is known

      min_{β∈B} Σ_{i=1}^{n} {Yi − g(Xi⊤β)}²

Semiparametric least squares: g is unknown

• estimate g(Xi⊤β) = E(Yi|Xi⊤β) = E[g(Xi⊤β)|Xi⊤β] using a leave-one-out estimator ĝn(Xi⊤β) = Ê−i,n(Yi|Xi⊤β) =

      Σ_{j≠i} Yj K{(Xj − Xi)⊤β/hn} / Σ_{j≠i} K{(Xj − Xi)⊤β/hn}
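Ichimura's estimator can be sketched by profiling the least-squares criterion over the index coefficients, with g replaced at each candidate β by the leave-one-out Nadaraya-Watson fit of Y on the index. A Python illustration (scale is fixed by normalizing the first coefficient to 1; the tanh link, bandwidth, and grid are arbitrary assumptions, and the grid search stands in for a proper optimizer):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(0, 1, (n, 2))
beta_true = np.array([1.0, 2.0])                 # first coefficient normalized to 1
Y = np.tanh(X @ beta_true) + rng.normal(0, 0.2, n)  # g = tanh, unknown to the estimator

def sls_objective(b, X, Y, h=0.3):
    """SLS criterion: SSR with g replaced by leave-one-out NW on the index X'beta."""
    v = X @ np.array([1.0, b])                   # candidate index
    K = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)                     # leave-one-out
    g_hat = (K @ Y) / K.sum(axis=1)
    return np.sum((Y - g_hat) ** 2)

grid = np.linspace(0.5, 3.5, 61)
b_hat = grid[np.argmin([sls_objective(b, X, Y) for b in grid])]
```

At a wrong index coefficient the conditional mean of Y given the index cannot absorb all systematic variation, so the criterion rises.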
• Σ = E[E(ε²i|Xi) · {g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• V = E[{g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• semiparametrically efficient
• under heteroscedasticity, weighting can be employed
Klein and Spady

Klein and Spady (1993): estimate F and maximize the likelihood based on the estimated distribution function F̂n

      Σ_{i=1}^{n} ξi [Yi ln F̂−i,n(Xi⊤β) + (1 − Yi) ln{1 − F̂−i,n(Xi⊤β)}]
Klein and Spady (1993)'s estimator under assumptions

• data are iid, β ∈ B compact
• P(β) = P(Yi = 1|Xi⊤β) = P(Yi = 1|Xi) ∈ (a, b), where 0 < a and b < 1
• P(Yi = 1|Xi⊤β = t) continuously differentiable in t
• n^{−1/6} < hn < n^{−1/8} and a higher-order kernel is used

is

• consistent and asymptotically normal

      √n(β̂n − β) →d N( 0, E[ (∂P(β)/∂β)(∂P(β)/∂β⊤) / (P(β)[1 − P(β)]) ]^{−1} )

• semiparametrically efficient
• (parametrically) efficient if E(Xi|Xi⊤β) = c0 + c1(Xi⊤β)
Example

Married women labor force participation (Mroz, 1987):

• binary decision = choice to work (1) or to stay at home (0)
• explanatory variables
  ◦ non-wife household income
  ◦ age
  ◦ education
  ◦ labor market experience and its square
  ◦ number of children (below and above 6)
Married women labor force participation (Mroz, 1987) – estimates:

Stata: regress, probit, sml

               Linear        Probit          KS
--------------------------------------------------
inlf     | Coef.    SE  | Coef.    SE  | Coef.   SE
---------+-------------+-------------+------------
nwifeinc | -.011   .004 | -.014   .006 | -.015  .001
educ     |  .141   .027 |  .151   .029 |  .129  .011
exper    |  .382   .019 |  .142   .022 |  .243   ---
expersq  | -.008   .000 | -.002   .001 | -.005  .000
age      | -.061   .008 | -.060   .010 | -.058  .002
kidslt6  | -1      .126 | -1      .137 | -1     .025
kidsge6  |  .050   .049 |  .041   .050 |  .052  .009
_cons    | 2.237   .588 |  .311   .585 |  ---    ---
---------+-------------+-------------+------------
Example: labor force participation

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci addplot((line prob pindex, sort))

[Figure: local polynomial smooth of the labor force indicator (Y = 1 if in labor force, 1975) against the linear prediction, with the probit probability overlaid]
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci degree(1) addplot((line prob pindex, sort))

[Figure: local linear (degree 1) smooth of the labor force indicator against the linear prediction, with the probit probability overlaid]
Average derivative

(Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)

      m′(x) = ∂g(x⊤β)/∂x = g′(x⊤β)β ⇒ E m′(Xi) = γβ

• integration by parts (f denotes the density of Xi)

      E m′(Xi) = ∫ m′(x)f(x)dx = − ∫ m(x)f′(x)dx = −E[Yi f′(Xi)/f(Xi)]

• estimate f and f′ by a kernel density estimator and set, for an ↓ 0, the estimator of γβ to

      −(1/n) Σ_{i=1}^{n} Yi f̂′n(Xi)/f̂n(Xi) · I[f̂n(Xi) > an]
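For a scalar regressor the construction can be illustrated directly: estimate f and f′ by a kernel density estimator and average −Yi f̂′(Xi)/f̂(Xi) over the non-trimmed points. A Python sketch (bandwidth and trimming bound are ad hoc; with m(x) = 2x and standard normal X the target E m′(Xi) is 2, and the kernel smoothing shrinks the estimate slightly toward zero):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(0, 1, n)
Y = 2.0 * X + rng.normal(0, 0.5, n)           # m(x) = 2x, so E m'(X) = 2

h, a_n = 0.3, 0.01                            # bandwidth and trimming bound
d = (X[:, None] - X[None, :]) / h
K = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
f_hat = K.mean(axis=1) / h                    # density estimate at X_i
f_der = (-d * K).mean(axis=1) / h**2          # its derivative at X_i

trim = f_hat > a_n                            # I[f_hat(X_i) > a_n]
delta_hat = -np.sum(trim * Y * f_der / f_hat) / n
```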
Semiparametric regression

Partially linear models (Robinson, 1988, Econometrica 56, 931–954)

• linear regression with one variable Zi entering nonparametrically

      Yi = Xi⊤β + g(Zi) + εi,   E(εi|Xi, Zi) = 0,   g : R → R

Extensions (cf. Tobit and sample selection models)

• partially linear single-index model (Xia, Tong, and Li, 1999)

      Yi = Xi⊤β + g(Zi⊤γ) + εi

• generalized partially linear models (Carroll et al., 1997)

      E(Yi|Xi) = F(Xi⊤β + g(Zi))
      ln[ P(Yi = 1|Xi) / P(Yi = 0|Xi) ] = Xi⊤β + g(Zi)
Other applications – partially linear models

• E(Yi|Zi) = E(Xi|Zi)⊤β + g(Zi) + E(εi|Zi) and

      Yi − E(Yi|Zi) = {Xi − E(Xi|Zi)}⊤β + {εi − E(εi|Zi)}

Estimation (involves p + 1 nonparametric regressions)

• estimate µyi = µy(Zi) = E(Yi|Zi) and µxi = µx(Zi) = E(Xi|Zi)
• estimate β by the least squares estimator β̂n

      β̂n = [ Σ_{i=1}^{n} (Xi − µ̂xi)(Xi − µ̂xi)⊤ ]^{−1} [ Σ_{i=1}^{n} (Xi − µ̂xi)(Yi − µ̂yi) ]
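Robinson's double-residual idea — regress the nonparametric residual of Y on the nonparametric residual of X — in a scalar-X Python sketch (Nadaraya-Watson first stages; the design, β = 1.5, g = sin, and the bandwidth are illustrative):

```python
import numpy as np

def nw_fit(z, target, h):
    """Nadaraya-Watson estimates of E(target | Z = z_i) at the sample points."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return (K @ target) / K.sum(axis=1)

rng = np.random.default_rng(5)
n = 500
Z = rng.uniform(-2, 2, n)
X = Z ** 2 + rng.normal(0, 1, n)                   # X correlated with Z
Y = 1.5 * X + np.sin(Z) + rng.normal(0, 0.3, n)    # beta = 1.5, g = sin

mu_y = nw_fit(Z, Y, h=0.2)                         # estimate E(Y|Z)
mu_x = nw_fit(Z, X, h=0.2)                         # estimate E(X|Z)
ex, ey = X - mu_x, Y - mu_y                        # double residuals
beta_hat = np.sum(ex * ey) / np.sum(ex * ex)       # LS on the residuals
```

Because the same linear smoother is applied to Y and X, the smoothing biases largely cancel in the residual-on-residual regression.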
Heteroscedasticity

Generalized LS (GLS) solves Σ_{i=1}^{n} σ^{−2}(Xi) Xi (Yi − Xi⊤β) = 0:

      β̂nGLS = ( Σ_{i=1}^{n} XiXi⊤/σ²(Xi) )^{−1} ( Σ_{i=1}^{n} XiYi/σ²(Xi) )

Heteroscedasticity of

• known form: σ²(Xi) = exp(Xi⊤γ) and estimate γ̂n
• unknown form: nonparametric estimate σ̂²n(Xi) of σ²(Xi)
  ◦ Robinson (1987), Econometrica 55(4), 875–891: compute ei = yi − xi⊤β̂OLS and nonparametrically estimate σ²(x) = E(ε²i|Xi) using residuals e²i
  ◦ alternative – fully nonparametric estimation: compute ẽi = Yi − Ê(Yi|Xi) and nonparametrically estimate σ²(Xi) = E(ε²i|Xi) using residuals ẽ²i
Application

Lehrer and Kordas (2013) Matching using semiparametric propensity scores. Empirical Economics 44, 13–45.
Discrete choice models

• Introduction
• Ordered models
• Example
• Specification tests
• Semiparametrics
• Application
• Multinomial models
• Latent model
• Multinomial logit
• Latent model
• Conditional logit
• Multinomial probit
• Hierarchy
• Semiparametrics
Introduction

Data with a discrete response with more than two values

• multiple discrete responses yi = 0, 1, …, J
  ◦ ordered response (values not completely arbitrary)
    (e.g., credit rating, preference, health plan choice, …)

Ordered models

Discrete response yi = 0, …, J, where responses are ordered
(ratings, preferences for food, no/part-time/full-time job)
Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj |xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
      ⋮
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• log-likelihood contributions (for MLE, FOC, variance, …)

      l(wi, β) = I(yi = 0) · ln F(α1 − xi⊤β)
               + I(yi = 1) · ln[F(α2 − xi⊤β) − F(α1 − xi⊤β)]
               + …
               + I(yi = J) · ln[1 − F(αJ − xi⊤β)]
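These probabilities and log-likelihood contributions translate directly into code. A Python sketch with a probit link (the erf-based normal cdf and the simulated design are illustrative; padding the cutpoints with ±∞ implements the first and last categories). In a large simulated sample, the log-likelihood at the true slope should exceed its value at a distorted slope:

```python
import numpy as np
from math import erf

def Phi(t):
    """Standard normal cdf; math.erf also handles +/- infinity."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(t, dtype=float) / np.sqrt(2.0)))

def ordered_loglik(y, x, beta, alpha):
    """sum_i ln[ F(alpha_{y_i+1} - x_i beta) - F(alpha_{y_i} - x_i beta) ]."""
    cuts = np.concatenate([[-np.inf], alpha, [np.inf]])  # alpha_0 = -inf, alpha_{J+1} = inf
    xb = x * beta
    p = Phi(cuts[y + 1] - xb) - Phi(cuts[y] - xb)        # P(y_i = j | x_i)
    return np.sum(np.log(p))

rng = np.random.default_rng(6)
n = 2000
x = rng.normal(0, 1, n)
alpha_true = np.array([-0.5, 1.0])                       # two cutpoints, three categories
ystar = 0.8 * x + rng.normal(0, 1, n)                    # latent index
y = (ystar > alpha_true[0]).astype(int) + (ystar > alpha_true[1]).astype(int)

ll_true = ordered_loglik(y, x, 0.8, alpha_true)
ll_off = ordered_loglik(y, x, 0.2, alpha_true)           # distorted slope
```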
Example

Probit fit of yi = I(0.5 + xi + εi > −0.5) + I(0.5 + xi + εi > 1)
for εi ∼ N(0, 1) and n = 1000

[Figure: y and the fitted probabilities Pr(y==0), Pr(y==1), Pr(y==2) against the linear prediction (cutpoints excluded)]
Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj |xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
      ⋮
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• marginal effects (note the signs of the effects for the middle categories!)

      ∂P(yi = 0|xi)/∂xik = −βk f(α1 − xi⊤β)
      ∂P(yi = 1|xi)/∂xik = βk [f(α1 − xi⊤β) − f(α2 − xi⊤β)]
      ⋮
Example

Pension-plan decision of adults
(mostly bonds = 0, mixed = 1, mostly stocks = 2)

• profit-sharing plan
• age
• education
• gender
• race
• marital status
---------------------------------------------------
          |           Delta-method
          |    dy/dx   Std. Err.      z     P>|z|
----------+----------------------------------------
0 prftshr | -.1766166   .070157    -2.52    0.012
  age     |  .0154528   .0070159    2.20    0.028
----------+----------------------------------------
1 prftshr |  .0097728   .0133563    0.73    0.464
  age     | -.0008551   .0011694   -0.73    0.465
----------+----------------------------------------
2 prftshr |  .1668438   .0662334    2.52    0.012
  age     | -.0145977   .0066674   -2.19    0.029
---------------------------------------------------
Example: asset allocation

oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p1, outcome(#1)
gen y1 = (y == 0)
lpoly y1 ind, ci degree(1) addplot((line p1 ind, sort))

[Figure: local polynomial smooth of I(y = 0) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=0); kernel = epanechnikov, degree = 1, bandwidth = .37]
oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p2, outcome(#2)
gen y2 = (y == 1)
lpoly y2 ind, ci degree(1) addplot((line p2 ind, sort))

[Figure: local polynomial smooth of I(y = 1) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=1); kernel = epanechnikov, degree = 1, bandwidth = .19]
oprobit pctstck choice prftshr female married age educ black
predict ind, xb
predict p3, outcome(#3)
gen y3 = (y == 2)
lpoly y3 ind, ci degree(1) addplot((line p3 ind, sort))

[Figure: local polynomial smooth of I(y = 2) against the linear prediction (cutpoints excluded), with 95% CI and the ordered probit Pr(y=2); kernel = epanechnikov, degree = 1, bandwidth = .19]
Data plot with cut-off points

[Figure: observed y (0, 1, 2) against the linear prediction (cutpoints excluded)]
Specification tests

Possible problems with the model specification

• parallel regression assumption
  ◦ ordered choice model with constant slopes (probit): P(yi ≤ j|xi) = F(αj+1 − xi⊤β)
  ◦ ordered choice model with varying slopes (probit): P(yi ≤ j|xi) = F(αj+1 − xi⊤βj)

If the single-index structure P(yi = j|xi) = P(yi = j|xi⊤β) applies,

      E(yi|xi) = Σ_{j=0}^{J} j P(yi = j|xi⊤β) = g(xi⊤β),
      P(yi = 1|xi⊤β = t) = F(t) = 1 − F(−t)
Application

Bresnahan and Reiss (1991) Entry and competition in concentrated markets. The Journal of Political Economy 99, 977–1009.

• study the number of firms in a market given its size and competition
• analyze 202 geographically isolated markets (dentists, plumbers, electricians etc. in county seat cities)
Multinomial models

Data with a discrete response with more than two values

• multiple discrete responses yi = 0, 1, …, J
  ◦ unordered (nominal) response
    (e.g., mode of transportation, choice of industry for investment, brand choice, …)
  ◦ ordered response (values not completely arbitrary)
    (e.g., credit rating, preference, health plan choice, …)
• explanatory variables xi
• motivated by latent (utility-maximization) models
Latent model

Deriving the multinomial logit model from a latent utility maximization (McFadden, 1974), where each choice j ≥ 1 has its own coefficient βj

      y*ij = xij⊤βj + εij,   j = 0, …, J

• utility maximization yi = arg max_{j=0,…,J} y*ij implies

      P(yi = j|xi0, …, xiJ) = P(y*ij > y*ik for all k ≠ j|xi0, …, xiJ)

• assuming the type I extreme value (Gumbel) distribution εij ∼ F(t) = exp(−exp(−t)), it follows

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices
  (exp(… + xi⊤β)/Σ exp(… + xi⊤β) = exp(…)/Σ exp(…))
Multinomial logit model

Multinomial choice model

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

Multinomial logit model:

      P(yi = j|xi) = exp(xi⊤βj) / (1 + Σ_{l=1}^{J} exp(xi⊤βl)),   j = 1, …, J
      P(yi = 0|xi) = 1 / (1 + Σ_{l=1}^{J} exp(xi⊤βl))

• contributions to the log-likelihood function ln Ln(β) = Σ_{i=1}^{n} li(β):

      li(β) = Σ_{j=0}^{J} I(yi = j) log P(yi = j|xi)

• note: reduction to the binary logit (P(j|j, k) = P(j)/P(j, k))

      P(yi = j|yi ∈ {j, k}, xi) = exp(xi⊤βj) / [exp(xi⊤βj) + exp(xi⊤βk)]
                                = {1 + exp[−xi⊤(βj − βk)]}^{−1} = Λ{xi⊤(βj − βk)}
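Computationally, these probabilities are a softmax with the base-category index normalized to zero, and the log-likelihood just picks out the log-probability of each observed category. A Python sketch (coefficients and design simulated for illustration; subtracting the row maximum before exponentiating is a standard numerical safeguard):

```python
import numpy as np

def mnl_probs(X, B):
    """P(y = j | x) for a multinomial logit; B has one column of coefficients
    per non-base category j = 1..J, category 0's index is normalized to 0."""
    eta = np.column_stack([np.zeros(len(X)), X @ B])   # index 0 for the base outcome
    e = np.exp(eta - eta.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def mnl_loglik(y, X, B):
    p = mnl_probs(X, B)
    return np.sum(np.log(p[np.arange(len(y)), y]))     # sum_i log P(y_i | x_i)

rng = np.random.default_rng(7)
n = 500
X = rng.normal(0, 1, (n, 2))
B_true = np.array([[1.0, -0.5],
                   [0.5,  1.0]])                       # columns: beta_1, beta_2
p = mnl_probs(X, B_true)
y = np.array([rng.choice(3, p=row) for row in p])      # draw choices from the model

ll_true = mnl_loglik(y, X, B_true)
ll_zero = mnl_loglik(y, X, np.zeros_like(B_true))      # all probabilities 1/3
```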
Multinomial logit (here j ∈ {1, …, J}):

      pj(xi) = P(yi = j|xi) = exp(xi⊤βj) / (1 + Σ_{l=1}^{J} exp(xi⊤βl))

• partial effects

      ∂P(yi = j|xi)/∂xik = P(yi = j|xi) [ βjk − Σ_{l=1}^{J} βlk exp(xi⊤βl) / (1 + Σ_{l=1}^{J} exp(xi⊤βl)) ]

• (simpler) interpretation of partial effects via

      P(yi = j|xi)/P(yi = 0|xi) = exp(xi⊤βj) ⇒ ∂[pj(xi)/p0(xi)]/∂xik = exp(xi⊤βj)βjk
Example

Employment and schooling decisions of young men
(school = 1, home = 2, work = 3)

• work experience
• race
Example: school and employment decision

Multinomial logistic regression
------------------------------------------------------
     status |    Coef.   Std. Err.      z     P>|z|
------------+-----------------------------------------
1           | (base outcome)
------------+-----------------------------------------
2   educ    | -.6736313   .0698999    -9.64   0.000
    exper   | -.1062149   .173282     -0.61   0.540
    expersq | -.0125152   .0252291    -0.50   0.620
    black   |  .8130166   .3027231     2.69   0.007
    _cons   |  10.27787   1.133336     9.07   0.000
------------+-----------------------------------------
3   educ    | -.3146573   .0651096    -4.83   0.000
    exper   |  .8487367   .1569856     5.41   0.000
    expersq | -.0773003   .0229217    -3.37   0.001
    black   |  .3113612   .2815339     1.11   0.269
    _cons   |  5.543798   1.086409     5.10   0.000
------------------------------------------------------
Latent model

Deriving the multinomial logit model from a latent utility maximization (McFadden, 1974), where each choice j ≥ 1 has its own coefficient βj

      y*ij = xij⊤βj + εij,   j = 0, …, J

• utility maximization yi = arg max_{j=0,…,J} y*ij implies

      P(yi = j|xi0, …, xiJ) = P(y*ij > y*ik for all k ≠ j|xi0, …, xiJ)

• assuming the type I extreme value (Gumbel) distribution εij ∼ F(t) = exp(−exp(−t)), it follows

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices
  (exp(… + xi⊤β)/Σ exp(… + xi⊤β) = exp(…)/Σ exp(…))
Conditional logit model

Multinomial choice model

      P(yi = j|xi0, …, xiJ) = exp(xij⊤βj) / Σ_{l=0}^{J} exp(xil⊤βl)

Consider a multinomial response model using characteristics xij of individual choices (varying with choices j) and one common β.

Multinomial logit model (e.g., choice of occupation):

• individual characteristics used
• characteristics of alternative choices unimportant and omitted

Conditional logit (e.g., choice of transport):

      pj(xi)/pl(xi) = exp(xij⊤β)/exp(xil⊤β) = exp[(xij − xil)⊤β]

Multiple-choice problem: J choices with utilities

      y*ij = xij⊤β + εij,   j = 0, …, J
Example

Employment and schooling decisions of young men
(school = 1, home = 2, work = 3)

• work experience
• race
Example: school and employment decision

Multinomial probit regression
------------------------------------------------------
     status |    Coef.   Std. Err.      z     P>|z|
------------+-----------------------------------------
1           | (base outcome)
------------+-----------------------------------------
2   educ    | -.4410793   .0413589   -10.66   0.000
    exper   | -.1137917   .1114018    -1.02   0.307
    expersq | -.0043746   .0155293    -0.28   0.778
    black   |  .6047029   .1908654     3.17   0.002
    _cons   |  6.706148   .6554217    10.23   0.000
------------+-----------------------------------------
3   educ    | -.162221    .0385226    -4.21   0.000
    exper   |  .6721982   .1027514     6.54   0.000
    expersq | -.0592846   .0142553    -4.16   0.000
    black   |  .2244359   .1795638     1.25   0.211
    _cons   |  2.985019   .6285942     4.75   0.000
------------------------------------------------------
Hierarchy

Nested logit model (yi = 0, 1, …, J): …

Estimation of the nested logit model:

• normalization α1 = 1 required
• other restrictions often imposed (e.g., α1 = … = αS or ρ1 = … = ρS)
• 1 − ρs represents the correlation of unobservables within group s
• limited-information likelihood
  ◦ estimate λs = ρs^{−1}β by conditional logit for each group of responses Gs, s = 1, …, S
  ◦ maximize the multinomial-choice likelihood for the group Gs
• full-information likelihood
  ◦ maximize the joint likelihood based on P(yi = j|yi ∈ Gs, xi) · P(yi ∈ Gs|xi)
• conditional logit: α1 = … = αS = ρ1 = … = ρS = 1 (test)
Semiparametric alternatives

Straightforward generalizations of the methods for binary responses; for example,

• Maximum score estimator for multinomial choice (Fox, 2007)
  ◦ choice-specific characteristics xij
  ◦ consider choices yi = k and yi = l

      β̂nMSE = arg max_β (1/n) Σ_{i=1}^{n} [ I(yi = k)I(xik⊤β > xil⊤β) + I(yi = l)I(xik⊤β < xil⊤β) ]

  ◦ many choices

      β̂nMSE = arg max_β (1/n) Σ_{k=1}^{J} Σ_{l=1}^{J} Σ_{i=1}^{n} [ I(yi = k)I(xik⊤β > xil⊤β) + I(yi = l)I(xik⊤β < xil⊤β) ]
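For two alternatives the maximum score criterion simply counts how often the index x⊤β ranks the chosen alternative above the other. A grid-search Python sketch with the first coefficient normalized to 1 (data, normalization, and grid are illustrative; the estimator converges slowly, so only a rough location of the maximizer is expected):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
# choice-specific characteristics for two alternatives k and l
x_k = rng.normal(0, 1, (n, 2))
x_l = rng.normal(0, 1, (n, 2))
beta_true = np.array([1.0, 2.0])          # first coefficient normalized to 1
u_k = x_k @ beta_true + rng.logistic(0, 1, n)
u_l = x_l @ beta_true + rng.logistic(0, 1, n)
y = np.where(u_k > u_l, 0, 1)             # chosen alternative (k -> 0, l -> 1)

def score(b):
    """Fraction of observations whose choice matches the index ranking."""
    idx_k = x_k @ np.array([1.0, b])
    idx_l = x_l @ np.array([1.0, b])
    return np.mean((y == 0) * (idx_k > idx_l) + (y == 1) * (idx_k < idx_l))

grid = np.linspace(0.0, 4.0, 81)
b_hat = grid[np.argmax([score(b) for b in grid])]
```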
Models for censored and truncated data

• Introduction
• Truncation
• Tobit model
• MLE
• Tobit – interpretation
• Specification
• Alternatives
• Two-part Tobit
• Application
• Sample selection
• Two-step estimation
• Example
Introduction

Censored data

• censored responses = some values are not observable; just a lower or upper bound is known
  (example: duration, income due to bracketing, social contributions, taxation rules, toxicity measurements)
• corner solution responses = response distribution partially discrete due to censoring of latent response values
  (example: alcohol or charitable spending)
• estimation similar in both cases, although underlying reasons and models differ
• interpretation differs: for a variable yi with values observed only above a (e.g., a = 0), we can typically be interested in
  ◦ both models: P(yi > a|xi) or P(yi ≤ a|xi)
  ◦ censored model: E(yi|xi)
  ◦ corner-solution response: E(yi|xi, yi > a)
Linear models?

yᵢ* = xᵢ⊤β + εᵢ censored at a from below/above

• yᵢ = max{yᵢ*, a} ⇔ yᵢ − a = max{yᵢ* − a, 0}
• yᵢ = min{yᵢ*, a} ⇔ −yᵢ = max{−yᵢ*, −a}
• assume yᵢ = max{yᵢ*, 0} without loss of generality

Can we use the linear model E(yᵢ|xᵢ) = xᵢ⊤β?

• E(yᵢ|xᵢ) is not linear in xᵢ unless the range of xᵢ is very limited
  (under censoring from below, E(yᵢ|xᵢ) > E(yᵢ*|xᵢ) = xᵢ⊤β)
• heteroscedasticity due to var(yᵢ|xᵢ) (see the next slide)
• predictions are not always positive
• P(yᵢ = 0|xᵢ) is not predictable
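These points are easy to verify by simulation. The sketch below (Python rather than the course's Stata; the design mirrors the figure on the next slide) fits OLS to data censored at zero and shows the slope attenuated relative to the true coefficient of 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-4, 4, size=n)
y_star = 0.5 + x + rng.normal(size=n)   # latent response, true slope = 1
y = np.maximum(y_star, 0.0)             # censoring from below at 0

# OLS slope on latent vs. censored data (np.polyfit returns [slope, intercept])
slope_latent = np.polyfit(x, y_star, 1)[0]
slope_cens = np.polyfit(x, y, 1)[0]
# slope_latent is near 1; slope_cens is biased toward zero, because
# E(y|x) = Φ(0.5+x)(0.5+x) + φ(0.5+x) is nonlinear and flat for small x
```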
[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ and yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Censored; True line, Linear prediction)
Censored data

yᵢ = max{0, xᵢ⊤β + εᵢ}

• corner solution: xᵢ⊤β + εᵢ is the unconstrained optimal choice
• censoring: xᵢ⊤β + εᵢ represents the latent variable yᵢ*

Truncated data – “corner/censored” values are not observed

yᵢ = xᵢ⊤β + εᵢ

• both yᵢ and xᵢ are observed only for yᵢ > 0
  (example: truncated income data)
• other thresholds and truncation from above are possible (yᵢ ≶ a)
[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ observable only for yᵢ* > 0, with εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Truncated; True line, Linear prediction)
Tobit type I model

yᵢ = max{0, xᵢ⊤β + εᵢ}

[Figure] Linear fit of yᵢ* = 0.5 + xᵢ + εᵢ and yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and n = 1000 (legend: Original, Censored; True line, Linear prediction)
[Figure] Error distribution from yᵢ = max{0.5 + xᵢ + εᵢ, 0} for εᵢ ∼ N(0, 1) and xᵢ = 1: εᵢ = (yᵢ|xᵢ = 1) − 0.5 − 1.0 (legend: Original, Censored)
Error-term distribution

• εᵢ = yᵢ − xᵢ⊤β is “observable” only for yᵢ ≥ 0 ⇔ εᵢ ≥ −xᵢ⊤β
• εᵢ is censored from below at −xᵢ⊤β:
  ◦ positive probability at t = −xᵢ⊤β equal to
    P(εᵢ ≤ −xᵢ⊤β|xᵢ) = Φσ(−xᵢ⊤β) = Φ(−xᵢ⊤β/σ)
  ◦ continuously distributed at t > −xᵢ⊤β with density φσ(t) = φ(t/σ)/σ

Tobit type I model yᵢ = max{0, xᵢ⊤β + εᵢ} – distribution properties

• uncensored yᵢ is continuously distributed with density φσ(yᵢ − xᵢ⊤β) = φ{(yᵢ − xᵢ⊤β)/σ}/σ
Econometrics Slide 187
Tobit model – estimation

Maximum likelihood estimation (yᵢ* ∼ F_y)
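The likelihood combines the two pieces from the previous slide: a probit-type term Φ(−xᵢ⊤β/σ) for the censored observations and the scaled normal density for the uncensored ones. A minimal numerical sketch (Python, not the course's Stata; names and the simulated design are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + sigma_true * rng.normal(size=n), 0.0)

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-parametrization keeps sigma > 0
    xb = X @ beta
    cens = y <= 0
    # censored: log P(y=0|x) = log Φ(-x'β/σ); uncensored: log[φ((y-x'β)/σ)/σ]
    ll = np.where(cens,
                  norm.logcdf(-xb / sigma),
                  norm.logpdf((y - xb) / sigma) - np.log(sigma))
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
```

Using `norm.logcdf`/`norm.logpdf` rather than logs of the cdf/pdf avoids underflow for observations deep in the censored region.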
Example: female labor supply

Annual labor supply in hours for married women (Mroz, 1987)

Reduced-form equation using

• education
• labor market experience
• age
• number of children
Tobit – interpretation

yᵢ = max{0, xᵢ⊤β + εᵢ} – basic properties

• E(yᵢ|xᵢ) ≥ max{0, E(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}
  (Jensen’s inequality; see the figure on slide 196)
• med(yᵢ|xᵢ) = max{0, med(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}

Further, all objects of interest are related by

E(yᵢ|xᵢ) = P(yᵢ = 0|xᵢ) · 0 + P(yᵢ > 0|xᵢ) · E(yᵢ|xᵢ, yᵢ > 0)
         = P(yᵢ > 0|xᵢ) · E(yᵢ|xᵢ, yᵢ > 0)

• note: the probability P(yᵢ > 0|xᵢ) can be expressed as in probit
  (the identification assumption of probit is var(εᵢ) = 1)
Probability of no censoring:

P(yᵢ > 0|xᵢ) = Φ(xᵢ⊤β/σ)

Expectation conditional on not being censored:

E(yᵢ|xᵢ, yᵢ > 0) = xᵢ⊤β + σ · φ(xᵢ⊤β/σ)/Φ(xᵢ⊤β/σ)
Marginal effects (estimated at x̄ or as average partial effects)

• marginal effects for P(yᵢ > 0|xᵢ):

  ∂P(yᵢ > 0|xᵢ)/∂xᵢₖ = ∂Φ(xᵢ⊤β/σ)/∂xᵢₖ = φ(xᵢ⊤β/σ) · βₖ/σ
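The formula is easy to apply and to check against a numerical derivative. A sketch (Python; the parameter values are hypothetical, not the Mroz estimates):

```python
import numpy as np
from scipy.stats import norm

# hypothetical Tobit estimates
beta = np.array([0.5, 1.0, -0.3])
sigma = 1.2
x_bar = np.array([1.0, 0.2, 0.4])          # evaluation point, includes the constant

xb = x_bar @ beta
me = norm.pdf(xb / sigma) * beta / sigma   # ∂P(y>0|x)/∂x_k = φ(x'β/σ) β_k/σ

# check the k=1 effect against a forward-difference derivative of Φ(x'β/σ)
h = 1e-6
x_hi = x_bar.copy(); x_hi[1] += h
num = (norm.cdf(x_hi @ beta / sigma) - norm.cdf(xb / sigma)) / h
```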
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -.0024212   .0012202    -1.98    0.047
educ     |   .022153   .0058285     3.80    0.000
exper    |  .0361402   .0043438     8.32    0.000
expersq  | -.0005121   .0001444    -3.55    0.000
age      | -.0149448   .0019298    -7.74    0.000
kidslt6  | -.2455841   .0282462    -8.69    0.000
kidsge6  |  -.004455   .0106216    -0.42    0.675
---------------------------------------------------
Econometrics Slide 195
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -3.968784   2.007582    -1.98    0.048
educ     |  36.31225   9.703038     3.74    0.000
exper    |  59.23938   7.833684     7.56    0.000
expersq  | -.8393732   .2423184    -3.46    0.001
age      | -24.49691   3.362492    -7.29    0.000
kidslt6  | -402.5507   50.74877    -7.93    0.000
kidsge6  | -7.302468   17.40427    -0.42    0.675
---------------------------------------------------
Econometrics Slide 196
Example: female labor supply

---------------------------------------------------
         |            Delta-method
         |      dy/dx   Std. Err.      z     P>|z|
---------+-----------------------------------------
nwifeinc | -5.188622    2.62141    -1.98    0.048
educ     |  47.47311    12.6214     3.76    0.000
exper    |  77.44708   9.997656     7.75    0.000
expersq  | -1.097361   .3155947    -3.48    0.001
age      | -32.02624   4.292112    -7.46    0.000
kidslt6  | -526.2779   64.70622    -8.13    0.000
kidsge6  |  -9.54694   22.75225    -0.42    0.675
---------------------------------------------------
Econometrics Slide 197
Specification testing and extensions

Extensions

• doubly censored/two-limit data

Specification testing

• heteroscedasticity and non-normality
  (similar to probit: extend the specification or use a Hausman test;
  see the censored least absolute deviations estimator)
• two-part specification: what if the decision P(yᵢ > 0|xᵢ) is driven by
  different factors than the average amount E(yᵢ|xᵢ)?
  (examples: spending on a particular charity, expats’ labour supply)
predict indt, xb
gen rest = hours - indt
gen normd = normalden(rest / 1122.022) / 1122.022
kdens rest if indt>0 & rest>0, ci bw(sjpi) ll(0)
addplot((line normd rest if indt>0 & rest>0, sort))

[Figure] Kernel density estimate (with 95% CI) of the positive residuals rest, with the implied normal density normd overlaid
Alternatives

• Symmetrically trimmed least squares (Powell, 1986)
  ◦ trimming works under censoring, but inefficiently

  β̂^(STLS) = arg min_{β∈B} Σ_{i=1}^n {yᵢ − max(xᵢ⊤β, yᵢ/2)}²

• Symmetrically censored least squares (Powell, 1986)
  ◦ under conditional symmetry of ε|x

  β̂^(SCLS) = arg min_{β∈B} Σ_{i=1}^n [ {yᵢ − max(xᵢ⊤β, yᵢ/2)}²
              + I(yᵢ > 2xᵢ⊤β) · {(yᵢ/2)² − max(0, xᵢ⊤β)²} ]
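Both criteria can be minimized numerically. The sketch below (illustrative Python on simulated data, using a derivative-free search as a simple, if crude, minimizer) applies the SCLS criterion to data censored at zero with symmetric errors:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# symmetric (here normal) errors; censoring from below at zero
y = np.maximum(X @ beta_true + rng.normal(size=n), 0.0)

def scls_obj(b):
    """Symmetrically censored least-squares criterion (Powell, 1986)."""
    xb = X @ b
    term1 = (y - np.maximum(xb, y / 2.0)) ** 2
    term2 = (y > 2 * xb) * ((y / 2.0) ** 2 - np.maximum(xb, 0.0) ** 2)
    return np.sum(term1 + term2)

res = minimize(scls_obj, x0=np.array([0.1, 0.1]), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6})
beta_scls = res.x
```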
Censored least absolute deviation (CLAD) method (Powell, 1984)

• med(yᵢ|xᵢ) = max{0, med(xᵢ⊤β + εᵢ|xᵢ)} = max{0, xᵢ⊤β}
• assume med(εᵢ|xᵢ) = 0 and minimize

  Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

• √n-consistent and asymptotically normal estimator
• uses only observations with xᵢ⊤β > 0;
  full-rank assumption for E(xᵢxᵢ⊤ | xᵢ⊤β > 0)
  (just as for STLS/SCLS)
• “only” med(yᵢ|xᵢ) identified
  (STLS/SCLS identify “only” E(yᵢ|xᵢ))
• poor performance in small or heavily censored samples
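A sketch of the CLAD criterion on simulated data (illustrative Python; heteroscedastic errors are used on purpose, since CLAD only needs the conditional median restriction). The criterion is nonsmooth and nonconvex, so a derivative-free search from a rough starting value is used here:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
# med(eps|x) = 0 holds despite heteroscedasticity
eps = (1.0 + 0.3 * np.abs(X[:, 1])) * rng.normal(size=n)
y = np.maximum(X @ beta_true + eps, 0.0)

def clad_obj(b):
    """CLAD criterion: sum of |y - max(0, x'b)|."""
    return np.sum(np.abs(y - np.maximum(0.0, X @ b)))

# start away from the flat region where x'b <= 0 for all observations
res = minimize(clad_obj, x0=np.array([0.1, 0.5]), method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-6})
beta_clad = res.x
```

The flat region of the criterion (any β with xᵢ⊤β ≤ 0 for all i gives the same value) is exactly why the slide's full-rank condition on the subsample with xᵢ⊤β > 0 is needed.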
Recall the Tobit type I model

yᵢ = max{0, xᵢ⊤β + εᵢ}
Two-part Tobit model (also hurdle model):
model the following two decisions separately

• assume independence of the two decisions for now:
  yᵢ = sᵢqᵢ, where sᵢ = I(yᵢ > 0), and

  F_q(qᵢ|xᵢ, sᵢ) = F_q(qᵢ|xᵢ)
Application

Melenberg and van Soest (1996) Modelling of vacation expenditures. Journal of Applied Econometrics 11, 59–76.
Truncated normal hurdle model (Cragg, 1971)

• decision – model by probit:
  P(sᵢ = 1|xᵢ) = P(yᵢ > 0|xᵢ) = Φ(xᵢ⊤γ)
• amount – model using yᵢ = qᵢ = xᵢ⊤β + εᵢ with εᵢ ∼ N(0, σ²), truncated at 0:

  f(yᵢ|xᵢ) = f(yᵢ|xᵢ, yᵢ > 0) · P(yᵢ > 0|xᵢ)
  f(yᵢ|xᵢ, yᵢ > 0) = [φ{(yᵢ − xᵢ⊤β)/σ}/σ] / Φ(xᵢ⊤β/σ)

• likelihood contribution for the full-information MLE:

  l(yᵢ, xᵢ, β) = I(yᵢ = 0) · ln[1 − Φ(xᵢ⊤γ)]
               + I(yᵢ > 0) · ln[Φ(xᵢ⊤γ)]
               + I(yᵢ > 0) · ln[φ{(yᵢ − xᵢ⊤β)/σ}/σ]
               − I(yᵢ > 0) · ln[Φ(xᵢ⊤β/σ)]
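A useful consistency check: setting γ = β/σ makes the hurdle likelihood collapse to the Tobit type I likelihood, since the ln Φ(xᵢ⊤γ) term then cancels the truncation correction. A Python sketch with hypothetical numbers:

```python
import numpy as np
from scipy.stats import norm

def hurdle_ll(y, xb_probit, xb_amount, sigma):
    """Cragg truncated-normal hurdle log-likelihood contribution."""
    if y == 0:
        return norm.logcdf(-xb_probit)                  # ln[1 - Φ(x'γ)]
    return (norm.logcdf(xb_probit)                      # ln Φ(x'γ)
            + norm.logpdf((y - xb_amount) / sigma) - np.log(sigma)
            - norm.logcdf(xb_amount / sigma))           # truncation correction

def tobit_ll(y, xb, sigma):
    """Tobit type I log-likelihood contribution."""
    if y == 0:
        return norm.logcdf(-xb / sigma)
    return norm.logpdf((y - xb) / sigma) - np.log(sigma)

# with γ = β/σ the hurdle model collapses to Tobit
xb, sigma = 0.7, 1.3
gap = max(abs(hurdle_ll(y, xb / sigma, xb, sigma) - tobit_ll(y, xb, sigma))
          for y in (0.0, 0.4, 2.1))
```

This is exactly why the Tobit restriction γ = β/σ is testable against the hurdle model.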
Alternatives

yᵢ = max{0, xᵢ⊤β + εᵢ}

• single-index models applicable (semiparametric LS)
• conditional median assumption med(εᵢ|xᵢ) = 0
• least absolute deviation regression (LAD; biased)

  min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − xᵢ⊤β|

• censored least absolute deviation regression (CLAD)

  min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

  skewed in small samples, no two-part model
yᵢ = max{0, xᵢ⊤β + εᵢ}

min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − xᵢ⊤β|   vs.   min_{β∈ℝᵖ} Σ_{i=1}^n |yᵢ − max{0, xᵢ⊤β}|

• observation: the criteria of QR and CQR are equivalent for xᵢ⊤β > 0
• observation: med(yᵢ|xᵢ) = xᵢ⊤β if P(yᵢ > 0|xᵢ) > 0.5
  ⇒ estimate p(xᵢ) = P(yᵢ > 0|xᵢ) = E[I(yᵢ > 0)|xᵢ] nonparametrically (by the Nadaraya-Watson estimator)
• observation: med(yᵢ|xᵢ) = xᵢ⊤β if med(yᵢ|xᵢ) = xᵢ⊤β > 0
  ⇒ estimate q(xᵢ) = med(yᵢ|xᵢ) “= a(x)” nonparametrically (local median regression)

  min_{a(x), b(x)} Σ_{i=1}^n |yᵢ − a(x) − b(x)(xᵢ − x)| K{Hₙ⁻¹(xᵢ − x)}
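The first nonparametric step — estimating p(x) = P(yᵢ > 0|xᵢ) by Nadaraya-Watson — is a one-liner. A sketch on simulated Tobit data (illustrative Python; design, Gaussian kernel, and bandwidth are assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 20000
x = rng.uniform(-3, 3, size=n)
y = np.maximum(0.5 + x + rng.normal(size=n), 0.0)   # Tobit data, sigma = 1
d = (y > 0).astype(float)                           # I(y_i > 0)

def nw(x0, h=0.3):
    """Nadaraya-Watson estimate of E[I(y>0) | x = x0] with a Gaussian kernel."""
    w = norm.pdf((x - x0) / h)
    return np.sum(w * d) / np.sum(w)

p_hat = nw(0.0)
p_true = norm.cdf(0.5)    # P(y>0|x=0) = Φ((0.5 + 0)/1)
```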
Two-step estimation of censored regression model

yᵢ = max{0, xᵢ⊤β + εᵢ}
sml inlf nwifeinc educ expersq age kidslt6 kidsge6, offset(exper)
predict indsml, xb
lpoly inlf indsml, generate(indlp plp)

[Figure] Local polynomial smooth of inlf (Y=1 if in labor force, 1975) against the linear prediction; kernel = epanechnikov, degree = 0, bandwidth = 5.32
Tobit type II – incidental truncation

y₁ᵢ = x₁ᵢ⊤β₁ + ε₁ᵢ
y₂ᵢ = I(x₂ᵢ⊤β₂ + ε₂ᵢ > 0)
Since in the model

  y₁ᵢ = x₁ᵢ⊤β₁ + ε₁ᵢ
  y₂ᵢ = I(x₂ᵢ⊤β₂ + ε₂ᵢ > 0),

only data with y₂ᵢ = 1 are observed and ε₂ᵢ ∼ N(0, 1), then

E(y₁ᵢ|x₁ᵢ, x₂ᵢ, y₂ᵢ = 1)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, y₂ᵢ = 1)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, x₂ᵢ⊤β₂ + ε₂ᵢ > 0)
  = x₁ᵢ⊤β₁ + ρ E(ε₂ᵢ|x₁ᵢ, x₂ᵢ, ε₂ᵢ > −x₂ᵢ⊤β₂)
  = x₁ᵢ⊤β₁ + ρ φ(x₂ᵢ⊤β₂)/Φ(x₂ᵢ⊤β₂) = x₁ᵢ⊤β₁ + ρ λ(−x₂ᵢ⊤β₂),

where λ(−t) = φ(t)/Φ(t) is the inverse Mills ratio.
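The truncated-mean formula behind the inverse Mills ratio can be checked by Monte Carlo (illustrative Python; the index value t is hypothetical):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 2_000_000
eps2 = rng.normal(size=n)      # standard normal selection error

t = 0.4                        # hypothetical value of x2'beta2
sel = eps2 > -t                # selected observations: eps2 > -x2'beta2
emp = eps2[sel].mean()         # empirical E(eps2 | eps2 > -t)
theory = norm.pdf(t) / norm.cdf(t)   # inverse Mills ratio λ(-t) = φ(t)/Φ(t)
```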
Two-step estimation

Heckman (1976): two-step procedure similar to Tobit type II

• estimate the binary-choice model (probit) using all data:

  P(y₂ᵢ = 1|x₂ᵢ) = Φ(x₂ᵢ⊤β₂)

• regress y₁ᵢ on x₁ᵢ and the estimated inverse Mills ratio λ(−x₂ᵢ⊤β̂₂) using the selected sample (y₂ᵢ = 1)
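A sketch of the two-step procedure on simulated data (illustrative Python; variable names, the exclusion restriction, and the design are assumptions, not the course's Stata implementation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 20000
x1 = np.column_stack([np.ones(n), rng.normal(size=n)])
x2 = np.column_stack([x1, rng.normal(size=n)])   # extra regressor: exclusion restriction
b1, b2, rho = np.array([1.0, 0.5]), np.array([0.2, 0.5, 1.0]), 0.6

# correlated errors; eps2 has unit variance (probit normalization), var(eps1) = 1
e2 = rng.normal(size=n)
e1 = rho * e2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
y2 = (x2 @ b2 + e2 > 0).astype(float)
y1 = x1 @ b1 + e1                                # observed only when y2 = 1

# step 1: probit MLE for the selection equation
def probit_nll(g):
    xb = x2 @ g
    return -np.sum(np.where(y2 == 1, norm.logcdf(xb), norm.logcdf(-xb)))
g_hat = minimize(probit_nll, np.zeros(3), method="BFGS").x

# step 2: OLS of y1 on [x1, inverse Mills ratio] in the selected sample
lam = norm.pdf(x2 @ g_hat) / norm.cdf(x2 @ g_hat)
sel = y2 == 1
Z = np.column_stack([x1[sel], lam[sel]])
coef = np.linalg.lstsq(Z, y1[sel], rcond=None)[0]
# coef[:2] estimates beta1; coef[2] estimates rho*sigma1 (= 0.6 here, as sigma1 = 1)
```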
Maximum likelihood estimation

log Lₙ(β, σ₁₁, σ₁₂) = Σ_{y₂ᵢ=0} log P(y₂ᵢ = 0|x₂ᵢ)
                    + Σ_{y₂ᵢ=1} log[P(y₂ᵢ = 1|x₂ᵢ) · f(y₁ᵢ|y₂ᵢ = 1, x₁ᵢ, x₂ᵢ)]

• MLE is possible using the Bayes rule

  f(y₁ᵢ|y₂ᵢ = 1, …) = P(y₂ᵢ = 1|y₁ᵢ, …) · f(y₁ᵢ|…) / P(y₂ᵢ = 1|…),

which implies

  P(y₂ᵢ = 1|…) · f(y₁ᵢ|y₂ᵢ = 1, …) = P(y₂ᵢ = 1|y₁ᵢ, …) · f(y₁ᵢ|…)
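The equality of the two factorizations can be verified numerically: integrating P(y₂ᵢ = 1|y₁ᵢ, ·) · f(y₁ᵢ|·) over y₁ᵢ must recover the probit probability Φ(x₂ᵢ⊤β₂). A sketch with hypothetical parameter values, using the conditional normal distribution of ε₂ᵢ given ε₁ᵢ:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# hypothetical parameter values: sd of eps1, cov(eps1, eps2); var(eps2) = 1
s11, s12 = 1.5, 0.8
xb1, xb2 = 0.4, 0.3                  # x1'beta1 and x2'beta2
sc = np.sqrt(1 - s12**2 / s11**2)    # conditional sd of eps2 given eps1

def f_y1(y1):                        # marginal density of y1
    return norm.pdf((y1 - xb1) / s11) / s11

def p_sel_given_y1(y1):              # P(y2 = 1 | y1, x)
    return norm.cdf((xb2 + s12 / s11**2 * (y1 - xb1)) / sc)

# ∫ P(y2=1|y1) f(y1) dy1 = P(y2=1|x) = Φ(x2'β2)
lhs = quad(lambda y1: p_sel_given_y1(y1) * f_y1(y1), -np.inf, np.inf)[0]
rhs = norm.cdf(xb2)
```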
Maximum likelihood estimation

log Lₙ(β, σ₁₁, σ₁₂) = Σ_{y₂ᵢ=0} log P(y₂ᵢ = 0|x₂ᵢ)
                    + Σ_{y₂ᵢ=1} log[P(y₂ᵢ = 1|y₁ᵢ, x₁ᵢ, x₂ᵢ) · f(y₁ᵢ|x₁ᵢ, x₂ᵢ)]

• the conditional distribution of y₂ᵢ = x₂ᵢ⊤β₂ + ε₂ᵢ given ε₁ᵢ = y₁ᵢ − x₁ᵢ⊤β₁
  follows from the joint normality of (ε₁ᵢ, ε₂ᵢ):

  ε₂ᵢ | ε₁ᵢ = y₁ᵢ − x₁ᵢ⊤β₁ ∼ N( µ₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁ − µ₁), σ₂₂ − σ₁₂σ₁₁⁻²σ₂₁ ),

  where σ₂₂ = 1 and

  (ε₁ᵢ, ε₂ᵢ)⊤ ∼ N( (µ₁, µ₂)⊤, [σ₁₁ σ₁₂; σ₂₁ σ₂₂] ) = N( 0, [σ₁₁ σ₁₂; σ₂₁ 1] )
Maximum likelihood estimation

  f(y₁ᵢ|x₁ᵢ, x₂ᵢ) = φ{(y₁ᵢ − x₁ᵢ⊤β₁)/σ₁₁}/σ₁₁
  P(y₂ᵢ = 0|x₁ᵢ, x₂ᵢ) = 1 − Φ(x₂ᵢ⊤β₂)
  P(y₂ᵢ = 1|y₁ᵢ, x₁ᵢ, x₂ᵢ) = Φ( [x₂ᵢ⊤β₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁)] / (1 − σ₁₂²σ₁₁⁻²)^{1/2} )

• log-likelihood contribution (denoting σ_c² = 1 − σ₁₂²σ₁₁⁻²):

  lᵢ(β, σ₁₁, σ₁₂) = (1 − y₂ᵢ) log{1 − Φ(x₂ᵢ⊤β₂)}
                  + y₂ᵢ log Φ( [x₂ᵢ⊤β₂ + σ₁₂σ₁₁⁻²(y₁ᵢ − x₁ᵢ⊤β₁)] / σ_c )
                  + y₂ᵢ [ log φ{(y₁ᵢ − x₁ᵢ⊤β₁)/σ₁₁} − log σ₁₁ ]

Econometrics Slide 218
Example: female wage equation

Married women labor force participation (Mroz, 1987):

• wage offer = observed only for those who choose to work
• explanatory variables for the wage equation
  ◦ education
  ◦ labor market experience
• additional explanatory variables for the participation equation
  ◦ non-wife income
  ◦ age
  ◦ number of children
Application

Buchinsky (1998) The dynamics of changes in the female wage distribution in the USA: a quantile regression approach. Journal of Applied Econometrics 13, 1–30.
Maximum likelihood

• applicable in many nonlinear models
• distributional assumptions necessary

Examples of MLE

• count data
• partially discrete, partially continuous responses:
  Tobit, two-part Tobit, Tobit type II (sample selection)
• models with random censoring

Applications

• extensions needed (non-constant thresholds, random censoring, random coefficients, sample selection probit, endogeneity, ...)
Nonparametric estimation

• very flexible, minimal assumptions
• not easily applicable directly in models with several/many explanatory variables

Choices such as

• structural versus reduced-form analysis
• parametric versus semiparametric estimation
• approach to semiparametric estimation
• software package and numerical tools
The end