The Identification Zoo: Meanings of Identification in Econometrics
https://doi.org/10.1257/jel.20181361
Over two dozen different terms for identification appear in the econometrics literature, including set identification, causal identification, local identification, generic identification, weak identification, identification at infinity, and many more. This survey: (i) gives a new framework unifying existing definitions of point identification; (ii) summarizes and compares the zooful of different terms associated with identification that appear in the literature; and (iii) discusses concepts closely related to identification, such as normalizations and the differences in identification between structural models and causal, reduced form models. (JEL C01, C20, C50)
836 Journal of Economic Literature, Vol. LVII (December 2019)
The many terms for identification that appear in the econometrics literature include (in alphabetical order): Bayesian identification, causal identification, essential identification, eventual identification, exact identification, first order identification, frequentist identification, generic identification, global identification, identification arrangement, identification at infinity, identification by construction, identification of bounds, ill-posed identification, irregular identification, local identification, nearly weak identification, nonparametric identification, non-robust identification, nonstandard weak identification, overidentification, parametric identification, partial identification, point identification, sampling identification, semiparametric identification, semi-strong identification, set identification, strong identification, structural identification, thin set identification, underidentification, and weak identification. This survey gives the meaning of each and shows how they relate to each other.

Let θ denote an unknown parameter or a set of unknown parameters (vectors and/or functions) that we would like to learn about, and ideally, estimate. Examples of what θ could include are objects like regressor coefficients, or average treatment effects, or error distributions. Identification deals with characterizing what could potentially or conceivably be learned about parameters θ from observable data. Roughly, identification asks, if we knew the population that data are drawn from, would θ be known? And if not, what could be learned about θ?

The study of identification logically precedes estimation, inference, and testing. For θ to be identified, alternative values of θ must imply different distributions of the observable data (see, e.g., Matzkin 2012). This implies that if θ is not identified, then we cannot hope to find a consistent estimator for θ. More generally, identification failures complicate statistical analyses of models, so recognizing lack of identification, and searching for restrictions that suffice to attain identification, are fundamentally important problems in econometric modeling.

The next section, section 2, begins by providing some historical background. The basic notion of identification (uniquely recovering model parameters from the observable population) is now known as "point identification." Section 3 summarizes the basic idea of point identification. A few somewhat different characterizations of point identification appear in the literature, varying in what is assumed to be observable and in the nature of the parameters to be identified. In section 3 (and in an appendix), this survey proposes a new definition of point identification (and of related concepts like structures and observational equivalence) that encompasses these alternative characterizations or classes of point-identified models that currently appear in the literature.

Section 3 then provides examples of, and methods for obtaining, point identification. This section also includes a discussion of typical sources of non-identification, and of some traditional identification-related concepts like overidentification, exact identification, and rank and order conditions. Identification by functional form is described and examples are provided, including constructed instruments based on second and higher moment assumptions. Appropriate use of such methods is discussed.

Next is section 4, which defines and discusses the concepts of coherence and completeness of models. These are closely associated with existence of a reduced form, which in turn is often used as a starting point for proving identification. This is followed by section 5, which is devoted to discussing identification concepts in what is variously known as the reduced form, or program evaluation, or treatment effects, or causal inference literature. This literature places a particular emphasis on randomization, and is devoted to the identification of parameters
Lewbel: The Identification Zoo 837
In addition to two different Wrights, two different Workings also published early papers relating to the subject: Holbrook Working (1925) and, more relevantly, Elmer J. Working (1927). Both wrote about statistical demand curves, though Holbrook is the one for whom the Working-Leser Engel curve is named.

Jan Tinbergen (1930) proposed indirect least squares estimation (numerically recovering structural parameters from linear regression reduced form estimates), but does not appear to have recognized its usefulness for solving the identification problem.

The above examples, along with the later analyses of Trygve Haavelmo (1943), Tjalling Koopmans (1949), Theodore W. Anderson and Herman Rubin (1949), Koopmans and Olav Reiersøl (1950), Leonid Hurwicz (1950), Koopmans, Rubin, and Roy B. Leipnik (1950), and the work of the Cowles Foundation more generally, are concerned with identification arising from simultaneity in supply and demand. Other important early work on this problem includes Abraham Wald (1950), Henri Theil (1953), J. Denis Sargan (1958), and results summarized and extended in Franklin Fisher's (1966) book. Most of this work emphasizes exclusion restrictions for solving identification in simultaneous systems, but identification could also come from restrictions on the covariance matrix of error terms, or combinations of the two, as in Karl G. Jöreskog (1970). Milton Friedman's (1953) essay on positive economics includes a critique of the Cowles Foundation work, essentially warning against using different criteria to select models versus criteria to identify them.

A standard identification problem in the statistics literature is that of recovering a treatment effect. Derived from earlier probability theory, identification based on randomization was developed in this literature by Jerzy Splawa-Neyman (1923),⁴ David R. Cox (1958), and Donald B. Rubin (1974), among many others. Pearl (2015) and Heckman and Pinto (2015) credit Haavelmo (1943) as the first rigorous treatment of causality in the context of structural econometric models. Unlike the results in the statistics literature, econometricians historically focused more on cases where selection (determining who is treated or observed) and outcomes may be correlated. These correlations could come from a variety of sources, such as simultaneity as in Haavelmo (1943), or optimizing self-selection as in Andrew D. Roy (1951). Another example is Wald's (1943) survivorship bias analysis (regarding airplanes in World War II), which recognizes that even when treatment assignment (where a plane was hit) is random, sample attrition that is correlated with outcomes (only planes that survived attack could be observed) drastically affects the correct analysis. General models where selection and outcomes are correlated follow from James J. Heckman (1978). Causal diagrams (invented by Sewall Wright as discussed above) were promoted by Judea Pearl (1988) to model the connections between treatments and outcomes.

⁴ Neyman's birth name was Splawa-Neyman, and he published a few of his early papers under that name, including this one.

A different identification problem is that of identifying the true coefficient in a linear regression when regressors are measured with error. Robert J. Adcock (1877, 1878) and Charles H. Kummell (1879) considered measurement errors in a Deming regression, as popularized in W. Edwards Deming (1943).⁵ This is a regression that minimizes the sum of squares of errors measured perpendicular to the fitted line. Corrado Gini (1921) gave an example of an estimator that deals with measurement errors in standard linear regression, but Ragnar A. K. Frisch (1934) was the first to discuss the issue in a way that would now be recognized as identification. Other early papers looking at measurement errors in regression include Neyman (1937), Wald (1940), Koopmans (1937), Reiersøl (1945, 1950), Roy C. Geary (1948), and James Durbin (1954). Tamer (2010) credits Frisch (1934) as also being the first in the literature to describe an example of set identification.

⁵ Adcock's publications give his name as R. J. Adcock. I only have circumstantial evidence that his name was actually Robert.

3. Point Identification

In modern terminology, the standard notion of identification is formally called point identification. Depending on context, point identification may also be called global identification or frequentist identification. When one simply says that a parameter or a function is identified, what is usually meant is that it is point identified.

Early formal definitions of (point) identification were provided by Koopmans and Reiersøl (1950), Hurwicz (1950), Fisher (1966), and Rothenberg (1971). These include the related concepts of a structure and of observational equivalence. See Chesher (2008) for additional historical details on these classical identification concepts.

In this survey I provide a new general definition of identification. This generalization maintains the intuition of existing classical definitions while encompassing a larger class of models than previous definitions. The discussion in the text below will be somewhat informal for ease of reading. More rigorous definitions are given in the appendix.

3.1 Introduction to Point Identification

Recall that θ is the parameter (which could include vectors and functions) that we want to identify and ultimately estimate. We start by assuming there is some information, call it ϕ, that we either already know or could learn from data. Think of ϕ as everything that could be learned about the population that data are drawn from. Usually, ϕ would either be a distribution function, or some features of distributions like conditional means, quantiles, autocovariances, or regression coefficients. In short, ϕ is what would be knowable from unlimited amounts of whatever type of data we have. The key difference between the definition of identification given in this survey and previous definitions in the literature is that previous definitions generally started with a particular assumption (sometimes only implicit) of what constitutes ϕ (examples are the Wright–Cowles identification and distribution-based identification discussed in section 3.3).

Assume also that we have a model, which typically imposes some restrictions on the possible values ϕ could take on. A simple definition of (point) identification is then that a parameter θ is point identified if, given the model, θ is uniquely determined from ϕ.

For example, suppose for scalars Y, X, and θ, our model is that Y = Xθ + e where E(X²) ≠ 0 and E(eX) = 0, and suppose that ϕ, what we can learn from data, includes the second moments of the vector (Y, X). Then we can conclude that θ is point identified, because it is uniquely determined in the usual linear regression way by θ = E(XY)/E(X²), which is a function of second moments of (Y, X).

Another example is to let the model be that a binary treatment indicator X is assigned to individuals by a coin flip, and Y is each individual's outcome. Suppose we can observe realizations of (X, Y) that are independent across individuals. We might therefore assume that ϕ, what we can learn from data, includes E(Y | X). It then follows that the average treatment effect θ is identified because, when treatment is randomly assigned, θ = E(Y | X = 1) − E(Y | X = 0), that is, the difference between the mean of Y among people who have X = 1 (the treated) and the mean of Y among people who have X = 0 (the untreated).

Both of the above examples assume that expectations of observed variables are knowable, and so can be included in ϕ. Since sample averages can be observed, to justify this assumption we might appeal to the consistency of sample averages, given conditions for a weak law of large numbers.

When discussing empirical work, a common question is, "what is the source of the identification?" That is, what feature of the data is providing the information needed to determine θ? This is essentially asking, what needs to be in ϕ?

Note that the definition of identification is somewhat circular or recursive. We start by assuming some information ϕ is knowable. Essentially, this means that to define identification of something, θ, we start by assuming something else, ϕ, is itself identified. Assuming ϕ is knowable or identified to begin with can itself only be justified by some deeper assumptions regarding the underlying data-generating process (DGP).

We usually think of a model as a set of equations describing behavior. But more generally, a model is whatever set of assumptions we make about, and restrictions we place on, the DGP. This includes both assumptions about the behavior that generates the data and about how the data are collected and measured. These assumptions in turn imply restrictions on ϕ and θ. In this sense, identification (even in purely experimental settings) always requires a model.

A common starting assumption is that the DGP consists of n independently, identically distributed (IID) observations of a vector W, where the sample size n goes to infinity. We know (by the Glivenko–Cantelli theorem, see section 3.4 below) that with this kind of data we could consistently estimate the distribution of W. It is therefore reasonable with IID data in mind to start by assuming that what is knowable to begin with, ϕ, is the distribution function of W.

Another common DGP is where each data point consists of a value of X chosen from its support, and conditional upon that value of X, we randomly draw an observation of Y independent from the other draws of Y given X. For example, X could be the temperature at which you choose to run an experiment and Y is the outcome of the experiment. As n → ∞ this DGP allows us to consistently estimate and thereby learn about F(Y | X), the conditional distribution function of Y given X. So if we have this kind of DGP in mind, we could start an identification proof for some θ by assuming that F(Y | X) is knowable. But in this case F(Y | X) can only be known for the values of X that can be chosen in the experiment (e.g., it may be impossible to run the experiment at a temperature X of a million degrees).

With more complicated DGPs (e.g., time series data, or cross section data containing social interactions or common shocks), part of the challenge in establishing identification is characterizing what information ϕ is knowable, and hence appropriate to use as the starting point for proving identification. For example, in a time series analysis we might start by supposing that the mean, variance, and autocovariances of a time series are knowable, but not assume information about higher moments is available. Why not? Either because higher moments might not be needed for identification (as in vector autoregression models), or because higher moments may not be stable over time. Other possible examples are that ϕ could equal reduced-form linear regression coefficients, or, if observations of W follow a martingale process, ϕ could consist of transition probabilities.

What to include in ϕ depends on the model. For example, in dynamic panel data models, the Arellano and Bond (1991) estimator is based on a set of moments that are assumed to be knowable (since they can be estimated from data) and equal zero in the population. The parameters of the model are identified if they are uniquely determined by the equations that set those moments equal to zero. The Blundell and Bond (1998) estimator provides additional moments (assuming functional form information about the initial time period zero distribution of data) that we could include in ϕ. We may therefore have model parameters that are not identified with Arellano and Bond moments, but become identified if we are willing to assume the model contains the additional information needed for Blundell and Bond moments.

Even in the most seemingly straightforward situations, such as experimental design with completely random assignment into treatment and control groups, additional assumptions regarding the DGP (and hence regarding the model and ϕ) are required for identification of treatment effects. Typical assumptions that are routinely made (and may often be violated) in this literature are assumptions that rule out certain types of measurement errors, sample attrition, censoring, social interactions, and general equilibrium effects.

In practice, it is often useful to distinguish between two types of DGP assumptions. One is assumptions regarding the collection of data, for example, selection, measurement errors, and survey attrition. The other is assumptions regarding the generation of data, for example, randomization or statistical and behavioral assumptions. Arellano (2003) refers to a set of behavioral assumptions that suffice for identification as an identification arrangement. Ultimately, both types of assumptions determine what we know about the model and the DGP, and hence determine what identification is possible.

3.2 Defining Point Identification

Here we define point identification and some related terms, including structure and observational equivalence. The definitions provided here generalize and encompass most previous definitions provided in the literature. The framework here most closely corresponds to Matzkin (2007, 2012). Her framework is essentially the special case of the definitions provided here in which ϕ is a distribution function. In contrast, the traditional textbook discussion of identification of linear supply and demand curves corresponds to the special case where ϕ is a set of limiting values of linear regression coefficients. The relationship of the definitions provided here to other definitions in the literature, such as those given by the Cowles Foundation work, or in Rothenberg (1971), Sargan (1983), Hsiao (1983), or Newey and McFadden (1994), is discussed below. In this section, the provided definitions will still be somewhat informal, stressing the underlying ideas and intuition. More formal and detailed definitions are provided in the appendix.

Define a model M to be a set of functions or constants that satisfy some given restrictions. Examples of what might be included in a model are regression functions, error distribution functions, utility functions, game payoff matrices, and coefficient vectors. Examples of restrictions could include assuming regression functions are linear or monotonic or differentiable, or that errors are normal or fat tailed, or that parameters are bounded.

Define a model value m to be one particular possible value of the functions or constants that comprise M. Each m implies a particular DGP. An exception is incoherent models (see section 4), which may have model values that do not correspond to any possible DGP.

Define ϕ to be a set of constants and/or functions about the DGP that we assume are known or knowable from data. Common examples of ϕ might be data distribution functions, conditional mean functions, linear regression coefficients, or time series autocovariances.
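As a concrete illustration of treating ϕ as knowable from data, here is a minimal simulation sketch (not from the survey; the parameter value, sample size, and distributions are illustrative assumptions) in which ϕ consists of second moments of (Y, X) and θ is recovered from ϕ alone by θ = E(XY)/E(X²), as in the linear regression example of section 3.1.

```python
import numpy as np

# Illustrative DGP (all numbers are assumptions for this sketch):
# Y = X*theta + e, with e drawn independently of X, so E(eX) = 0.
rng = np.random.default_rng(0)
theta_true = 2.0
n = 200_000
X = rng.normal(size=n)
e = rng.normal(size=n)
Y = X * theta_true + e

# phi: the information assumed knowable from data, here the sample
# analogues of the second moments E(XY) and E(X^2).
phi = {"E_XY": np.mean(X * Y), "E_XX": np.mean(X * X)}

# theta is point identified because it is a function of phi alone:
theta_hat = phi["E_XY"] / phi["E_XX"]
print(round(theta_hat, 1))  # close to theta_true = 2.0
```

Nothing in the computation of theta_hat uses the individual observations beyond the two moments collected in phi, which is exactly the sense in which ϕ serves as the starting point for identification.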
Define a set of parameters θ to be a set of unknown constants and/or functions that characterize or summarize relevant features of a model. Essentially, θ can be anything we might want to estimate. Parameters θ could include what we usually think of as model parameters, such as regression coefficients, but θ could also be, for example, the sign of an elasticity, or an average treatment effect.

The set of parameters θ may also include nuisance parameters, which are defined as parameters that are not of direct economic interest, but may be required for identification and estimation of other objects that are of interest. For example, in a linear regression model θ might include not only the regression coefficients, but also the marginal distribution function of identically distributed errors. Depending on context, this distribution might not be of direct interest and would then be considered a nuisance parameter. It is not necessary that nuisance parameters, if present, be included in θ, but they could be.

We assume that each particular value of m implies a particular value of ϕ and of θ (violations of this assumption can lead to incoherence or incompleteness, as discussed in a later section). However, there could be many values of m that imply the same ϕ or the same θ. Define the structure s(ϕ, θ) to be the set of all model values m that yield both the given values of ϕ and of θ.

Two parameter values, θ and θ̃, are defined to be observationally equivalent if there exists a ϕ such that both s(ϕ, θ) and s(ϕ, θ̃) are not empty. Roughly, θ and θ̃ being observationally equivalent means there exists a value ϕ such that, if ϕ is true, then either the value θ or θ̃ could also be true. Equivalently, θ and θ̃ being observationally equivalent means that there exists a ϕ and model values m and m̃ such that model value m yields the values ϕ and θ, and model value m̃ yields the values ϕ and θ̃.

We're now ready to define identification. The parameter θ is defined to be point identified (often just called identified) if there do not exist any pairs of possible values θ and θ̃ that are different but observationally equivalent.

Let Θ denote the set of all possible values that the model says θ could be. One of these values is the unknown true value of θ, which we denote as θ₀. We say that the particular value θ₀ is point identified if θ₀ is not observationally equivalent to any other θ in Θ. However, we don't know which of the possible values of θ (that is, which of the elements of Θ) is the true θ₀. This is why, to ensure point identification, we generally require that no two elements θ and θ̃ in the set Θ having θ ≠ θ̃ be observationally equivalent. Sometimes this condition is called global identification rather than point identification, to explicitly say that θ₀ is point identified no matter what value in Θ turns out to be θ₀.

We have now defined what it means to have parameters θ be point identified. We say that the model is point identified when no pairs of model values m and m̃ in M are observationally equivalent (treating m and m̃ as if they were parameters). Since every model value is associated with at most one value of θ, having the model be identified is sufficient, but stronger than necessary, to also have any possible set of parameters θ be identified.

The economist or econometrician defines the model M, so we could in theory enumerate every m ∈ M, list every value of ϕ and θ that is implied by each m, and thereby check every pair s(ϕ, θ) and s(ϕ, θ̃) to see if θ is point identified or not. The difficulty of proving identification in practice is in finding tractable ways to accomplish this enumeration. Note that since we do not know which value of θ is the true one, proving identification in practice requires showing that the definition holds for any possible θ, not just the true value.
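The enumeration just described can be carried out by brute force when M, ϕ, and θ all take finitely many values. The toy model below is entirely hypothetical (the model values m1, m2, … and their mappings to ϕ and θ are invented for illustration); it simply checks every pair of distinct θ values for observational equivalence.

```python
from itertools import combinations

# A toy model: each model value m maps to its implied pair (phi(m), theta(m)).
# These mappings are invented purely to illustrate the definitions.
model_identified = {"m1": ("phi_a", 1), "m2": ("phi_b", 2), "m3": ("phi_c", 2)}
model_not_identified = {"m1": ("phi_a", 1), "m2": ("phi_a", 2)}  # one phi, two thetas

def observationally_equivalent(model, th1, th2):
    """th1 and th2 are observationally equivalent if some phi is consistent
    with both, i.e. s(phi, th1) and s(phi, th2) are both nonempty."""
    phis1 = {phi for phi, th in model.values() if th == th1}
    phis2 = {phi for phi, th in model.values() if th == th2}
    return bool(phis1 & phis2)

def point_identified(model):
    """theta is point identified if no two distinct values of theta
    are observationally equivalent."""
    thetas = {th for _, th in model.values()}
    return not any(observationally_equivalent(model, a, b)
                   for a, b in combinations(thetas, 2))

print(point_identified(model_identified))      # True
print(point_identified(model_not_identified))  # False
```

In the second model the single value phi_a is consistent with both θ = 1 and θ = 2, so those two values are observationally equivalent and θ is not point identified.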
We conclude this section by defining some identification concepts closely related to point identification. Later sections will explore these identification concepts in more detail.

The concepts of local and generic identification deal with cases where we can't establish point identification for all θ in Θ. Local identification of θ₀ means that there exists a neighborhood of θ₀ such that, for all values θ ≠ θ₀ in this neighborhood, θ is not observationally equivalent to θ₀. As with point identification, since we don't know ahead of time which value of θ is θ₀, to prove that local identification holds we would need to show that for any θ̃ ∈ Θ there exists a neighborhood of θ̃ such that, for any θ ≠ θ̃ in this neighborhood, θ is not observationally equivalent to θ̃.

Generic identification roughly means that the set of values of θ in Θ that cannot be point identified is a very small subset of Θ. Suppose we took all the values of θ in Θ, and divided them into two groups: those that are observationally equivalent to some other element of Θ, and those that are not. If θ₀ is in the second group, then it's identified, otherwise it's not. Since θ₀ could be any value in Θ, and we don't know which one, to prove point identification in general we would need to show that the first group is empty. The parameter θ is defined to be generically identified if the first group is extremely small (formally, if the first group is a measure zero subset of Θ). Both local and generic identification are discussed in more detail later.

The true parameter value θ₀ is said to be set identified (sometimes also called partially identified) if there exist some values of θ ∈ Θ that are not observationally equivalent to θ₀. So the only time a parameter θ is not set identified is when all θ ∈ Θ are observationally equivalent. For set-identified parameters, the identified set is defined to be the set of all values of θ ∈ Θ that are observationally equivalent to θ₀. Point identification of θ₀ is therefore the special case of set identification in which the identified set contains only one element, which is θ₀.

Parametric identification is where θ is a finite set of constants and all the different possible values of ϕ also correspond to different values of a finite set of constants. Nonparametric identification is where θ consists of functions or infinite sets. Other cases are called semiparametric identification, which includes situations where, for example, θ includes both a vector of constants and nuisance parameters that are functions. As we will see in section 6, sometimes the differences between parametric, semiparametric, and nonparametric identification can be somewhat arbitrary (see Powell 1994 for further discussion of this point).

3.3 Examples and Classes of Point Identification

Consider some examples to illustrate the basic idea of point identification.

Example 1: A median.—Let the model M be the set of all possible distributions of a random variable W having a strictly monotonically increasing distribution function. Our DGP consists of IID draws of W. From this DGP, what is knowable is F(w), the distribution function of W. Let our parameter θ be the median of W. In this simple example, we know θ is identified because it's the unique solution to F(θ) = 1/2. By knowing F, we can determine θ.

How does this example fit the general definition of identification? Here, each value of ϕ is a particular continuous, monotonically increasing distribution function F. In this example, each model value m happens to correspond to a unique value of ϕ because each possible distribution of W has a unique distribution function. In this example, for any given candidate value of ϕ and θ, the structure s(ϕ, θ) is either an empty set or it has one element. For a given value of ϕ and θ, if ϕ = F and F(θ) = 1/2 (the definition of a median) the set s(ϕ, θ) contains one element. That element m is the distribution that has distribution function F. Otherwise, if ϕ = F where F(θ) ≠ 1/2, the set s(ϕ, θ) is empty. In this example, it's not possible to have two different parameter values θ and θ̃ be observationally equivalent, because F(θ) = 1/2 and F(θ̃) = 1/2 implies θ = θ̃ for any continuous, monotonically increasing function F. Therefore, θ is point identified, because its true value θ₀ cannot be observationally equivalent to any other value θ.

Example 2: Linear regression.—Consider a DGP consisting of observations of Y, X where Y is a scalar and X is a K-vector. The observations of Y and X might not be independent or identically distributed. Assume the first and second moments of X and Y are constant across observations, and let ϕ be the set of first and second moments of X and Y. Let the model M be the set of joint distributions of e, X that satisfy Y = X′θ + e, where θ is some K-vector of parameters, e is an error term satisfying E(Xe) = 0, and where e, X has finite first and second moments. The structure s(ϕ, θ) is nonempty when the moments comprising ϕ satisfy E[X(Y − X′θ)] = 0 for the given θ. To ensure point identification, we could add the additional restriction on M that E(XX′) is non-singular, because then θ would be uniquely determined in the usual way by θ = E(XX′)⁻¹E(XY). However, if we do not add this additional restriction, then we can find values θ̃ that are observationally equivalent to θ by letting θ̃ = E(XX′)⁻E(XY) for different pseudoinverses E(XX′)⁻.

As described here, identification of θ is parametric, because θ is a vector and ϕ can be written as a vector of moments. However, some authors describe linear regression as semiparametric, because it includes errors e that have an unknown distribution function. This distinction depends on how we define ϕ and θ. For example, suppose we had IID observations of Y, X. We could then have defined ϕ to be the joint distribution function of Y, X, and defined θ to include both the coefficients of X and the distribution function of the error term e. Given the same model M, including the restriction that E(XX′) is non-singular, we would then have semiparametric identification of θ.

Example 3: Treatment.—Suppose the DGP consists of individuals who are assigned a treatment of T = 0 or T = 1, and each individual generates an observed outcome Y. Assume Y, T are independent across individuals. In the Rubin (1974) causal notation, define the random variable Y(t) to be the outcome an individual would have generated if he or she were assigned T = t. The observed Y satisfies Y = Y(T). Let the parameter of interest θ be the average treatment effect (ATE), defined by θ = E(Y(1) − Y(0)). The model M is the set of all possible joint distributions of Y(1), Y(0), and T. One possible restriction on the model is Rosenbaum and Rubin's (1983) assumption that (Y(1), Y(0)) is independent of T. This assumption, equivalent to random assignment of treatment, is what Rubin (1990) calls unconfoundedness. Imposing unconfoundedness means that M only contains model values m (i.e., joint distributions) where (Y(1), Y(0)) is independent of T.

The knowable function ϕ from this DGP is the joint distribution of Y and T. Given unconfoundedness, θ is identified because unconfoundedness implies that θ = E(Y | T = 1) − E(Y | T = 0), which is uniquely determined from ϕ. Heckman, Ichimura, and Todd (1997) note that a weaker sufficient condition for identification of θ by this formula is the mean unconfoundedness assumption that E(Y(t) | T) = E(Y(t)). If we had not assumed some form of unconfoundedness, then θ might not equal