Testing Non-Identifying Restrictions$^1$
Marc Henry
Columbia University
First draft: September 15, 2005
This draft: January 25, 2006
Abstract
We propose a test of specification for structural models without
identifying assumptions. The model is defined as a binary relation
between latent and observable variables, coupled with a hypothesized
family of distributions for the latent variables. The objective of the
testing procedure is to determine whether this hypothesized family
of latent variable distributions has a non-empty intersection with the
set of distributions compatible with the observable data generating
process and the binary relation defining the model. When the model is
given in parametric form, the test can be inverted to yield confidence
intervals for the identified parameter set.
JEL Classification: C12, C14
Keywords: random sets, empirical process.
$^1$ Preliminary and incomplete. Helpful discussions with Alfred Galichon, Rosa Matzkin, Alexei Onatski, Jim Powell and Peter Robinson are gratefully acknowledged (with the usual disclaimer). Correspondence address: Department of Economics, Columbia University, 420 W 118th Street, New York, NY 10027, USA. mh530@columbia.edu.
1 Introduction
We consider a very general econometric model specification. Variables under consideration are divided into two groups.
Latent variables, $u \in U = \mathbb{R}^{d_u}$. The vector $u$ is not observed by the analyst, but some of its components may be observed by the economic actors. Theorem 1 below holds more generally when $U$ is a complete, metrizable and separable topological space (i.e. a Polish space).
Observable variables, $y \in Y = \mathbb{R}^{d_y}$. The vector $y$ is observed by the analyst. Theorem 1 holds more generally when $Y$ is a convex metrizable subset of a locally convex topological vector space.
The Borel sigma-algebras of $Y$ and $U$ will be denoted $\mathcal{B}_Y$ and $\mathcal{B}_U$ respectively.
Call $P$ the Borel probability measure that represents the true data generating process for the observable variables, and $\mathcal{V}$ a family of Borel probability measures that are hypothesized to be possible data generating processes for the latent variables. Finally, the economic model is given by a relation between observable and latent variables, i.e. a subset of $Y \times U$, which we shall write as a multi-valued mapping from $Y$ to $U$ denoted by $\Gamma$. Suppose a set of restrictions on the hypothesized latent variable distributions is given by $\mathcal{V}_0 \subseteq \mathcal{V}$ and a set of restrictions on the model is given by $\Gamma_0 \subseteq \Gamma$.
Example 1: parametric models and restrictions. Suppose the economic model is known up to a finite dimensional parameter vector $\theta$, and the chosen family of distributions for the latent variables depends on a finite dimensional parameter vector $\lambda$. The hypothesized restrictions are the following: $\theta \in \Theta_0 \subseteq \mathbb{R}^{d_\theta}$; $\lambda \in \Lambda_0 \subseteq \mathbb{R}^{d_\lambda}$.
The restricted family of distributions for the latent variables then becomes
$$\mathcal{V}_0 = \{\nu_\lambda,\ \lambda \in \Lambda_0\}$$
and all the models $\{\Gamma_\theta,\ \theta \in \Theta_0\}$ are considered for the relation linking observable to latent variables. Hence our restricted model is defined by
$$\Gamma_0 = \bigcup_{\theta \in \Theta_0} \Gamma_\theta.$$
Example 2: games with multiple equilibria. Suppose the payoff function for player $j$, $j = 1, \ldots, J$ is given by
$$\Pi_j(S_j, S_{-j}, X_j, U_j; \theta),$$
where $S_j$ is player $j$'s strategy and $S_{-j}$ is their opponents' strategies. $X_j$ is a vector of observable characteristics of player $j$ and $U_j$ a vector of unobservable determinants of the payoff. Finally $\theta$ is a vector of parameters. Pure strategy Nash equilibrium conditions
$$\Pi_j(S_j, S_{-j}, X_j, U_j; \theta) \geq \Pi_j(S, S_{-j}, X_j, U_j; \theta), \quad \text{for all } S,$$
define a correspondence
and
P at a given significance level, and conversely, one can derive a set of s
compatible with P and
for a given .
The next section proposes the characterization of probability measures in
the Core of a random set, the following section describes the testing prin-
ciples and the last section illustrates the approach on a simple entry model
with multiple equilibria. Proofs and additional results are collected in the
appendix.
2 Testing general model specifications
2.1 Definition of the null hypothesis
We wish to develop a procedure to detect whether the two sets of restrictions,
on the family of distributions for the latent variables on the one hand, and
on the relation between observable and latent variables on the other hand
are compatible. First we explain what we mean by compatible. It is very
easily understood in the simple case where the link between latent and
observable variables is parametric and
by
P
1
(A) = P{y Y |
. In the
general case considered here, $\Gamma_0$ may not be single valued, and its images may not even be disjoint (which would be the case if it were the inverse image of a single valued mapping from $U$ to $Y$, i.e. a traditional function from latent to observable variables). However, under a measurability assumption on $\Gamma_0$, we can construct an analogue of the image measure, which will now be a set $\mathrm{Core}(\Gamma_0, P)$ of Borel probability measures on $U$ (to be defined below), and the hypothesis of compatibility of the restrictions on latent variable distributions and on the models linking latent and observable variables will naturally take the form
$$H_0 : \mathcal{V}_0 \cap \mathrm{Core}(\Gamma_0, P) \neq \emptyset. \quad (2)$$
Assumption 1: $\Gamma_0$ has non-empty and closed values, and for each open set $O \subseteq U$,
$$\Gamma_0^{-1}(O) = \{y \in Y \mid \Gamma_0(y) \cap O \neq \emptyset\} \in \mathcal{B}_Y.$$
To relate the present case to the intuition of the single-valued case, it is useful to think in terms of single-valued selections of the multi-valued mapping $\Gamma_0$. A measurable selection of $\Gamma_0$ is a measurable function $\gamma$ such that $\gamma(y) \in \Gamma_0(y)$ for all $y \in Y$. The set of measurable selections of a multi-valued mapping $\Gamma_0$ that satisfies Assumption 1 is denoted $\mathrm{Sel}(\Gamma_0)$, and it is known to be non-empty since Rokhlin (1949) Part I, §2, No. 9, Lemma 2. To each selection $\gamma$ of $\Gamma_0$, we can associate the image measure of $P$, denoted $P\gamma^{-1}$, defined as in (1).
A natural reformulation of the compatibility condition is that at least one probability measure $\nu \in \mathcal{V}_0$ can be written as a mixture of probability measures of the form $P\gamma^{-1}$, where $\gamma$ ranges over $\mathrm{Sel}(\Gamma_0)$. However, even for the simplest multi-valued mapping, the set of measurable selections is very rich, let alone the set of their mixtures. Hence, our first goal is to give a manageable representation of such a mixture. This is the object of Theorem 1 below.
Theorem 1: Under Assumption 1, $\nu$ is a mixture of images of $P$ by measurable selections of $\Gamma_0$ (i.e. $\nu$ is in the weak closed convex hull of $\{P\gamma^{-1};\ \gamma \in \mathrm{Sel}(\Gamma_0)\}$) if and only if there exists for $P$-almost all $y \in Y$ a probability measure $\pi(y, .)$ on $U$ with support $\Gamma_0(y)$, such that
$$\nu(.) = \int_Y \pi(y, .)\,P(dy). \quad (3)$$
[...] $\Gamma_0$ is the inverse image of a single-valued measurable function (i.e. when the model is given by a single-valued measurable function from latent to observable variables), the probability kernel $\pi(y, .)$ [...] where $y$ is a realization of a random element with distribution $P$.
Remark 3: We define $\mathrm{Core}(\Gamma_0, P)$ as the weak convex hull of $\{P\gamma^{-1};\ \gamma \in \mathrm{Sel}(\Gamma_0)\}$, or equivalently as the set of all mixtures of images of $P$ by measurable selections of $\Gamma_0$. So our null hypothesis (2) is well defined.$^2$
2.2 Definition of the test statistic
Now that we have identified the set of latent variable data generating processes compatible with the observable distribution $P$ and the model correspondence $\Gamma_0$ with $\mathrm{Core}(\Gamma_0, P)$, and have characterized elements of the latter by means of Theorem 1, we propose a test statistic based on this characterization. Call $\mathcal{V}_{00}$ the subset of $\mathcal{V}_0$ that is compatible with the model correspondence and the distribution of observables. Hence
$$\mathcal{V}_{00} = \mathcal{V}_0 \cap \mathrm{Core}(\Gamma_0, P),$$
which is non-empty under the null $H_0$ by definition. By Theorem 1, an element $\nu$ of $\mathcal{V}_0$ is in $\mathcal{V}_{00}$ if and only if $\nu$ can be written as
$$\nu(.) = \int_Y \pi(y, .)\,P(dy),$$
where the [...] $\int_Y \pi(y, .)\,P(dy) = P\pi$.
$^2$ The name Core is justified by Theorem A2 of Appendix A.
Consider such a sample of observations $(Y_1, \ldots, Y_n)$. The empirical distribution, i.e. the probability measure that gives mass $1/n$ to each observation, is denoted $P_n$, with
$$P_n(A) = \frac{1}{n}\sum_{j=1}^{n} I_A(Y_j), \quad \text{all } A \in \mathcal{B}_Y.$$
The empirical counterpart of the integral $P\pi$ is
$$\nu_n = P_n\pi = \frac{1}{n}\sum_{j=1}^{n} \pi(Y_j, .).$$
The asymptotic behaviour of the difference between $P$ [...], call $B_u$ the rectangles $\prod_{i=1}^{d_u}(-\infty, u_i]$ and $\pi_{\nu,u} = \pi(., B_u)$. Let $G_n = \sqrt{n}(P_n - P)$ denote the empirical process associated with the sample $(Y_1, \ldots, Y_n)$, and finally, let $\rightsquigarrow$ denote convergence in distribution (aka weak convergence). Then we have
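Theorem 2: For any $\nu \in \mathcal{V}_{00}$ with a density with respect to Lebesgue measure, and for any $\pi$ satisfying (3), $G_n$ converges weakly, uniformly over the family of functions $\{\pi_{\nu,u},\ u \in \mathbb{R}^{d_u}\}$, to a $P$-Brownian bridge $G$, i.e. a Gaussian process with zero mean and covariance function defined by
$$E\,G\pi_{\nu,u}\,G\pi_{\nu,v} = P\pi_{\nu,u}\pi_{\nu,v} - P\pi_{\nu,u}\,P\pi_{\nu,v}.$$
This implies that
$$\sqrt{n}\,\sup_{u \in \mathbb{R}^{d_u}} |P_n\pi_{\nu,u} - P\pi_{\nu,u}| \rightsquigarrow \|G\|_\infty,$$
where $\|G\|_\infty$ is such that for all $x \in \mathbb{R}$,
$$\Pr(\|G\|_\infty > x) = 2\sum_{j=1}^{\infty}(-1)^{j+1}e^{-2j^2x^2}.$$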
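To illustrate how quantiles of this limit might be computed in practice, the following minimal Python sketch evaluates the alternating series for the tail probability and inverts it by bisection. The function names `kolmogorov_sf` and `kolmogorov_quantile`, the truncation at 100 terms and the bisection bracket are assumptions of this sketch, not part of the paper; the series is only evaluated for x > 0.

```python
import math


def kolmogorov_sf(x, terms=100):
    """Tail probability 2 * sum_{j>=1} (-1)^(j+1) * exp(-2 j^2 x^2), for x > 0.

    For x <= 0 the supremum of an absolute value exceeds x almost surely,
    so the function returns 1.0 (a convention of this sketch).
    """
    if x <= 0:
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j + 1) * math.exp(-2.0 * j * j * x * x)
    # Clip to [0, 1] to absorb rounding error in the truncated series.
    return min(1.0, max(0.0, 2.0 * total))


def kolmogorov_quantile(level, lo=0.2, hi=5.0, tol=1e-10):
    """Critical value c with tail probability equal to `level`, by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_sf(mid) > level:
            lo = mid  # tail still too heavy: the critical value is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


c95 = kolmogorov_quantile(0.05)  # critical value for a 5% level test
```

A test at the 5% level would then reject when the realized statistic exceeds `c95`.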
Remark 1: A remarkable feature of Theorem 2 is that, in the case of a single real valued latent variable, the test statistic has a distribution-free limit with easily computable quantiles.
The test statistic implicitly proposed in Theorem 2 to test whether a given latent variable distribution $\nu$ is compatible with the model restriction $\Gamma_0$ and the observable distribution $P$ is infeasible in that it depends on the unknown
probability kernels
such that = P
n
. They can be estimated as solutions
from the integral equation = P
n
(y) are
probability measures on $\Gamma_0(y)$. This equation has solutions (generically many) if and only if $\nu \in \mathrm{Core}(\Gamma_0, P_n)$ by Theorem 1, but solutions are likely to be difficult to exhibit except in very simple cases, such as the cases developed in section 3.
An alternative is to construct a test statistic based on the distance between a hypothesized latent variable measure $\nu$ (or more generally $\mathcal{V}_0$) and $\mathrm{Core}(\Gamma, P_n)$, which by construction will be smaller than the test statistic of Theorem 2, and hence can be used as a basis for a conservative testing procedure. This is summarized in the following corollary:
Corollary 1: Under the null $H_0$,
$$\limsup_{n \to \infty}\ \inf_{\nu \in \mathcal{V}_0}\ \inf_{\mu \in \mathrm{Core}(\Gamma_0, P_n)}\ \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\,|\mu(B_u) - \nu(B_u)| \leq \sup_{u \in \mathbb{R}^{d_u}} |G|,$$
and the infima are achieved provided $\mathcal{V}_0$ is chosen to be closed in the weak topology.
Given the conservative nature of the procedure based on Corollary 1, it is crucial to assess the power of the test, as described in the next section.
2.3 Power analysis
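The two test statistics considered in section 2.2 are the following:
$$TS1 = \inf_{\nu \in \mathcal{V}_0}\ \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\,|P_n\pi(B_u) - \nu(B_u)|$$
$$TS2 = \inf_{\nu \in \mathcal{V}_0}\ \inf_{\mu \in \mathrm{Core}(\Gamma_0, P_n)}\ \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\,|\mu(B_u) - \nu(B_u)|.$$
Note that since $P_n\pi \in \mathrm{Core}(\Gamma_0, P_n)$, TS2 is dominated by TS1 by construction.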
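To assess the power of either test, we consider the following types of local alternatives:
$$d_H(\mathcal{V}_n, \mathrm{Core}(\Gamma_n, P)) \geq \varepsilon\, r_n^{-1}, \quad \varepsilon > 0, \quad (4)$$
where $r_n$ is a deterministic sequence of reals diverging with $n$, and $d_H$ denotes the Hausdorff distance, defined as follows: for any two sets $\mathcal{V}_1$ and $\mathcal{V}_2$ in $\Delta(U)$, and any $d$ metrizing weak convergence,
$$d_H(\mathcal{V}_1, \mathcal{V}_2) = \max\left\{ \sup_{\nu_1 \in \mathcal{V}_1}\ \inf_{\nu_2 \in \mathcal{V}_2} d(\nu_1, \nu_2),\ \sup_{\nu_2 \in \mathcal{V}_2}\ \inf_{\nu_1 \in \mathcal{V}_1} d(\nu_1, \nu_2) \right\}.$$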
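The Hausdorff distance above is easy to compute when both sets of measures are finite. The sketch below is only an illustration under that assumption: measures are represented as probability vectors on a common ordered grid, and the sup-distance between cumulative distribution functions (`cdf_sup_distance`, a name chosen here) stands in for the metric d. On a fixed finite grid this choice is harmless, but it is not the paper's d.

```python
from itertools import accumulate


def cdf_sup_distance(mu, nu):
    """Largest gap between the CDFs of two pmfs on a common ordered grid."""
    return max(abs(a - b) for a, b in zip(accumulate(mu), accumulate(nu)))


def hausdorff_distance(set_a, set_b, dist):
    """max of the two directed distances sup_a inf_b dist(a, b) and sup_b inf_a dist(a, b)."""
    def directed(src, dst):
        return max(min(dist(a, b) for b in dst) for a in src)
    return max(directed(set_a, set_b), directed(set_b, set_a))


# Two small, purely illustrative families of distributions on a 3-point grid.
V1 = [(0.5, 0.5, 0.0), (0.2, 0.3, 0.5)]
V2 = [(0.5, 0.25, 0.25)]
d_h = hausdorff_distance(V1, V2, cdf_sup_distance)
```

Taking the maximum of the two directed distances is what makes the quantity symmetric: one set being close to the other in only one direction does not give a small Hausdorff distance.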
The principle of both test statistics TS1 and TS2 rests on the set convergence of $\mathrm{Core}(\Gamma, P_n)$ to $\mathrm{Core}(\Gamma, P)$ for a fixed model correspondence $\Gamma$. Hence, for $n$ large enough, $\mathrm{Core}(\Gamma_n, P_n)$ is sufficiently close to $\mathrm{Core}(\Gamma_n, P)$ for the test statistic to detect the sequence of local alternatives, as summarized in the following theorem.
Theorem 3: Under the sequence of alternatives defined in (4) with $r_n = o(\sqrt{n})$ [...]
(B
u
): with
$$\Pi_1(x_1, x_2, u) = (\theta x_2 - u)\,I_{\{x_1 = 1\}}, \qquad \Pi_2(x_1, x_2, u) = (\theta x_1 - u)\,I_{\{x_2 = 1\}},$$
where $x_i \in \{0, 1\}$ is firm $i$'s action, and $u$ is an exogenous cost. The firms know their cost; the analyst, however, knows only that $u \in [0, 1]$, and that the structural parameter $\theta$ is in $(0, 1]$. There are two Nash equilibria. The first is $x_1 = x_2 = 0$ for all $u \in [0, 1]$. The second is $x_1 = x_2 = 1$ for all $u \in [0, \theta]$ and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable $y = x_1 = x_2$. Hence the model is described by the multi-valued mapping: $\Gamma(1) = [0, \theta]$ and $\Gamma(0) = [0, 1]$. If we consider the restriction $\theta \leq \theta_{\max}$, then the multi-valued mapping incorporating the restriction is $\Gamma_0$ defined by $\Gamma_0(1) = [0, \theta_{\max}]$ and $\Gamma_0(0) = [0, 1]$. In this case, since $y$ is Bernoulli, we can write $P = (1 - p, p)$ with $p$ the probability of a 1. For the distribution of $u$, we consider a parametric exponential family on $[0, 1]$. Hence $\mathcal{V} = \{\nu_\lambda := \lambda u^{\lambda - 1}du\}_{\lambda > 0}$, and the restriction can be chosen as $\lambda \geq \underline{\lambda} > 0$.
$^3$ Jovanovic (1989) and Tamer (2003) consider this simple game in a similar context.
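Consider the smallest reliability $\underline{P}$ that can be attached to a set $A$ in $U$ based on $\Gamma_0$ and $P$, defined by
$$\underline{P}(A) = P\{y \in Y \mid \Gamma_0(y) \subseteq A\}.$$
Our null hypothesis of compatibility of the two sets of restrictions is that for some $\lambda \in [\underline{\lambda}, \overline{\lambda}]$, $\nu_\lambda$ associates to each set a measure at least as large as the smallest reliability that can be attached to it. This is equivalent to the existence of a $\lambda \in [\underline{\lambda}, \overline{\lambda}]$ such that for $P$-almost all $y$ there is a probability measure $\pi(y, .)$ supported on $\Gamma_0(y)$ such that for all $u \in [0, 1]$
$$u^\lambda = \int_Y \pi(y, [0, u])\,P(dy) = (1 - p)\,\pi(0, [0, u]) + p\,\pi(1, [0, u]).$$
Since $\pi(1, [0, \theta_{\max}]) = 1$, evaluating this identity at $u = \theta_{\max}$ shows that when $\theta_{\max}^\lambda < p$, there is no solution (i.e. the two sets of restrictions are incompatible), whereas when $\theta_{\max}^\lambda \geq p$, a continuous solution is given by
$$\pi(0, [0, u]) = \frac{u^\lambda - p\,(u/\theta_{\max})^\lambda}{1 - p}\,I_{\{u \leq \theta_{\max}\}} + \frac{u^\lambda - p}{1 - p}\,I_{\{u > \theta_{\max}\}},$$
$$\pi(1, [0, u]) = \left(\frac{u}{\theta_{\max}}\right)^{\lambda} I_{\{u \leq \theta_{\max}\}} + I_{\{u > \theta_{\max}\}}.$$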
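The compatibility condition of this example can be checked numerically. The sketch below assumes the latent family has distribution function u^lam on [0, 1] (the name `lam` for the family's parameter is a label chosen here) and codes the two kernels as nondecreasing functions of u whose P-mixture must reproduce u^lam. It then verifies the mixture identity on a grid and checks that the kernel attached to y = 0 is a genuine distribution function only when theta_max^lam >= p. All function names are hypothetical.

```python
def kernel_entry(u, lam, theta_max):
    """pi(1, [0, u]): mass the y = 1 kernel puts on [0, u]; supported on [0, theta_max]."""
    return (u / theta_max) ** lam if u <= theta_max else 1.0


def kernel_no_entry(u, lam, theta_max, p):
    """pi(0, [0, u]): mass the y = 0 kernel puts on [0, u]; supported on [0, 1]."""
    if u <= theta_max:
        return (u ** lam - p * (u / theta_max) ** lam) / (1.0 - p)
    return (u ** lam - p) / (1.0 - p)


def is_compatible(lam, theta_max, p):
    """Necessary and sufficient condition in this example: theta_max^lam >= p."""
    return theta_max ** lam >= p


def mixture_gap(lam, theta_max, p, grid=1000):
    """Largest deviation of (1 - p) pi(0, .) + p pi(1, .) from u^lam on a grid."""
    worst = 0.0
    for k in range(grid + 1):
        u = k / grid
        mix = ((1.0 - p) * kernel_no_entry(u, lam, theta_max, p)
               + p * kernel_entry(u, lam, theta_max))
        worst = max(worst, abs(mix - u ** lam))
    return worst


def no_entry_kernel_is_monotone(lam, theta_max, p, grid=1000, tol=1e-12):
    """True if pi(0, [0, u]) is nondecreasing in u on the grid."""
    values = [kernel_no_entry(k / grid, lam, theta_max, p) for k in range(grid + 1)]
    return all(b >= a - tol for a, b in zip(values, values[1:]))
```

The mixture identity holds algebraically for any p in (0, 1). What fails when theta_max^lam < p is monotonicity of the y = 0 kernel, so the candidate solution is no longer a probability measure.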
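Consider now the empirical process $G_n = \sqrt{n}(P_n - P)$ applied to the family of functions $\pi_{\lambda,u}(y) := \pi(y, [0, u])$:
$$G_n\pi_{\lambda,u} = \sqrt{n}\,\frac{p_n - p}{1 - p}\,g_{\lambda,u},$$
where
$$g_{\lambda,u} := (1 - u^\lambda)\,I_{\{u \geq \theta_{\max}\}} + \left[\left(\frac{u}{\theta_{\max}}\right)^{\lambda} - u^\lambda\right] I_{\{u < \theta_{\max}\}}.$$
Note that
$$\inf_\lambda\ \sup_{u \in [0,1]} |g_{\lambda,u}| = 1 - \theta_{\max}^{\underline{\lambda}}.$$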
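The class of functions $\{g_{\lambda,u};\ \lambda \in [\underline{\lambda}, \overline{\lambda}],\ u \in [0, 1]\}$ is a Vapnik-Červonenkis class, since the subgraphs of these functions are unions of intervals tied to zero, hence they cannot shatter any set of two points (see for instance van der Vaart and Wellner (2000)). Hence, the family $\{\pi_{\lambda,u};\ \lambda \in [\underline{\lambda}, \overline{\lambda}],\ u \in [0, 1]\}$ is $P$-Donsker, and since $\sqrt{n}\,\frac{p_n - p}{1 - p}$ weakly converges to a centered normal random variable with variance $p/(1 - p)$, we have the following:
$$\inf_\lambda\ \sup_{u \in [0,1]} |G_n\pi_{\lambda,u}| \rightsquigarrow \sqrt{\frac{p}{1 - p}}\,\left(1 - \theta_{\max}^{\underline{\lambda}}\right)|Z|,$$
where $Z$ is a random variable with standard normal distribution. Now, by construction, $\inf_\lambda \sup_{u \in [0,1]} |G_n\pi_{\lambda,u}|$ is dominated by $\sup_{u \in [0,1]} |G_n\pi_{\lambda,u}|$, which for any $\lambda$ satisfying [...]
$$\Pr(\|G\|_\infty > x) = 2\sum_{j=1}^{\infty}(-1)^{j+1}e^{-2j^2x^2}.$$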
A testing procedure that does not require computation of the kernels $\pi$, as described in section 2.2, consists in finding the element of $\mathrm{Core}(\Gamma_0, P_n)$ that minimizes the Kolmogorov-Smirnov distance to the set of distributions $\{u^\lambda,\ \lambda \in [\underline{\lambda}, \overline{\lambda}]\}$. If
max
p
n
, then u
max
< p
n
, then the minimum Kolmogorov-Smirnov
distance is p
n
u
. If
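$\theta_{\max}^{\underline{\lambda}} > p$, then ultimately $\theta_{\max}^{\underline{\lambda}} > p_n$ as well and the test statistic is zero. If $\theta_{\max}^{\underline{\lambda}} < p$, then ultimately $\theta_{\max}^{\underline{\lambda}} < p_n$ as well, and the test statistic diverges. Finally, if $\theta_{\max}^{\underline{\lambda}} = p$, then the statistic will be $\sqrt{n}\max(0, p_n - p)$. In this very simple example, one might also consider a test of the null $H_0 : p = \theta_{\max}^{\underline{\lambda}}$ against the one-sided alternative $H_a : p > \theta_{\max}^{\underline{\lambda}}$ using the fact that under the null,
$$\left(\frac{n}{\theta_{\max}^{\underline{\lambda}}\,(1 - \theta_{\max}^{\underline{\lambda}})}\right)^{1/2}\left(p_n - \theta_{\max}^{\underline{\lambda}}\right)$$
converges to a standard normal random variable.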
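The one-sided test just described reduces to a standard z-test on a Bernoulli frequency. The following sketch is a minimal illustration: `threshold` stands for the value that the null hypothesis assigns to p, and the function name and the simulated sample are assumptions made here, not part of the paper.

```python
import math


def one_sided_z_test(y, threshold):
    """z statistic and one-sided p-value for H0: p = threshold against Ha: p > threshold.

    `y` is a sequence of 0/1 outcomes; `threshold` must lie strictly in (0, 1).
    """
    n = len(y)
    p_n = sum(y) / n  # empirical frequency of y = 1
    z = math.sqrt(n / (threshold * (1.0 - threshold))) * (p_n - threshold)
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return p_n, z, p_value


# Illustrative data: 60 entries out of 100 markets, null value 0.5.
sample = [1] * 60 + [0] * 40
p_hat, z_stat, p_val = one_sided_z_test(sample, 0.5)
```

Large positive values of the statistic are evidence that p exceeds the largest value compatible with the restrictions, so only the upper tail is used.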
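To summarize, the procedures proposed are based on the following test statistics:
$$TS1 = \inf_\lambda\ \sup_{u \in [0,1]} |G_n\pi_{\lambda,u}|$$
$$TS2 = \inf_\lambda\ \inf_{\mu \in \mathrm{Core}(\Gamma_0, P_n)}\ \sup_{u \in [0,1]} \sqrt{n}\,|u^\lambda - F_\mu(u)|$$
$$TS3 = \sqrt{n}\,\left(p_n - \theta_{\max}^{\underline{\lambda}}\right)$$
and the following approximating distributions:
$$AD1 = \sqrt{\frac{p}{1 - p}}\,\left(1 - \theta_{\max}^{\underline{\lambda}}\right)|N(0, 1)|$$
$$AD2 = \|G\|_\infty$$
$$AD3 = \sqrt{\theta_{\max}^{\underline{\lambda}}\,(1 - \theta_{\max}^{\underline{\lambda}})}\ N(0, 1)$$
$$AD4 = \sqrt{p(1 - p)}\ N(0, 1)$$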
The procedure based on estimation of the probability kernels and com-
parison between hypothesized distributions = P
$$\Pi_1(x_1, x_2, u) = (\theta x_2 - u_1)\,I_{\{x_1 = 1\}}, \qquad \Pi_2(x_1, x_2, u) = (\theta x_1 - u_2)\,I_{\{x_2 = 1\}},$$
where $x_i \in \{0, 1\}$ is firm $i$'s action, and the $u$'s are firm specific exogenous costs. The firms know their cost; the analyst, however, knows only that $u \in [0, 1]^2$, and that the structural parameter $\theta$ is in $(0, 1]$. There are two Nash equilibria. The first is $x_1 = x_2 = 0$ for all $u \in [0, 1]^2$. The second is $x_1 = x_2 = 1$ for all $u \in [0, \theta]^2$ and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable $y = x_1 = x_2$. Hence the model is described by the multi-valued mapping: $\Gamma(1) = [0, \theta]^2$ and $\Gamma(0) = [0, 1]^2$. If we consider the restriction $\theta \leq \theta_{\max}$, then the multi-valued mapping incorporating the restriction is $\Gamma_0$ defined by $\Gamma_0(1) = [0, \theta_{\max}]^2$ and $\Gamma_0(0) = [0, 1]^2$. In this case, since $y$ is Bernoulli, we can write $P = (1 - p, p)$ with $p$ the probability of a 1. For the distribution of $u$, we consider the costs to be independent with marginals following the same parametric exponential family on $[0, 1]$. Hence $\mathcal{V} = \{\nu_\lambda := \lambda^2 u_1^{\lambda - 1}u_2^{\lambda - 1}\,du_1\,du_2\}_{\lambda > 0}$, and the restriction can be chosen as $\lambda \geq \underline{\lambda} > 0$.
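A density version of (3) can be derived in this case, and makes for a more convenient test statistic. Writing $u = (u_1, u_2)'$,
$$f_\lambda(u) = f_\lambda(u_1)\,f_\lambda(u_2) = \lambda^2 u_1^{\lambda - 1}u_2^{\lambda - 1} = \int_Y \pi(y, u)\,P(dy).$$
In other words,
$$\lambda^2 u_1^{\lambda - 1}u_2^{\lambda - 1} = (1 - p)\,\pi(0, u) + p\,\pi(1, u)$$
under the constraints
$$\int_{[0,1]^2} \pi(0, u)\,du = \int_{[0,\theta_{\max}]^2} \pi(1, u)\,du = 1$$
and
$$\pi(0, u) = \frac{f_\lambda(u)}{1 - p}\left[1 - p\,\theta_{\max}^{-2\lambda}\,I_{[0,\theta_{\max}]^2}(u)\right],$$
$$\pi(1, u) = \theta_{\max}^{-2\lambda}\,f_\lambda(u)\,I_{[0,\theta_{\max}]^2}(u).$$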
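Consider now the empirical process $G_n = \sqrt{n}(P_n - P)$ applied to the family of functions $\pi_{\lambda,u}(y) := \pi(y, u)$:
$$G_n\pi_{\lambda,u} = \sqrt{n}\,\frac{p_n - p}{1 - p}\,g_{\lambda,u},$$
where
$$g_{\lambda,u} := f_\lambda(u)\left[\theta_{\max}^{-2\lambda}\,I_{[0,\theta_{\max}]^2}(u) - 1\right].$$
In this case, it is convenient to use the $L_1$ metric: we are looking at the minimum of
$$\int_{[0,1]^2} |G_n\pi_{\lambda,u}|\,du = \sqrt{n}\,\frac{|p_n - p|}{1 - p}\int_{[0,1]^2} f_\lambda(u)\left|\theta_{\max}^{-2\lambda}\,I_{[0,\theta_{\max}]^2}(u) - 1\right| du.$$
Now
$$\int_{[0,1]^2} f_\lambda(u)\left|\theta_{\max}^{-2\lambda}\,I_{[0,\theta_{\max}]^2}(u) - 1\right| du = 2\left(1 - \theta_{\max}^{2\lambda}\right),$$
which is minimized at $\lambda = \underline{\lambda}$ to yield $2(1 - \theta_{\max}^{2\underline{\lambda}})$. So
$$\inf_\lambda \int_{[0,1]^2} |G_n\pi_{\lambda,u}|\,du \rightsquigarrow 2\sqrt{\frac{p}{1 - p}}\,\left(1 - \theta_{\max}^{2\underline{\lambda}}\right)|Z|,$$
where $Z$ is a standard normal random variable.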
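The closed form of the integral used above can be checked directly. The derivation below is a sketch that writes $\lambda$ for the parameter of the latent-variable family and $S = [0, \theta_{\max}]^2$ (both labels are choices made here for readability). It uses only two facts: $f_\lambda$ is a probability density on $[0, 1]^2$, and independence of the two costs gives $\int_S f_\lambda(u)\,du = \theta_{\max}^{2\lambda}$.

```latex
\begin{align*}
\int_{[0,1]^2} f_\lambda(u)\,\bigl|\theta_{\max}^{-2\lambda} I_S(u) - 1\bigr|\,du
  &= \bigl(\theta_{\max}^{-2\lambda} - 1\bigr)\int_S f_\lambda(u)\,du
     + \int_{[0,1]^2 \setminus S} f_\lambda(u)\,du \\
  &= \bigl(\theta_{\max}^{-2\lambda} - 1\bigr)\,\theta_{\max}^{2\lambda}
     + \bigl(1 - \theta_{\max}^{2\lambda}\bigr) \\
  &= 2\,\bigl(1 - \theta_{\max}^{2\lambda}\bigr).
\end{align*}
```

Since $\theta_{\max} \leq 1$, the right-hand side is nondecreasing in $\lambda$, so the infimum over the admissible range is attained at its lower end.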
Appendix A: Empirical Distributions of Random Sets
In Assumption 1, we assume that the correspondence is measurable in the traditional sense, defined below:
Definition A1 (Effros Measurability) A correspondence $\Gamma : (Y, \mathcal{B}_Y) \rightrightarrows (U, \mathcal{B}_U)$ is said to be Effros measurable, or weakly measurable, or simply measurable, if the inverse image of open sets is measurable, i.e. if for all open subsets $O$ of $U$,
$$\Gamma^{-1}(O) = \{y \in Y \mid \Gamma(y) \cap O \neq \emptyset\} \in \mathcal{B}_Y.$$
There are several ways a measurable correspondence can convey probabilistic information on its image space $(U, \mathcal{B}_U)$ given observed frequencies of outcomes in $Y$.
Dempster (1967) suggests considering the smallest reliability that can be associated with the event $A \in \mathcal{B}_U$ as the belief function
$$\underline{P}(A) = P\{y \in Y \mid \Gamma(y) \subseteq A\}$$
and the largest plausibility that can be associated with the event $A$ as the plausibility function
$$\overline{P}(A) = P\{y \in Y \mid \Gamma(y) \cap A \neq \emptyset\},$$
the two being linked by the relation
$$\overline{P}(A) = 1 - \underline{P}(A^c), \quad (5)$$
which prompted some authors to call them conjugates or duals of each other.
A natural way to construct a set of probability measures is to consider all probability measures that dominate the set function $\underline{P}$ set-wise, forming thus the core of the belief function:
$$\mathrm{Core}(\Gamma, P) = \{\mu \in M(U) \mid \forall A \in \mathcal{B}_U,\ \mu(A) \geq \underline{P}(A)\} = \{\mu \in M(U) \mid \forall A \in \mathcal{B}_U,\ \mu(A) \leq \overline{P}(A)\}$$
where the first equality can be taken as a definition, and the second follows immediately from (5). It is well known that $\mathrm{Core}(\Gamma, P)$ is non-empty, and it will be shown as a consequence of (3.2) below.
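On finite spaces the belief function, the plausibility function and membership in the core can be computed by brute force. The sketch below does this for a two-point discrete analogue of the entry example. The state labels "low" and "high", the probabilities and all function names are hypothetical choices made for illustration, and the correspondence is assumed to have non-empty values.

```python
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite collection, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))


def belief(A, gamma, P):
    """Mass of the observables y whose image gamma[y] is contained in A."""
    A = set(A)
    return sum(P[y] for y in P if gamma[y] <= A)


def plausibility(A, gamma, P):
    """Mass of the observables y whose image gamma[y] intersects A."""
    A = set(A)
    return sum(P[y] for y in P if gamma[y] & A)


def in_core(mu, gamma, P, U, tol=1e-12):
    """True if mu(A) >= belief(A) for every subset A of the finite space U."""
    return all(sum(mu[u] for u in A) >= belief(A, gamma, P) - tol
               for A in subsets(U))


# Discrete analogue of the entry game: entry (y = 1) reveals a low cost,
# no entry (y = 0) is compatible with any cost.
U = ["low", "high"]
gamma = {1: {"low"}, 0: {"low", "high"}}
P = {0: 0.4, 1: 0.6}
```

In this toy case the core is the set of distributions putting mass at least 0.6 on "low". The check enumerates all subsets, so it is only practical for small U.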
A different way of defining probabilistic information generated by the correspondence can be derived from Aumann's idea (in Aumann (1965)) of considering correspondences as bundles of their selections.
Define the domain of the correspondence $\Gamma$ by
$$\mathrm{Dom}(\Gamma) = \{y \in Y \mid \Gamma(y) \neq \emptyset\}.$$
A measurable selection of the measurable correspondence $\Gamma$ is defined by the property below:
Definition A2 (Measurable Selection) A measurable selection of a correspondence $\Gamma : (Y, \mathcal{B}_Y) \rightrightarrows (U, \mathcal{B}_U)$ is a $(\mathcal{B}_Y, \mathcal{B}_U)$-measurable function $\gamma$ such that $\gamma(y) \in \Gamma(y)$ for all $y \in \mathrm{Dom}(\Gamma)$.
The set of measurable selections of a measurable correspondence $\Gamma$ is denoted $\mathrm{Sel}(\Gamma)$, and it is non-empty by a theorem due to Rokhlin (Rokhlin (1949) Part I, §2, No. 9, Lemma 2) and generally attributed to Kuratowski and Ryll-Nardzewski:
Theorem A1 (Rokhlin) An Effros measurable correspondence with closed non-empty values admits a measurable selection. For a proof, see for instance Theorem 8.1.3 page 308 of Aubin and Frankowska (1990).
Elements of $\mathrm{Sel}(\Gamma)$ can be used to transport the probability $P$ on $Y$ to probabilities on $U$. For each $\gamma \in \mathrm{Sel}(\Gamma)$, consider the probability $\mu_\gamma$ defined on each $A \in \mathcal{B}_U$ by
$$\mu_\gamma(A) = P\{y \in Y \mid \gamma(y) \in A\} = P\gamma^{-1}(A),$$
and define
$$\Delta(\Gamma, P) = \{\mu \in M(U),\ \mu = P\gamma^{-1},\ \text{some } \gamma \in \mathrm{Sel}(\Gamma)\}.$$
It is easily seen that $\Delta(\Gamma, P) \subseteq \mathrm{Core}(\Gamma, P)$. A converse is given by the following theorem of Castaldo, Maccheroni, and Marinacci (2004):
Theorem A2 (Castaldo, Maccheroni and Marinacci) If $\Gamma$ is measurable and compact-valued, then $\mathrm{Core}(\Gamma, P)$ is the weak closed convex hull of $\Delta(\Gamma, P)$.
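For a finite correspondence, the selections and their image measures can be enumerated explicitly, which makes the content of Theorem A2 concrete. The sketch below is an illustration on a hypothetical two-point example (the labels, probabilities and function names are assumptions of this sketch). It lists every selection and the measure it induces on U; the core is then the set of mixtures of these finitely many measures.

```python
from itertools import product


def selections(gamma):
    """Yield every selection of a finite correspondence as a dict y -> u with u in gamma[y]."""
    ys = sorted(gamma)
    for choice in product(*(sorted(gamma[y]) for y in ys)):
        yield dict(zip(ys, choice))


def image_measure(selection, P, U):
    """Image of P under a selection: mass P[y] is moved to the point selection[y]."""
    mu = {u: 0.0 for u in U}
    for y, u in selection.items():
        mu[u] += P[y]
    return mu


U = ["low", "high"]
gamma = {1: {"low"}, 0: {"low", "high"}}
P = {0: 0.4, 1: 0.6}

images = [image_measure(s, P, U) for s in selections(gamma)]
low_masses = sorted(mu["low"] for mu in images)
```

Here the two selections put mass 0.6 and 1.0 on "low", so their mixtures are exactly the distributions with mass on "low" between 0.6 and 1, which is the set singled out by the belief-function inequalities.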
We now develop the claim made in remark 1 of Theorem 3. $\Gamma$ and $P$ define a random set with realizations $\Gamma(Y_j)$ for realizations $Y_j$ from $P$. $\overline{P}$ is the distribution of the random set, and the empirical distribution associated with a sample $(Y_1, \ldots, Y_n)$ is given by
$$\overline{P}_n(A) = \frac{1}{n}\sum_{j=1}^{n} I_{\{\Gamma(Y_j) \cap A \neq \emptyset\}}.$$
$\mathrm{Core}(\Gamma, P)$ characterizes $\overline{P}$ and $\mathrm{Core}(\Gamma, P_n)$ characterizes $\overline{P}_n$. Hence Theorem 1 and Donsker theorems provide a way to derive conditions for weak convergence of empirical random sets at rate $\sqrt{n}$.
Appendix B: Proofs
Proof of Theorem 1:
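Call $\Delta(B)$ the set of all Borel probability measures with support $B$. Under Assumption 1, the map $y \mapsto \Delta(\Gamma_0(y))$ is a map from $Y$ to the set of all non-empty convex sets of Borel probability measures on $U$ which are closed with respect to the weak topology. Moreover, for any $f \in C_b(U)$, the set of all continuous bounded real functions on $U$, the map
$$y \mapsto \sup\left\{\int f\,d\mu : \mu \in \Delta(\Gamma_0(y))\right\} = \max_{u \in \Gamma_0(y)} f(u)$$
is $\mathcal{B}_Y$-measurable, so that, by Theorem 3 of Strassen (1965), for a given $\nu \in \Delta(U)$, there exists $\pi$ satisfying (3) with $\pi(y, .) \in \Delta(\Gamma_0(y))$ for $P$-almost all $y$ if and only if
$$\int_U f(u)\,\nu(du) \leq \int_Y \sup_{u \in \Gamma_0(y)} f(u)\,P(dy) \quad (6)$$
for all $f \in C_b(U)$. Now, defining $\overline{P}$ as the set function
$$\overline{P} : B \mapsto P(\{y \in Y : \Gamma_0(y) \cap B \neq \emptyset\}),$$
the right-hand side of (6) is shown in the following sequence of equalities to be equal to the integral of $f$ with respect to $\overline{P}$ in the sense of Choquet (line (7) below can be taken as a definition).
$$\int_Y \sup_{u \in \Gamma_0(y)} \{f(u)\}\,dP(y)$$
$$= \int_0^\infty P\Bigl\{y \in Y : \sup_{u \in \Gamma_0(y)} \{f(u)\} \geq x\Bigr\}\,dx + \int_{-\infty}^0 \Bigl(P\Bigl\{y \in Y : \sup_{u \in \Gamma_0(y)} \{f(u)\} \geq x\Bigr\} - 1\Bigr)\,dx$$
$$= \int_0^\infty P\{y \in Y : \Gamma_0(y) \cap \{f \geq x\} \neq \emptyset\}\,dx + \int_{-\infty}^0 \bigl(P\{y \in Y : \Gamma_0(y) \cap \{f \geq x\} \neq \emptyset\} - 1\bigr)\,dx$$
$$= \int_0^\infty \overline{P}(\{f \geq x\})\,dx + \int_{-\infty}^0 \bigl(\overline{P}(\{f \geq x\}) - 1\bigr)\,dx = \int_{Ch} f\,d\overline{P}. \quad (7)$$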
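The chain of equalities leading to (7) can be verified numerically on a finite example. The sketch below computes the Choquet integral of a function with respect to the plausibility function by the usual sorted-values formula for finite spaces, and compares it with the P-expectation of the maximum of the function over the image of the correspondence. The three-point space, the correspondence, the probabilities and the function names are all hypothetical.

```python
def plausibility(A, gamma, P):
    """Mass of the observables y whose image gamma[y] intersects A."""
    A = set(A)
    return sum(P[y] for y in P if gamma[y] & A)


def choquet_integral(f, gamma, P):
    """Choquet integral of f (dict u -> value) with respect to the plausibility function.

    On a finite space: sort points by decreasing f and weight each value by the
    increment of the capacity of the upper level sets.
    """
    order = sorted(f, key=f.get, reverse=True)
    total, previous = 0.0, 0.0
    for i, u in enumerate(order, start=1):
        capacity = plausibility(order[:i], gamma, P)
        total += f[u] * (capacity - previous)
        previous = capacity
    return total


def expected_sup(f, gamma, P):
    """Integral over y of the maximum of f over gamma[y]."""
    return sum(P[y] * max(f[u] for u in gamma[y]) for y in P)


f = {"a": -1.0, "b": 0.5, "c": 2.0}
gamma = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a"}}
P = {0: 0.5, 1: 0.25, 2: 0.25}

lhs = expected_sup(f, gamma, P)
rhs = choquet_integral(f, gamma, P)
```

Both quantities equal 0.5 in this example, including the contribution of the negative value of f, which corresponds to the second integral in the display.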
By Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004), for any $f \in C_b(U)$,
$$\int_{Ch} f\,d\overline{P} = \max_{\gamma \in \mathrm{Sel}(\Gamma_0)} \int_U f(u)\,P\gamma^{-1}(du),$$
so that (6) is equivalent to
$$\max_{\gamma \in \mathrm{Sel}(\Gamma_0)} \int_U f(u)\,P\gamma^{-1}(du) \geq \int_U f(u)\,\nu(du) \quad (8)$$
for any $f \in C_b(U)$. If $\nu$ is in the weak closure of the set of convex combinations of elements of $\{P\gamma^{-1} : \gamma \in \mathrm{Sel}(\Gamma_0)\}$, then by linearity of the integral and the definition of weak convergence, (8) holds. Conversely, if $\nu$ satisfies (8), then it satisfies
$$\int_{Ch} f\,d\overline{P} \geq \int_U f(u)\,\nu(du)$$
and by monotone continuity, we have for all $A \in \mathcal{B}_U$, and $I_A$ the indicator function,
$$\int_U I_A(u)\,\nu(du) \leq \int_{Ch} I_A\,d\overline{P}.$$
Hence $\nu(A) \leq \overline{P}(A)$ for all $A \in \mathcal{B}_U$, which by Corollary 1 of Castaldo, Maccheroni, and Marinacci (2004) implies that $\nu$ is the weak limit of a sequence of convex combinations of elements of $\{P\gamma^{-1} : \gamma \in \mathrm{Sel}(\Gamma_0)\}$, hence it is a mixture in the desired sense and the proof is complete.
Proof of Theorem 2:
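Under the null hypothesis, $\mathcal{V}_{00}$ is non-empty. Consider $\nu \in \mathcal{V}_{00}$ and a solution [...] $\pi_{u,\nu}(y) \leq \pi_{u_j,\nu}(y)$, for all $y \in Y$, so that $[\pi_{u_{j-1},\nu}, \pi_{u_j,\nu}]$ is a bracket of functions in the family $\mathcal{F} = \{\pi_{u,\nu} : u \in \mathbb{R}\}$. The $L_1(P)$ size of this bracket is
$$P|\pi_{u_j,\nu} - \pi_{u_{j-1},\nu}| = P(\pi_{u_j,\nu} - \pi_{u_{j-1},\nu}) = \nu((u_{j-1}, u_j]) \leq \varepsilon^2.$$
Since the functions in the family $\mathcal{F}$ are probability kernels, we have for all $u \in \mathbb{R}$, $0 \leq \pi_{u,\nu}(y) \leq 1$, all $y \in Y$, so that
$$P(\pi_{u_j,\nu} - \pi_{u_{j-1},\nu})^2 \leq P|\pi_{u_j,\nu} - \pi_{u_{j-1},\nu}| \leq \varepsilon^2$$
and the $L_2(P)$ size of the bracket is at most $\varepsilon$. The minimum number of brackets of $L_2(P)$ size less than $\varepsilon$ needed to cover the whole family $\mathcal{F}$ is denoted $N_{[\,]}(\varepsilon, \mathcal{F}, L_2(P))$ and the logarithm of this quantity is the entropy with bracketing. Here the entropy with bracketing is less than $\log(m)$ and given that $\nu$ has a density with respect to Lebesgue measure, $m$ can be chosen smaller than $(2/\varepsilon^2)$, so that the entropy with bracketing condition
$$\int_0^\infty \sqrt{\log N_{[\,]}(\varepsilon, \mathcal{F}, L_2(P))}\,d\varepsilon < \infty$$
is satisfied and the empirical process $G_n$ converges weakly to a centered Gaussian process $G_P$ with covariance function
$$E\,G_P\pi_{u,\nu}\,G_P\pi_{v,\nu} = P\pi_{u,\nu}\pi_{v,\nu} - P\pi_{u,\nu}\,P\pi_{v,\nu},$$
uniformly over $\mathcal{F}$. Let $\|\cdot\|_\infty$ [...] $\|G_n\pi_{u,\nu}\|_\infty$ weakly converges to $\|G_P\pi_{u,\nu}\|_\infty$.
Notice that
$$G\pi_{u,\nu} = G_\nu((-\infty, u]) = B_0(F_\nu(u))$$
where $B_0$ is a version of the standard Brownian bridge, and $F_\nu$ is the distribution function associated with the distribution $\nu$. Since $F_\nu$ is a continuous distribution function, $\|B_0 \circ F_\nu\|_\infty = \|B_0\|_\infty$ [...]
$$\Pr(\|B_0\|_\infty > u) = 2\sum_{j=1}^{\infty}(-1)^{j-1}e^{-2j^2u^2},$$
and the proof is complete.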
Proof of Theorem 3
The assumption guarantees that $G_n = \sqrt{n}(P_n - P)$ converges to a Brownian bridge uniformly over the family
(B
u
)|
converges to a bounded random variable. Since the Kolmogorov-Smirnov
metric is stronger than d which metrizes weak convergence, the latter display
implies that
sup
Core(
n
,P)
sup
Core(
n
,P)
d(,
) = O(1/
n).
Now,
TS2 d
H
(V
n
, Core(
n
, P
n
)) sup
Core(
n
,P)
sup
Core(
n
,P)
d(,
)
and TS2 TS1, so the result follows.
References
Andrews, D., S. Berry, and P. Jia (2004): Confidence Regions for Parameters in Discrete Games with Multiple Equilibria, with an Application to Discount Chain Store Location, unpublished manuscript.
Aubin, J.-P., and H. Frankowska (1990): Set-valued Analysis. Boston: Birkhäuser.
Aumann, R. (1965): Integrals of set-valued functions, Journal of Mathematical Analysis and Applications, 12, 1-12.
Castaldo, A., F. Maccheroni, and M. Marinacci (2004): Random sets and their distributions, Sankhya (Series A), 66, 409-427.
Chernozhukov, V., H. Hong, and E. Tamer (2002): Inference on Parameter Sets in Econometric Models, unpublished manuscript.
Dempster, A. P. (1967): Upper and lower probabilities induced by a multi-valued mapping, Annals of Mathematical Statistics, 38, 325-339.
Dudley, R. (2003): Real Analysis and Probability. Cambridge University Press.
Jovanovic, B. (1989): Observable implications of models with multiple equilibria, Econometrica, 57, 1431-1437.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): Moment Inequalities and Their Application, unpublished manuscript.
Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis. Springer.
Rokhlin, V. (1949): Selected topics from the metric theory of dynamical systems, Uspekhi Matematicheskikh Nauk, 4, 57-128, translated in American Mathematical Society Translations 49 (1966), 171-240.
Shaikh, A. (2005): Inference for a Class of Partially Identified Econometric Models, unpublished manuscript.
Strassen, V. (1965): The existence of probability measures with given marginals, Annals of Mathematical Statistics, 36, 423-439.
Tamer, E. (2003): Incomplete Simultaneous Discrete Response Model with Multiple Equilibria, Review of Economic Studies, 70, 147-165.
van der Vaart, A., and J. Wellner (2000): Weak Convergence and Empirical Processes. Springer.