Testing Non-identifying Restrictions*

Marc Henry
Columbia University

First draft: September 15, 2005
This draft: January 25, 2006

Abstract
We propose a test of specification for structural models without identifying assumptions. The model is defined as a binary relation between latent and observable variables, coupled with a hypothesized family of distributions for the latent variables. The objective of the testing procedure is to determine whether this hypothesized family of latent variable distributions has a non-empty intersection with the set of distributions compatible with the observable data generating process and the binary relation defining the model. When the model is given in parametric form, the test can be inverted to yield confidence regions for the identified parameter set.

JEL Classification: C12, C14
Keywords: random sets, empirical process.
* Preliminary and incomplete. Helpful discussions with Alfred Galichon, Rosa Matzkin, Alexei Onatski, Jim Powell and Peter Robinson are gratefully acknowledged (with the usual disclaimer). Correspondence address: Department of Economics, Columbia University, 420 W 118th Street, New York, NY 10027, USA. mh530@columbia.edu.
1 Introduction

We consider a very general econometric model specification. The variables under consideration are divided into two groups.
Latent variables: u ∈ U = R^{d_u}. The vector u is not observed by the analyst, but some of its components may be observed by the economic actors. Theorem 1 below holds more generally when U is a complete, metrizable and separable topological space (i.e. a Polish space).

Observable variables: y ∈ Y = R^{d_y}. The vector y is observed by the analyst. Theorem 1 holds more generally when Y is a convex metrizable subset of a locally convex topological vector space.

The Borel sigma-algebras of Y and U will be denoted B_Y and B_U respectively.
Call P the Borel probability measure that represents the true data generating process for the observable variables, and V a family of Borel probability measures that are hypothesized to be possible data generating processes for the latent variables. Finally, the economic model is given by a relation between observable and latent variables, i.e. a subset of Y × U, which we shall write as a multi-valued mapping from Y to U denoted by Γ. Suppose a set of restrictions on the hypothesized latent variable distributions is given by V₀ ⊆ V and a set of restrictions on the model is given by Γ₀.
Example 1: parametric models and restrictions. Suppose the economic model is known up to a finite dimensional parameter vector θ, and the chosen family of distributions for the latent variables depends on a finite dimensional parameter vector γ. The hypothesized restrictions are the following:

θ ∈ Θ₀ ⊆ R^{d_θ};   γ ∈ G₀ ⊆ R^{d_γ}.

The restricted family of distributions for the latent variables then becomes

V₀ = {ν_γ, γ ∈ G₀}

and all the models {Γ_θ, θ ∈ Θ₀} are considered for the relation linking observable to latent variables. Hence our restricted model is defined by

Γ₀ = ⋃_{θ∈Θ₀} Γ_θ.
Example 2: games with multiple equilibria. Suppose the payoff function for player j, j = 1, . . . , J, is given by

π_j(S_j, S_{−j}, X_j, U_j; θ),

where S_j is player j's strategy and S_{−j} denotes the opponents' strategies, X_j is a vector of observable characteristics of player j and U_j a vector of unobservable determinants of the payoff. Finally, θ is a vector of parameters. The pure strategy Nash equilibrium conditions

π_j(S_j, S_{−j}, X_j, U_j; θ) ≥ π_j(S, S_{−j}, X_j, U_j; θ), for all S,

define a correspondence Γ_θ from unobservable player characteristics to observable variables (S, X), and if the unobservable player characteristics, interpreted as the types of the players, are supposed uniformly distributed on the relevant domain, then V₀ is a singleton.
In this paper, we propose a general framework for conducting inference without additional assumptions, such as equilibrium selection mechanisms, necessary to identify the model (i.e. to ensure that Γ is single-valued). The usual terminology for such models is incomplete or partially identified.
In a parametric setting, the objective of inference in partially identified models is the estimation of the set of parameters which are compatible with the distribution of the observed data, and an assessment of the quality of that estimation. Chernozhukov, Hong, and Tamer (2002) propose an M-estimation procedure to construct a set that contains all compatible parameters with a predetermined probability. Shaikh (2005) extends and refines their method, and Andrews, Berry, and Jia (2004) and Pakes, Porter, Ho, and Ishii (2004) propose alternative procedures in a similar framework.
The inference procedure presented here is based on a characterization of probability measures in the Core of the random set generated by the distribution of observables P and the multi-valued mapping Γ₀, and on a method to determine whether hypothesized latent variable data generating processes satisfy this characterization. No a priori parametric assumptions are needed, but if they are made, the inference methodology yields confidence sets similar to those proposed in the previously cited papers. In the notation of Example 1, for a given θ one can derive a set of γ's compatible with Γ_θ and P at a given significance level, and conversely, one can derive a set of θ's compatible with P and ν_γ for a given γ.
The next section proposes the characterization of probability measures in the Core of a random set, the following section describes the testing principles, and the last section illustrates the approach on a simple entry model with multiple equilibria. Proofs and additional results are collected in the appendix.
2 Testing general model specifications

2.1 Definition of the null hypothesis
We wish to develop a procedure to detect whether the two sets of restrictions, on the family of distributions for the latent variables on the one hand, and on the relation between observable and latent variables on the other hand, are compatible. First we explain what we mean by compatible. It is very easily understood in the simple case where the link between latent and observable variables is parametric and Γ_θ is measurable and single-valued for each θ ∈ Θ₀. Defining the image measure of P by Γ_θ by

P Γ_θ^{-1}(A) = P{y ∈ Y | Γ_θ(y) ∈ A},   (1)

for all A ∈ B_U, we say that the restrictions V₀ and Γ₀ are compatible if and only if there is at least one θ ∈ Θ₀ and one ν ∈ V₀ such that ν = P Γ_θ^{-1}. In the general case considered here, Γ₀ may not be single-valued, and its images may not even be disjoint (which would be the case if it were the inverse image of a single-valued mapping from U to Y, i.e. a traditional function from latent to observable variables). However, under a measurability assumption on Γ₀, we can construct an analogue of the image measure, which will now be a set Core(Γ₀, P) of Borel probability measures on U (to be defined below), and the hypothesis of compatibility of the restrictions on latent variable distributions and on the models linking latent and observable variables will naturally take the form

H₀ : V₀ ∩ Core(Γ₀, P) ≠ ∅.   (2)
Assumption 1: Γ₀ has non-empty and closed values, and for each open set O ⊆ U,

Γ₀^{-1}(O) = {y ∈ Y | Γ₀(y) ∩ O ≠ ∅} ∈ B_Y.
To relate the present case to the intuition of the single-valued case, it is useful to think in terms of single-valued selections of the multi-valued mapping Γ₀. A measurable selection of Γ₀ is a measurable function s such that s(y) ∈ Γ₀(y) for all y ∈ Y. The set of measurable selections of a multi-valued mapping that satisfies Assumption 1 is denoted Sel(Γ₀), and it is known to be non-empty by Rokhlin (1949), Part I, §2, No. 9, Lemma 2.¹ To each selection s of Γ₀, we can associate the image measure of P, denoted P s^{-1}, defined as in (1).

A natural reformulation of the compatibility condition is that at least one probability measure ν ∈ V₀ can be written as a mixture of probability measures of the form P s^{-1}, where s ranges over Sel(Γ₀). However, even for the simplest multi-valued mappings, the set of measurable selections is very rich, let alone the set of their mixtures. Hence, our first goal is to give a manageable representation of such a mixture. This is the object of Theorem 1 below.
Theorem 1: Under Assumption 1, ν is a mixture of images of P by measurable selections of Γ₀ (i.e. ν is in the weak closed convex hull of {P s^{-1} ; s ∈ Sel(Γ₀)}) if and only if there exists for P-almost all y ∈ Y a probability measure π(y, ·) on U with support Γ₀(y), such that

ν(B) = ∫_Y π(y, B) P(dy), for all B ∈ B_U.   (3)
Remark 1: The weak topology on Δ(U), the set of probability measures on U, is the topology of convergence in distribution. Δ(U) is also Polish, and the weak closed convex hull of {P s^{-1} ; s ∈ Sel(Γ₀)} is indeed the collection of arbitrary mixtures of elements of {P s^{-1} ; s ∈ Sel(Γ₀)}.
¹ The commentary at the end of Chapter 14 of Rockafellar and Wets (1998) sheds light on the controversy surrounding this attribution.
Remark 2: Notice that (3) looks like a disintegration of ν, and indeed, when Γ₀ is the inverse image of a single-valued measurable function (i.e. when the model is given by a single-valued measurable function from latent to observable variables), the probability kernel π is exactly the (P, Γ₀^{-1})-disintegration of ν; in other words, π(y, ·) is the conditional probability measure on U under the condition Γ₀^{-1}(u) = {y}. Hence (3) has the interpretation that a random element with distribution ν can be generated as a draw from π(y, ·), where y is a realization of a random element with distribution P.
Remark 3: We define Core(Γ₀, P) as the weak closed convex hull of {P s^{-1} ; s ∈ Sel(Γ₀)}, or equivalently as the set of all mixtures of images of P by measurable selections of Γ₀. So our null hypothesis (2) is well defined.²

² The name Core is justified by Theorem A2 of Appendix A.
2.2 Definition of the test statistic

Now that we have identified the set of latent variable data generating processes compatible with the observable distribution P and the model correspondence Γ₀ with Core(Γ₀, P), and we have characterized the elements of the latter by means of Theorem 1, we propose a test statistic based on this characterization. Call V₀₀ the subset of V₀ that is compatible with the model correspondence and the distribution of observables. Hence

V₀₀ = V₀ ∩ Core(Γ₀, P),

which is non-empty under the null H₀ by definition. By Theorem 1, an element ν of V₀ is in V₀₀ if and only if ν can be written as

ν(·) = ∫_Y π(y, ·) P(dy),

where the π(y, ·) are probability measures with support Γ₀(y) for P-almost all y. From now on, we shall use the de Finetti notation μf for the integral of the function f with respect to the measure μ, so that we shall write

∫_Y π(y, ·) P(dy) = Pπ.
Consider now a sample of observations (Y_1, . . . , Y_n). The empirical distribution, i.e. the probability measure that gives mass 1/n to each observation, is denoted P_n, with

P_n(A) = (1/n) Σ_{j=1}^n I_A(Y_j), for all A ∈ B_Y.

The empirical counterpart of the integral Pπ is

ν_n = P_n π = (1/n) Σ_{j=1}^n π(Y_j, ·).
The asymptotic behaviour of the difference between Pπ and its empirical counterpart P_n π is key to the construction of the test statistic. It is described in the following theorem. Setting u = (u_1, . . . , u_{d_u})′, call B_u the rectangle ∏_{i=1}^{d_u} (−∞, u_i] and π_{ν,u}(·) = π(·, B_u). Let G_n = √n (P_n − P) denote the empirical process associated with the sample (Y_1, . . . , Y_n), and finally, let ⇝ denote convergence in distribution (i.e. weak convergence). Then we have

Theorem 2: For any ν ∈ V₀₀ with a density with respect to Lebesgue measure, and for any π satisfying (3), G_n converges weakly, uniformly over the family of functions {π_{ν,u}, u ∈ R^{d_u}}, to a P-Brownian bridge G, i.e. a Gaussian process with zero mean and covariance function defined by

E[G π_{ν,u} G π_{ν,v}] = P(π_{ν,u} π_{ν,v}) − (P π_{ν,u})(P π_{ν,v}).

This implies that

√n sup_{u∈R^{d_u}} |P_n π_{ν,u} − P π_{ν,u}| ⇝ G*,

where G* is a random variable. In the particular case where d_u = 1, G* is such that for all x ∈ R,

Pr(G* > x) = 2 Σ_{j=1}^∞ (−1)^{j+1} e^{−2j²x²}.
Remark 1: A remarkable feature of Theorem 2 is that in the case of a single real-valued latent variable, the test statistic has a distribution-free limit with easily computable quantiles.
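For instance, quantiles of this limit law can be obtained directly from the series above. The following Python sketch is a minimal illustration (the truncation of the series and the bisection tolerance are arbitrary choices):

```python
import math

def limit_tail(x, terms=100):
    """Pr(G* > x) = 2 * sum_{j>=1} (-1)**(j+1) * exp(-2 * j**2 * x**2), for x > 0."""
    if x <= 0:
        return 1.0
    return 2.0 * sum((-1) ** (j + 1) * math.exp(-2.0 * j * j * x * x)
                     for j in range(1, terms + 1))

def critical_value(level=0.05, lo=1e-6, hi=5.0, tol=1e-10):
    """Smallest x with Pr(G* > x) <= level, found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if limit_tail(mid) > level:
            lo = mid
        else:
            hi = mid
    return hi

# Familiar Kolmogorov-Smirnov critical values: roughly 1.358 at 5% and 1.628 at 1%.
print(critical_value(0.05), critical_value(0.01))
```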
The test statistic implicitly proposed in Theorem 2 to test whether a given latent variable distribution ν is compatible with the model restriction Γ₀ and the observable distribution P is infeasible, in that it depends on the unknown probability kernels π such that ν = Pπ. They can be estimated as solutions of the integral equation ν = P_n π with the restriction that the π(y, ·) are probability measures on Γ₀(y). This equation has solutions (generically many) if and only if ν ∈ Core(Γ₀, P_n) by Theorem 1, but solutions are likely to be difficult to exhibit except in very simple cases, such as the cases developed in Section 3.
An alternative is to construct a test statistic based on the distance between a hypothesized latent variable measure ν (or more generally V₀) and Core(Γ₀, P_n), which by construction will be smaller than the test statistic of Theorem 2, and hence can be used as the basis for a conservative testing procedure. This is summarized in the following corollary:
Corollary 1: Under the null H₀,

limsup_n inf_{ν∈V₀} inf_{μ∈Core(Γ₀,P_n)} sup_{u∈R^{d_u}} √n |ν(B_u) − μ(B_u)| ≤ sup_{u∈R^{d_u}} |G π_{ν,u}|,

and the infima are achieved provided V₀ is chosen to be closed in the weak topology.
Given the conservative nature of the procedure based on Corollary 1, it is crucial to assess the power of the test, as described in the next section.
2.3 Power analysis

The two test statistics considered in Section 2.2 are the following:

TS1 = inf_{ν∈V₀} sup_{u∈R^{d_u}} √n |P_n π_{ν,u} − ν(B_u)|,

TS2 = inf_{ν∈V₀} inf_{μ∈Core(Γ₀,P_n)} sup_{u∈R^{d_u}} √n |μ(B_u) − ν(B_u)|.

Note that since P_n π ∈ Core(Γ₀, P_n), TS2 is dominated by TS1 by construction.
To assess the power of either test, we consider the following types of local alternatives:

d_H(V_n, Core(Γ_n, P)) ≥ δ r_n^{-1},   δ > 0,   (4)

where r_n is a deterministic sequence of reals diverging with n, and d_H denotes the Hausdorff distance, defined as follows: for any two sets V_1 and V_2 in Δ(U), and any metric d metrizing weak convergence,

d_H(V_1, V_2) = max { sup_{ν_1∈V_1} inf_{ν_2∈V_2} d(ν_1, ν_2), sup_{ν_2∈V_2} inf_{ν_1∈V_1} d(ν_1, ν_2) }.
The principle of both test statistics TS1 and TS2 rests on the set convergence of Core(Γ, P_n) to Core(Γ, P) for a fixed model correspondence Γ. Hence, for n large enough, Core(Γ_n, P_n) is sufficiently close to Core(Γ_n, P) for the test statistic to detect the sequence of local alternatives, as summarized in the following theorem.
Theorem 3: Under the sequence of alternatives defined in (4) with r_n = o(√n), if the family of functions {π_{ν,u} : π satisfying (3) and ν ∈ Core(Γ_n, P), u ∈ U, n ∈ N} is P-Donsker, then the test statistics TS1 and TS2 diverge.
Remark 1: As developed in Appendix A, Theorem 3 has an interesting interpretation in terms of convergence of empirical random sets: for a random element Y_j in Y, Γ(Y_j) is a random set in U under Assumption 1, and its distribution can be identified with Core(Γ, P). Theorem 3 tells us when the empirical distribution Core(Γ, P_n) of the random set Γ(Y) weakly converges to the true distribution at rate √n. Such a result appears to be new in the literature on convergence of random sets.
3 Illustration: a simple entry model

3.1 Single type

Consider a market with two firms producing complementary products with identical costs.³ The payoff functions are

π_1(x_1, x_2, u) = (α x_2 − u) I_{x_1=1},
π_2(x_1, x_2, u) = (α x_1 − u) I_{x_2=1},

where x_i ∈ {0, 1} is firm i's action, and u is an exogenous cost. The firms know their cost; the analyst, however, knows only that u ∈ [0, 1], and that the structural parameter α is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]. The second is x_1 = x_2 = 1 for all u ∈ [0, α] and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, α] and Γ(0) = [0, 1]. If we consider the restriction α ≤ α_max, then the multi-valued mapping incorporating the restriction is Γ₀, defined by Γ₀(1) = [0, α_max] and Γ₀(0) = [0, 1]. In this case, since y is Bernoulli, we can write P = (1 − p, p) with p the probability of a 1. For the distribution of u, we consider a parametric exponential family on [0, 1]. Hence V = {ν_γ := γ u^{γ−1} du}_{γ>0}, and the restriction on γ can be chosen as γ ∈ [γ_min, γ_max] with γ_min > 0.

³ Jovanovic (1989) and Tamer (2003) consider this simple game in a similar context.
Consider the smallest reliability P_* that can be attached to a set A in U based on Γ₀ and P, defined by

P_*(A) = P{y ∈ Y | Γ₀(y) ⊆ A}.

Our null hypothesis of compatibility of the two sets of restrictions is that for some γ ∈ [γ_min, γ_max], ν_γ set-wise dominates P_*; in other words, that ν_γ associates to each set a measure at least as large as the smallest reliability that can be attached to it. This is equivalent to the existence of a γ ∈ [γ_min, γ_max] such that for P-almost all y there is a probability measure π_γ(y, ·) supported on Γ₀(y) such that for all u ∈ [0, 1]

u^γ = ∫_Y π_γ(y, [0, u]) P(dy).

In other words,

u^γ = (1 − p) π_γ(0, [0, u]) + p π_γ(1, [0, u])

and

π_γ(1, [0, α_max]) = π_γ(0, [0, 1]) = 1,

with, for P-almost all y, π_γ(y, [0, u]) a nondecreasing, right-continuous function of u taking values in [0, 1]. When α_max^γ < p, there is no solution (i.e. the two sets of restrictions are incompatible), whereas when α_max^γ ≥ p, a continuous solution is given by

π_γ(0, [0, u]) = [u^γ − p (u/α_max)^γ] / (1 − p) · I_{u ≤ α_max} + [(u^γ − p)/(1 − p)] · I_{u > α_max},

π_γ(1, [0, u]) = (u/α_max)^γ · I_{u ≤ α_max} + I_{u > α_max}.
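As a quick numerical sanity check of these formulas (with purely illustrative values of p, γ and α_max chosen so that α_max^γ ≥ p), the following sketch verifies that the two kernels are proper distribution functions on their supports and mix back to the hypothesized distribution function u^γ:

```python
import numpy as np

# Illustrative values only; they must satisfy alpha_max**gamma >= p.
p, gamma, alpha_max = 0.4, 1.5, 0.8

def pi0(u):
    """pi_gamma(0, [0, u]): kernel attached to the outcome y = 0."""
    below = (u**gamma - p * (u / alpha_max)**gamma) / (1 - p)
    above = (u**gamma - p) / (1 - p)
    return np.where(u <= alpha_max, below, above)

def pi1(u):
    """pi_gamma(1, [0, u]): kernel attached to the outcome y = 1."""
    return np.where(u <= alpha_max, (u / alpha_max)**gamma, 1.0)

u = np.linspace(0.0, 1.0, 1001)
mixture = (1 - p) * pi0(u) + p * pi1(u)
# The mixture reproduces the hypothesized CDF u**gamma.
assert np.allclose(mixture, u**gamma)
# Both kernels are nondecreasing and reach 1 at the upper end of their supports.
assert np.all(np.diff(pi0(u)) >= -1e-12)
assert np.isclose(pi0(np.array([1.0]))[0], 1.0)
assert np.isclose(pi1(np.array([alpha_max]))[0], 1.0)
```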
Consider now the empirical process G_n = √n (P_n − P) applied to the family of functions π_{γ,u}(y) := π_γ(y, [0, u]). Elementary calculations yield

G_n π_{γ,u} = √n (p_n − p)/(1 − p) · g_{γ,u},

where

g_{γ,u} := (1 − u^γ) I_{u ≥ α_max} + [(u/α_max)^γ − u^γ] I_{u < α_max}.

Note that

inf_γ sup_{u∈[0,1]} |g_{γ,u}| = 1 − α_max^{γ_min}.
The class of functions {g_{γ,u} ; γ ∈ [γ_min, γ_max], u ∈ [0, 1]} is a Vapnik-Chervonenkis class, since the subgraphs of these functions are unions of intervals tied to zero, and hence cannot shatter any set of two points (see for instance van der Vaart and Wellner (2000)). Hence, the family {π_{γ,u} ; γ ∈ [γ_min, γ_max], u ∈ [0, 1]} is P-Donsker, and since √n (p_n − p)/(1 − p) weakly converges to a centered normal random variable with variance p/(1 − p), we have the following:

inf_γ sup_{u∈[0,1]} |G_n π_{γ,u}| ⇝ √(p/(1 − p)) (1 − α_max^{γ_min}) |Z|,
where Z is a random variable with standard normal distribution. Now, by construction, inf_γ sup_{u∈[0,1]} |G_n π_{γ,u}| is dominated by sup_{u∈[0,1]} |G_n π_{γ,u}| for any fixed γ, which, for any γ satisfying α_max^γ ≥ p (there exists at least one under the null), converges weakly to the supremum G* of a standard Brownian bridge, with

Pr(G* > x) = 2 Σ_{j=1}^∞ (−1)^{j+1} e^{−2j²x²}.
A testing procedure that does not require computation of the kernels π, as described in Section 2.2, consists in finding the element of Core(Γ₀, P_n) that minimizes the Kolmogorov-Smirnov distance to the set of distributions {u^γ, γ ∈ [γ_min, γ_max]}. If α_max^{γ_min} ≥ p_n, then u^{γ_min} is a minimizer and the minimum distance is zero. If α_max^{γ_min} < p_n, then the minimum Kolmogorov-Smirnov distance is p_n − α_max^{γ_min}. If α_max^{γ_min} > p, then ultimately α_max^{γ_min} > p_n as well, and the test statistic is zero. If α_max^{γ_min} < p, then ultimately α_max^{γ_min} < p_n, and the test statistic diverges. Finally, if α_max^{γ_min} = p, then the statistic will be √n max(0, p_n − p). In this very simple example, one might also consider a test of the null H₀ : p = α_max^{γ_min} against the one-sided alternative H_a : p > α_max^{γ_min}, using the fact that under the null,

√n [α_max^{γ_min} (1 − α_max^{γ_min})]^{−1/2} (p_n − α_max^{γ_min})

converges to a standard normal random variable.
To summarize, the procedures proposed are based on the following test statistics:

TS1 = inf_γ sup_{u∈[0,1]} |G_n π_{γ,u}|,

TS2 = inf_γ inf_{ν∈Core(Γ₀,P_n)} sup_{u∈[0,1]} √n |u^γ − F_ν(u)|,

TS3 = √n (p_n − α_max^{γ_min}),

and the following approximating distributions:

AD1 = √(p/(1 − p)) (1 − α_max^{γ_min}) |N(0, 1)|,

AD2 = G*,

AD3 = √(α_max^{γ_min} (1 − α_max^{γ_min})) N(0, 1),

AD4 = √(p (1 − p)) N(0, 1).

The procedure based on estimation of the probability kernels and comparison between the hypothesized distributions ν = Pπ and their empirical counterparts P_n π would result in comparing TS1 to the quantiles of AD1 (for the exact asymptotic version) or AD2 (for the conservative asymptotic version). The procedure based on the minimum distance between the hypothesized distributions and the empirical random set distribution Core(Γ₀, P_n) would result in comparing TS2 with the quantiles of AD4 (for an exact asymptotic version) or AD2 (for the conservative asymptotic version). Finally, the simple test on the boundary would result in comparing TS3 to the quantiles of AD3.
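A small Monte Carlo exercise along these lines might look as follows; the sample size and parameter values are illustrative choices only, the data are generated at the boundary p = α_max^{γ_min}, and the rejection rule uses the one-sided AD3 approximation for TS3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values: restriction alpha_max, lower bound gamma_min, boundary case for p.
alpha_max, gamma_min, n = 0.7, 1.0, 2000
p_true = alpha_max ** gamma_min

y = rng.binomial(1, p_true, size=n)   # observed entry indicators
p_n = y.mean()

# TS3 with its approximating distribution AD3 (one-sided 5% normal critical value).
ts3 = np.sqrt(n) * (p_n - alpha_max ** gamma_min)
sd3 = np.sqrt(alpha_max ** gamma_min * (1 - alpha_max ** gamma_min))
reject = ts3 > 1.645 * sd3

# In this example TS2 reduces to sqrt(n) * max(0, p_n - alpha_max**gamma_min).
ts2 = np.sqrt(n) * max(0.0, p_n - alpha_max ** gamma_min)
print(ts3, ts2, reject)
```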
3.2 Heterogeneous types

Consider a market with two firms producing complementary products with heterogeneous costs. The payoff functions are

π_1(x_1, x_2, u) = (α x_2 − u_1) I_{x_1=1},
π_2(x_1, x_2, u) = (α x_1 − u_2) I_{x_2=1},

where x_i ∈ {0, 1} is firm i's action, and the u's are firm-specific exogenous costs. The firms know their costs; the analyst, however, knows only that u ∈ [0, 1]², and that the structural parameter α is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]². The second is x_1 = x_2 = 1 for all u ∈ [0, α]² and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, α]² and Γ(0) = [0, 1]². If we consider the restriction α ≤ α_max, then the multi-valued mapping incorporating the restriction is Γ₀, defined by Γ₀(1) = [0, α_max]² and Γ₀(0) = [0, 1]². In this case, since y is Bernoulli, we can write P = (1 − p, p) with p the probability of a 1. For the distribution of u, we consider the costs to be independent with marginals following the same parametric exponential family on [0, 1]. Hence V = {ν_γ := γ² u_1^{γ−1} u_2^{γ−1} du_1 du_2}_{γ>0}, and the restriction on γ can be chosen as γ ∈ [γ_min, γ_max] with γ_min > 0.
A density version of (3) can be derived in this case, and makes for a more convenient test statistic. Writing u = (u_1, u_2)′,

f_γ(u) = f_γ(u_1) f_γ(u_2) = γ² u_1^{γ−1} u_2^{γ−1} = ∫_Y π_γ(y, u) P(dy).

In other words,

γ² u_1^{γ−1} u_2^{γ−1} = (1 − p) π_γ(0, u) + p π_γ(1, u)

under the constraints

∫_{[0,1]²} π_γ(0, u) du = ∫_{[0,α_max]²} π_γ(1, u) du = 1

and π_γ(1, u) = 0 for all u ∉ [0, α_max]².
When α_max^{2γ} < p, there is no solution (i.e. the two sets of restrictions are incompatible), whereas when α_max^{2γ} ≥ p, a solution is given by

π_γ(0, u) = [f_γ(u)/(1 − p)] (1 − p α_max^{−2γ} I_{[0,α_max]²}(u)),

π_γ(1, u) = α_max^{−2γ} f_γ(u) I_{[0,α_max]²}(u).
Consider now the empirical process G_n = √n (P_n − P) applied to the family of functions π_{γ,u}(y) := π_γ(y, u). Elementary calculations yield

G_n π_{γ,u} = √n (p_n − p)/(1 − p) · g_{γ,u},

where

g_{γ,u} := f_γ(u) (α_max^{−2γ} I_{[0,α_max]²}(u) − 1).
In this case, it is convenient to use the L¹ metric; we are looking at the minimum of

∫_{[0,1]²} |G_n π_{γ,u}| du = √n |p_n − p|/(1 − p) ∫_{[0,1]²} f_γ(u) |α_max^{−2γ} I_{[0,α_max]²}(u) − 1| du.

Now

∫_{[0,1]²} f_γ(u) |α_max^{−2γ} I_{[0,α_max]²}(u) − 1| du = 2 (1 − α_max^{2γ}),

which is minimized at γ = γ_min to yield 2 (1 − α_max^{2γ_min}). So

inf_γ ∫_{[0,1]²} |G_n π_{γ,u}| du ⇝ 2 √(p/(1 − p)) (1 − α_max^{2γ_min}) |Z|,

where Z is a standard normal random variable.
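Given the closed form above, the L¹ statistic is trivial to evaluate in this example. The sketch below (again with illustrative parameter values, chosen inside the compatible region α_max^{2γ_min} ≥ p) compares a simulated draw of the statistic with the scale of its limiting distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values satisfying alpha_max**(2 * gamma_min) >= p_true.
alpha_max, gamma_min, p_true, n = 0.9, 1.0, 0.5, 2000

y = rng.binomial(1, p_true, size=n)
p_n = y.mean()

# Closed form of the L1 statistic at gamma = gamma_min (see the display above).
l1_stat = (np.sqrt(n) * abs(p_n - p_true) / (1 - p_true)
           * 2 * (1 - alpha_max ** (2 * gamma_min)))

# Scale of the limiting law 2 * sqrt(p/(1-p)) * (1 - alpha_max**(2*gamma_min)) * |Z|.
limit_scale = 2 * np.sqrt(p_true / (1 - p_true)) * (1 - alpha_max ** (2 * gamma_min))
print(l1_stat, limit_scale)
```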
Appendix A: Empirical Distributions of Random Sets

In Assumption 1, we assume that the correspondence is measurable in the traditional sense, defined below:

Definition A1 (Effros Measurability) A correspondence Γ : (Y, B_Y) ⇉ (U, B_U) is said to be Effros measurable, or weakly measurable, or simply measurable, if the inverse image of open sets is measurable, i.e. if for all open subsets O of U,

Γ^{-1}(O) = {y ∈ Y | Γ(y) ∩ O ≠ ∅} ∈ B_Y.
There are several ways in which a measurable correspondence can convey probabilistic information on its image space (U, B_U) given observed frequencies of outcomes in Y.

Dempster (1967) suggests considering the smallest reliability that can be associated with the event A ∈ B_U, namely the belief function

P_*(A) = P{y ∈ Y | Γ(y) ⊆ A},

and the largest plausibility that can be associated with the event A, namely the plausibility function

P^*(A) = P{y ∈ Y | Γ(y) ∩ A ≠ ∅},

the two being linked by the relation

P_*(A) = 1 − P^*(A^c),   (5)

which prompted some authors to call them conjugates or duals of each other.
A natural way to construct a set of probability measures is to consider all probability measures that dominate the set function P_* set-wise, thus forming the core of the belief function:

Core(Γ, P) = {μ ∈ M(U) | ∀A ∈ B_U, μ(A) ≥ P_*(A)}
           = {μ ∈ M(U) | ∀A ∈ B_U, μ(A) ≤ P^*(A)},

where the first equality can be taken as a definition, and the second follows immediately from (5). It is well known that Core(Γ, P) is non-empty, and it will be shown as a consequence of (3.2) below.
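To make these set functions concrete, the following sketch evaluates them for the restricted entry-model correspondence Γ₀ of Section 3.1 on sets of the form A = [0, t] (the parameter values are again illustrative), checking the conjugacy relation (5) and the set-wise domination that defines the core:

```python
import numpy as np

# Entry-model correspondence of Section 3.1: Gamma0(1) = [0, alpha_max], Gamma0(0) = [0, 1].
p, alpha_max, gamma = 0.4, 0.8, 1.5   # illustrative values only

def belief(t):
    """P_*([0, t]) = P{y in Y : Gamma0(y) contained in [0, t]}."""
    return p * (t >= alpha_max) + (1 - p) * (t >= 1.0)

def plausibility(t):
    """P^*((t, 1]) = P{y in Y : Gamma0(y) meets (t, 1]}."""
    return p * (t < alpha_max) + (1 - p) * (t < 1.0)

# Conjugacy relation (5): P^*((t, 1]) = 1 - P_*([0, t]).
for t in np.linspace(0.0, 1.0, 11):
    assert np.isclose(plausibility(t), 1.0 - belief(t))

# nu_gamma (CDF u**gamma) dominates the belief function on the sets [0, t]
# exactly when alpha_max**gamma >= p, the compatibility condition of Section 3.1.
dominates = all(t**gamma >= belief(t) for t in np.linspace(0.0, 1.0, 101))
print(dominates, alpha_max**gamma >= p)
```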
A different way of defining the probabilistic information generated by the correspondence can be derived from Aumann's idea (in Aumann (1965)) of considering correspondences as bundles of their selections.

Define the domain of the correspondence by

Dom(Γ) = {y ∈ Y | Γ(y) ≠ ∅}.

A measurable selection of the measurable correspondence is defined by the property below:

Definition A2 (Measurable Selection) A measurable selection of a correspondence Γ : (Y, B_Y) ⇉ (U, B_U) is a (B_Y, B_U)-measurable function s such that s(y) ∈ Γ(y) for all y ∈ Dom(Γ).

The set of measurable selections of a measurable correspondence Γ is denoted Sel(Γ), and it is non-empty by a theorem due to Rokhlin (Rokhlin (1949), Part I, §2, No. 9, Lemma 2) and generally attributed to Kuratowski and Ryll-Nardzewski:

Theorem A1 (Rokhlin) An Effros measurable correspondence with closed non-empty values admits a measurable selection.

For a proof, see for instance Theorem 8.1.3, page 308, of Aubin and Frankowska (1990).

Elements of Sel(Γ) can be used to transport the probability P on Y to probabilities on U. For each s ∈ Sel(Γ), consider the probability P s^{-1} defined on each A ∈ B_U by

P s^{-1}(A) = P{y ∈ Y | s(y) ∈ A},

and define

S(Γ, P) = {ν ∈ M(U) : ν = P s^{-1} for some s ∈ Sel(Γ)}.
It is easily seen that S(Γ, P) ⊆ Core(Γ, P). A converse is given by the following theorem of Castaldo, Maccheroni, and Marinacci (2004):

Theorem A2 (Castaldo, Maccheroni and Marinacci) If Γ is measurable and compact-valued, then Core(Γ, P) is the weak closed convex hull of S(Γ, P).

We now develop the claim made in Remark 1 of Theorem 3. Γ and P define a random set with realizations Γ(Y_j) for realizations Y_j from P. P^* is the distribution of the random set, and the empirical distribution associated with a sample (Y_1, . . . , Y_n) is given by

P^*_n(A) = (1/n) Σ_{j=1}^n I_{Γ(Y_j) ∩ A ≠ ∅}.

Core(Γ, P) characterizes P^* and Core(Γ, P_n) characterizes P^*_n. Hence Theorem 1 and Donsker theorems provide a way to derive conditions for weak convergence of empirical random sets at rate √n.
Appendix B: Proofs
Proof of Theorem 1:

Call Δ(B) the set of all Borel probability measures with support B. Under Assumption 1, the map y ↦ Δ(Γ₀(y)) is a map from Y to the set of all non-empty convex sets of Borel probability measures on U which are closed with respect to the weak topology. Moreover, for any f ∈ C_b(U), the set of all continuous bounded real functions on U, the map

y ↦ sup { ∫ f dμ : μ ∈ Δ(Γ₀(y)) } = max_{u∈Γ₀(y)} f(u)

is B_Y-measurable, so that, by Theorem 3 of Strassen (1965), for a given ν ∈ Δ(U), there exists π satisfying (3) with π(y, ·) ∈ Δ(Γ₀(y)) for P-almost all y if and only if

∫_U f(u) ν(du) ≤ ∫_Y sup_{u∈Γ₀(y)} f(u) P(dy)   (6)

for all f ∈ C_b(U). Now, defining P^* as the set function

P^* : B ↦ P({y ∈ Y : Γ₀(y) ∩ B ≠ ∅}),

the right-hand side of (6) is shown in the following sequence of equalities to be equal to the integral of f with respect to P^* in the sense of Choquet (line (7) below can be taken as a definition):

∫_Y sup_{u∈Γ₀(y)} f(u) dP(y)
 = ∫_0^∞ P{y ∈ Y : sup_{u∈Γ₀(y)} f(u) ≥ x} dx + ∫_{−∞}^0 (P{y ∈ Y : sup_{u∈Γ₀(y)} f(u) ≥ x} − 1) dx
 = ∫_0^∞ P{y ∈ Y : Γ₀(y) ∩ {f ≥ x} ≠ ∅} dx + ∫_{−∞}^0 (P{y ∈ Y : Γ₀(y) ∩ {f ≥ x} ≠ ∅} − 1) dx
 = ∫_0^∞ P^*({f ≥ x}) dx + ∫_{−∞}^0 (P^*({f ≥ x}) − 1) dx = ∫_Ch f dP^*.   (7)

By Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004), for any f ∈ C_b(U),

∫_Ch f dP^* = max_{s∈Sel(Γ₀)} ∫_U f(u) P s^{-1}(du),

so that (6) is equivalent to

max_{s∈Sel(Γ₀)} ∫_U f(u) P s^{-1}(du) ≥ ∫_U f(u) ν(du)   (8)

for any f ∈ C_b(U). If ν is in the weak closure of the set of convex combinations of elements of {P s^{-1} : s ∈ Sel(Γ₀)}, then by linearity of the integral and the definition of weak convergence, (8) holds. Conversely, if ν satisfies (8), then it satisfies

∫_Ch f dP^* ≥ ∫_U f(u) ν(du),

and by monotone continuity, we have for all A ∈ B_U, with I_A the indicator function,

∫_U I_A(u) ν(du) ≤ ∫_Ch I_A dP^*.

Hence ν(A) ≤ P^*(A) for all A ∈ B_U, which by Corollary 1 of Castaldo, Maccheroni, and Marinacci (2004) implies that ν is the weak limit of a sequence of convex combinations of elements of {P s^{-1} : s ∈ Sel(Γ₀)}; hence it is a mixture in the desired sense and the proof is complete.
Proof of Theorem 2:

Under the null hypothesis, V₀₀ is non-empty. Consider ν ∈ V₀₀ and a solution π to equation (3) (which exists by definition of V₀₀). Fix ε > 0 and consider a grid of points

−∞ = u_0 < u_1 < . . . < u_m = ∞

defined such that ν((u_{j−1}, u_j]) ≤ ε². For all u ∈ (u_{j−1}, u_j], we have

π_{ν,u_{j−1}}(y) ≤ π_{ν,u}(y) ≤ π_{ν,u_j}(y)

for all y ∈ Y, so that [π_{ν,u_{j−1}}, π_{ν,u_j}] is a bracket of functions in the family F = {π_{ν,u} : u ∈ R}. The L¹(P) size of this bracket is

P(π_{ν,u_j} − π_{ν,u_{j−1}}) = P π_{ν,u_j} − P π_{ν,u_{j−1}} = ν((u_{j−1}, u_j]) ≤ ε².

Since the functions in the family are probability kernels, we have 0 ≤ π_{ν,u}(y) ≤ 1 for all u ∈ R and all y ∈ Y, so that

P(π_{ν,u_j} − π_{ν,u_{j−1}})² ≤ P(π_{ν,u_j} − π_{ν,u_{j−1}}) ≤ ε²,

and the L²(P) size of the bracket is at most ε. The minimum number of brackets of L²(P) size less than ε needed to cover the whole family F is denoted N_{[ ]}(ε, F, L²(P)), and the logarithm of this quantity is the entropy with bracketing. Here the entropy with bracketing is less than log(m), and given that ν has a density with respect to Lebesgue measure, m can be chosen smaller than 2/ε², so that the entropy with bracketing condition

∫_0^∞ √(log N_{[ ]}(ε, F, L²(P))) dε < ∞

is satisfied and the empirical process G_n converges weakly, uniformly over F, to a centered Gaussian process G_P with covariance structure defined by

E[G_P π_{ν,u} G_P π_{ν,v}] = P(π_{ν,u} π_{ν,v}) − (P π_{ν,u})(P π_{ν,v}).

Let ‖·‖_∞ denote the norm of the supremum. By the continuous mapping theorem, ‖G_n π_{ν,u}‖_∞ weakly converges to ‖G_P π_{ν,u}‖_∞. Notice that

G_P π_{ν,u} = G_ν((−∞, u]) = B⁰(F_ν(u)),

where B⁰ is a version of the standard Brownian bridge, and F_ν is the distribution function associated with the distribution ν. Since F_ν is a continuous distribution function, ‖B⁰ ∘ F_ν‖_∞ = ‖B⁰‖_∞. It is well known (see for instance Dudley (2003), Proposition 12.3.4, page 464) that the distribution of ‖B⁰‖_∞ (the Kolmogorov-Smirnov statistic) is given by

Pr(‖B⁰‖_∞ > u) = 2 Σ_{j=1}^∞ (−1)^{j−1} e^{−2j²u²},

and the proof is complete.
Proof of Theorem 3:

The assumption guarantees that G_n = √n (P_n − P) converges to a Brownian bridge uniformly over the family F defined in the statement of the theorem. Hence,

sup_{ν∈Core(Γ_n,P)} sup_{u∈R^{d_u}} |G_n π_{ν,u}|

converges to a bounded random variable. Since the Kolmogorov-Smirnov metric is stronger than d, which metrizes weak convergence, the latter display implies that
sup_{ν∈Core(Γ_n,P)} inf_{ν′∈Core(Γ_n,P_n)} d(ν, ν′) = O(1/√n).

Now,

TS2 ≥ √n ( d_H(V_n, Core(Γ_n, P_n)) − sup_{ν∈Core(Γ_n,P)} inf_{ν′∈Core(Γ_n,P_n)} d(ν, ν′) ),

and TS2 ≤ TS1, so the result follows.
References

Andrews, D., S. Berry, and P. Jia (2004): "Confidence Regions for Parameters in Discrete Games with Multiple Equilibria, with an Application to Discount Chain Store Location," unpublished manuscript.

Aubin, J.-P., and H. Frankowska (1990): Set-Valued Analysis. Boston: Birkhäuser.

Aumann, R. (1965): "Integrals of set-valued functions," Journal of Mathematical Analysis and Applications, 12, 1-12.

Castaldo, A., F. Maccheroni, and M. Marinacci (2004): "Random sets and their distributions," Sankhya (Series A), 66, 409-427.

Chernozhukov, V., H. Hong, and E. Tamer (2002): "Inference on Parameter Sets in Econometric Models," unpublished manuscript.

Dempster, A. P. (1967): "Upper and lower probabilities induced by a multi-valued mapping," Annals of Mathematical Statistics, 38, 325-339.

Dudley, R. (2003): Real Analysis and Probability. Cambridge University Press.

Jovanovic, B. (1989): "Observable implications of models with multiple equilibria," Econometrica, 57, 1431-1437.

Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): "Moment Inequalities and Their Application," unpublished manuscript.

Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis. Springer.

Rokhlin, V. (1949): "Selected topics from the metric theory of dynamical systems," Uspekhi Matematicheskikh Nauk, 4, 57-128; translated in American Mathematical Society Translations, 49 (1966), 171-240.

Shaikh, A. (2005): "Inference for a Class of Partially Identified Econometric Models," unpublished manuscript.

Strassen, V. (1965): "The existence of probability measures with given marginals," Annals of Mathematical Statistics, 36, 423-439.

Tamer, E. (2003): "Incomplete Simultaneous Discrete Response Model with Multiple Equilibria," Review of Economic Studies, 70, 147-165.

van der Vaart, A., and J. Wellner (2000): Weak Convergence and Empirical Processes. Springer.