
Statistics & Probability Letters 42 (1999) 257 – 266

Stochastic monotonicity properties of Bayes estimation
of the population size for capture–recapture data
O. Yoshida, J.G. Leite ∗ , H. Bolfarine
Departamento de Estatística, Universidade de São Paulo, Caixa Postal 66281, 05315-970, São Paulo, S.P., Brazil

Received July 1997; received in revised form March 1998

Abstract
Under arbitrary prior distributions for the population size k and for capture probabilities, monotonicity properties of
the posterior distribution of k are obtained to explain the role of the sample and the role of the prior opinion about the
capture probabilities in the Bayesian estimation of k. © 1999 Elsevier Science B.V. All rights reserved

Keywords: Species problem; Number of classes; Multiple Bernoulli sample

1. Introduction

We consider a capture–recapture experiment with m independent trapping occasions from a closed population
with unknown size k. We assume that k has an arbitrary prior distribution $\pi$, and that given k the capture
probabilities of the individuals, namely $q_1,\dots,q_k$, are independent and identically distributed with arbitrary
distribution F. The data consist of the capture frequencies $a_1,\dots,a_m$, where $a_r$ is the number of individuals
captured exactly r times, $r = 1,\dots,m$. The posterior distribution of k depends on F and m and depends on the
data only through $w = \sum_{r\ge 1} a_r$, the number of distinct individuals captured in the m occasions. In this paper
we study features of the Bayesian estimation of k by showing that the posterior distribution of k is monotonic in
F, m and w.
The most commonly used estimators of k are the jackknife estimator (Burnham and Overton, 1978, 1979)
and the moment estimator (Chao, 1987). The magnitude of the capture probabilities seems to play a fundamental
role in the performance of these estimators. Otis et al. (1978) observe that the jackknife estimator
generally produces adequate estimates if many individuals are caught a relatively large number of times (which
is more likely to occur if most of the capture probabilities are relatively large). Chao (1987) observes that
the jackknife estimator usually underestimates the population size k if many individuals are caught only once
or twice in the m occasions (which is more likely to occur if the capture probabilities are small), and then
proposes a moment estimator based on low-order capture frequencies. Under the Bayesian view, the role of
the magnitude of the capture probabilities is played by the magnitude of the prior distribution F. We prove that

∗ Corresponding author.

0167-7152/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved
PII: S0167-7152(98)00208-9

the smaller F is (in the sense of stochastic ordering), the larger the posterior distribution of k will be (in
the sense of stochastic ordering); that is, the smaller the capture probabilities are, the larger the
estimates of k will be. Similar results are reported in Lewins and Joanes (1984). They consider a Bayesian approach
for estimating the number of species and discuss the effect of the prior knowledge of the species abundances
on the model: they find that the smaller the relative species abundances are, the larger the estimates
of the number of species will be. Their result is empirical and is supported by the analysis of the Mount Kenya data.
In Section 2 we describe our model and show that for all F and $\pi$ (possibly improper) the posterior
moments of k are all finite; we also describe Lewins and Joanes' model and their results. Section 3
presents explicit expressions for the posterior distribution of k under the improper uniform prior for k and
also for prior distributions in the exponential family. We present some Bayesian estimates of k for a real
data set and observe that the posterior mean of k decreases as the prior mean of the capture probabilities increases.
This is formally considered in Section 4, where monotonicity properties of the posterior distribution of k are
proved. Section 4 also reports some results on how the posterior distribution of k is affected by the sample
information, through m and w.

2. The model

Let us consider a population with unknown size k and a sequential capture sampling on m occasions. On
each occasion the individuals are captured, independently of one another, with probabilities $q_1,\dots,q_k$. Each
captured individual is returned to the population, so the individual capture probabilities on different occasions
are all the same. Given k, let $X_i(m)$ be the number of times individual i is captured in the m occasions,
$i=1,\dots,k$; then $X_1(m),\dots,X_k(m)$ are independent, and $X_i(m)$ has a binomial distribution with parameters m
and $q_i$, $i=1,\dots,k$. For $m\in\mathbb{N}^*$, the set of positive (non-null) integers, we define the capture frequencies
$A_r(m)=\sum_{i=1}^{k} I_{(X_i(m)=r)}$, $r=0,\dots,m$. That is, $A_r(m)$ is the number of individuals captured exactly r times
in the m occasions. Since $k=\sum_{i=1}^{k} I_{(X_i(m)=0)}+\sum_{i=1}^{k} I_{(X_i(m)>0)}$, we have $k=A_0(m)+\sum_{r\ge 1}A_r(m)$, where
$A_0(m)=\sum_{i=1}^{k} I_{(X_i(m)=0)}$ is the number of individuals not captured in the m occasions and $\sum_{r\ge 1}A_r(m)=\sum_{i=1}^{k} I_{(X_i(m)>0)}$
is the number of individuals captured at least once in the m occasions. Observe that
$A_0(m)=k-W(m)$, where $W(m)=\sum_{r\ge 1}A_r(m)$. Given $k\in\mathbb{N}^*$ and $q_1,\dots,q_k$, $q_i\in[0,1]$, $i=1,\dots,k$, the
distribution of $A_r(m)$, $r=1,\dots,m$, is, for all $a_1,\dots,a_m\in\mathbb{N}=\mathbb{N}^*\cup\{0\}$ such that $\sum_{r\ge 1}a_r\le k$,

  $P[A_1(m)=a_1,\dots,A_m(m)=a_m\,|\,k,q_1,q_2,\dots,q_k]=\sum_{(x_1,\dots,x_k)\in Y}\ \prod_{i=1}^{k}\binom{m}{x_i}q_i^{x_i}(1-q_i)^{m-x_i},$  (1)

where

  $Y=\big\{(y_1,\dots,y_k)\in\mathbb{N}^k:\ \sum_{i=1}^{k} 1_{\{r\}}(y_i)=a_r,\ r=1,\dots,m\big\}.$

Assuming that, given k, $q_1,\dots,q_k$ are independent and identically distributed random variables with distribution
F, it follows that, given $A_1(m)=a_1,\dots,A_m(m)=a_m$, the likelihood function of k is

  $L_{a_1,\dots,a_m}(k)=\binom{k}{a_0,a_1,\dots,a_m}\prod_{r=0}^{m}[q_r(F,m)]^{a_r}\ \propto\ \binom{k}{w}(q_0(F,m))^k,\quad k\ge w,$  (2)

where $\binom{k}{a_0,a_1,\dots,a_m}$ is the cardinality of Y, $q_r(F,m)=\int_0^1\binom{m}{r}q^r(1-q)^{m-r}\,dF(q)$, $r=0,\dots,m$, $w=\sum_{r\ge 1}a_r$ is
the number of distinct individuals captured in the m occasions, and $a_0=k-w$ is the number of non-captured
individuals. Thus, under the assumption that F is known, $w=\sum_{r\ge 1}a_r$ is sufficient for k, so that the
Bayesian estimation of k depends on the data only through w. Further, we assign a prior distribution $\pi$ (possibly

improper) to k, and hence the posterior probability function of k given $a_1,\dots,a_m$ is

  $\pi(k|F,m,a_1,\dots,a_m)=\pi(k|F,m,w)=\dfrac{\binom{k}{w}(q_0(F,m))^k\,\pi(k)}{C(F,m,w)},\quad k\ge w,$  (3)

where

  $C(F,m,w)=\sum_{j\ge w}\binom{j}{w}(q_0(F,m))^j\,\pi(j)$  (4)

and

  $q_0(F,m)=\int_0^1 (1-q)^m\,dF(q).$  (5)

We note that the posterior in (3) requires $\pi$ and w such that $A_{\pi,w}=\{n\in\mathbb{N}:\ n\ge w \text{ and } \pi(n)>0\}\ne\emptyset$. An
important property of the posterior distribution in (3) is that its moments are all finite. For all real distribution
functions F on the interval [0,1] and for all prior distributions $\pi$ (proper or improper on $\mathbb{N}^*$) such that
$A_{\pi,w}\ne\emptyset$, let $M_r(F,m,w)$ be the rth posterior moment of k given $W(m)=w$. Thus,

  $M_r(F,m,w)=\sum_{n\ge w} n^r\,\pi(n|F,m,w)=\dfrac{\sum_{n\ge w} n^r\binom{n}{w}(q_0(F,m))^n\,\pi(n)}{C(F,m,w)}.$  (6)

To see that the rth moment is finite, notice from (6) that

  $M_r(F,m,w)\ \le\ \dfrac{((w+r)!/w!)\sum_{n=w}^{\infty}\binom{n+r}{w+r}(q_0(F,m))^n}{C(F,m,w)}
  \ =\ \dfrac{((w+r)!/w!)\,(q_0(F,m))^w(1-q_0(F,m))^{-w-r-1}}{C(F,m,w)}\ <\ \infty.$
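For a concrete feel for (3)–(6), the posterior can be evaluated numerically by truncating the series (4); the factor $(q_0(F,m))^j$ makes the tail negligible. The sketch below is illustrative and not part of the paper: it takes F to be a Beta$(\alpha,\beta)$ distribution, for which $q_0(F,m)=E[(1-q)^m]$ has a closed product form (the one used later in (10)), and the function names are our own.

```python
import math

def q0_beta(alpha, beta, m):
    # q0(F, m) = E[(1 - q)^m] for q ~ Beta(alpha, beta); standard
    # beta-binomial identity: prod_{j=0}^{m-1} (beta + j) / (alpha + beta + j)
    p = 1.0
    for j in range(m):
        p *= (beta + j) / (alpha + beta + j)
    return p

def posterior(prior, q0, w, cutoff=2000):
    # Truncated version of (3)-(4): pi(k | F, m, w) is proportional to
    # C(k, w) * q0^k * prior(k) for k >= w; normalize over w .. w+cutoff-1.
    weights = {k: math.comb(k, w) * q0 ** k * prior(k) for k in range(w, w + cutoff)}
    c = sum(weights.values())          # truncated C(F, m, w)
    return {k: v / c for k, v in weights.items()}

# improper uniform prior on k, with w and m as in the cottontail data (Example 3.3)
q0 = q0_beta(0.5, 1.5, 18)
post = posterior(lambda k: 1.0, q0, w=76)
mean = sum(k * p for k, p in post.items())
```

With these inputs the posterior mean agrees with the closed form $w+(w+1)/((q_0)^{-1}-1)$ given later in Corollary 3.1, and lands close to the corresponding uniform-prior entry of Table 1.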
Lewins and Joanes (1984) consider a similar model for estimating the number of species in a population.
In their model, the data $a_1,a_2,\dots$ are generated from a multinomial population with k categories (species)
and capture probabilities (relative species abundances) $q_1,\dots,q_k$, with $\sum_{i=1}^{k} q_i=1$. Thus, in place of (1), one
obtains

  $P[A_1(n)=a_1,A_2(n)=a_2,\dots\,|\,k,q_1,\dots,q_k]=\sum_{(x_1,\dots,x_k)\in Y}\binom{n}{x_1,\dots,x_k}\prod_{i=1}^{k} q_i^{x_i},$  (7)

where n is the sample size. The prior distribution of $q_1,\dots,q_k$, given k, is taken to be the symmetric
Dirichlet distribution with parameter $\alpha$. The value of $\alpha$ represents the prior knowledge of the relative species
abundances. Then, in place of (2), one obtains

  $L_{a_1,a_2,\dots}(k)\ \propto\ \binom{k}{w}\dfrac{\Gamma(k\alpha)}{\Gamma(k\alpha+n)},$  (8)

and, as in (2), the likelihood of k depends on the data only through w, the number of distinct species in the
sample. They adopt the zero-truncated negative binomial distribution with parameters $0<p<1$ and $r\ge 1$ as
the prior for k; the posterior distribution of k then depends on $\alpha$, w and n, and instead of (3) they obtain

  $\pi(k|\alpha,n,a_1,\dots,a_m)=\pi(k|\alpha,n,w)\ \propto\ \binom{k}{w}\dfrac{\Gamma(k\alpha)}{\Gamma(k\alpha+n)}\binom{k+r-1}{k}(1-p)^k,$  (9)

$k\ge w$. Lewins and Joanes use this Bayesian approach to estimate the number of insect species from the
Mount Kenya data, and discuss the effect of the prior knowledge on the model. Their calculations show that
the posterior distribution of k is very sensitive to the choice of $\alpha$; in fact, they conclude that "variations in $\alpha$
clearly have the largest effect on the model". In addition, they find that "the model appears to be very
robust for p and r". This kind of sensitivity and robustness is also present in the Bayesian capture–recapture
model. We note from the cottontail data analysis, presented in the next section, that variations in the capture
probabilities have a strong effect on the behavior of the posterior mean of k, while variations in the prior
distribution of k have little effect on it.

3. Estimating k

In this section, closed-form expressions for the posterior distribution of k are obtained under
different prior distributions, including the improper uniform prior and members of the exponential family.

Proposition 3.1. Let $\pi$ be the uniform prior on $\mathbb{N}^*$, that is, $\pi(k)=1$, $k\in\mathbb{N}^*$, and let F be any probability
distribution function on the interval [0,1]. Then, for all $w\in\mathbb{N}^*$,

  $\pi(k|F,m,w)=\binom{k}{w}(1-q_0(F,m))^{w+1}(q_0(F,m))^{k-w},$

$k\ge w$; that is, the posterior distribution of k given $W(m)=w$ is the distribution of the random variable $w+k'$,
where the random variable $k'$ has a negative binomial distribution with parameters $w+1$ and $1-q_0(F,m)$.

Proof. The proof follows directly from (4) and (3), since

  $C(F,m,w)=\sum_{k\ge w}\binom{k}{w}(q_0(F,m))^k
  =(q_0(F,m))^w\sum_{r\ge 0}\binom{-w-1}{r}(-q_0(F,m))^r=(q_0(F,m))^w(1-q_0(F,m))^{-w-1}.$

Corollary 3.1. Let $\pi$ be the improper uniform prior on $\mathbb{N}^*$ and let F be any distribution function on [0,1].
Then:
(i) for all $w\in\mathbb{N}^*$,

  $E[k|F,m,w]=w+\dfrac{w+1}{(q_0(F,m))^{-1}-1}$ and $\mathrm{Var}[k|F,m,w]=(w+1)\dfrac{q_0(F,m)}{(1-q_0(F,m))^2};$

(ii) if $w/(1-q_0(F,m))\notin\mathbb{N}^*$, then the posterior mode of k is given by

  $[w/(1-q_0(F,m))],$

where [x] denotes the greatest integer less than or equal to x. Otherwise, the posterior modes of k are given
by $w/(1-q_0(F,m))-1$ and $w/(1-q_0(F,m))$.

Proof. Notice that (i) follows directly from Proposition 3.1. To prove (ii), notice that

  $\dfrac{\pi(k|F,m,w)}{\pi(k-1|F,m,w)}\ >1$ if $k<\dfrac{w}{1-q_0(F,m)}$, $\ =1$ if $k=\dfrac{w}{1-q_0(F,m)}$, $\ <1$ if $k>\dfrac{w}{1-q_0(F,m)}$,

for all $k\ge w+1$, which after some algebraic manipulation yields the result.
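Proposition 3.1 and Corollary 3.1 are easy to check numerically. The following sketch is ours, not the paper's; the values of w and $q_0$ are arbitrary illustrations. It evaluates the shifted negative binomial pmf directly and compares mean, variance and mode against the closed forms:

```python
import math

def post_pmf(k, w, q0):
    # Proposition 3.1: pi(k | F, m, w) = C(k, w) (1 - q0)^(w+1) q0^(k-w), k >= w
    return math.comb(k, w) * (1 - q0) ** (w + 1) * q0 ** (k - w)

w, q0 = 76, 0.26
ks = range(w, w + 500)
pmf = [post_pmf(k, w, q0) for k in ks]

mean = sum(k * p for k, p in zip(ks, pmf))
mode = max(ks, key=lambda k: post_pmf(k, w, q0))

# Corollary 3.1: closed-form mean, variance and mode
mean_cf = w + (w + 1) / (1 / q0 - 1)
var_cf = (w + 1) * q0 / (1 - q0) ** 2
mode_cf = math.floor(w / (1 - q0))   # case w/(1 - q0) not an integer
```

Here $w/(1-q_0)=102.70\dots$, so both routes give the posterior mode 102.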

Proposition 3.2. Let F be any distribution function on [0,1] and consider the prior

  $\pi^*(k)=c(a)S(k)\exp\{d(a)T(k)\},$

$k\in\mathbb{N}$, which is a member of the exponential family, where a is any real number, c and d are real functions
and S and T are positive functions on the integers. Let $\{\pi(k),\ k\in\mathbb{N}^*\}$ be the prior probability function defined by
$\pi(k)=\pi^*(k)/\big(1-\sum_{i=0}^{k_0}\pi^*(i)\big)$, with $k\in\{k_0+1,k_0+2,\dots\}$, $k_0\in\mathbb{N}$, and let $w\in\mathbb{N}$ be such that $A_{\pi,w}\ne\emptyset$. Then the posterior
distribution of k given $W(m)=w$ is the distribution of a random variable $w+k'$, where the distribution $\pi'$
of the random variable $k'$ is also an element of the exponential family of distributions. That is,

  $\pi'(k')=c_1'(a,b)\,c_2'(a,b)\,S_1'(k')\,S_2'(k')\exp\{d_1'(a,b)T_1'(k')+d_2'(a,b)T_2'(k')\},$

with $k'\in\{x-w:\ x\in A_{\pi,w}\}$, where $b=q_0(F,m)$,

  $c_1'(a,b)=\Big(c(a)\sum_{k\in A_{\pi,w}}\binom{k}{w}b^k S(k)\exp\{d(a)T(k)\}\Big)^{-1},$

  $c_2'(a,b)=c(a),\quad S_1'(k')=S(k'+w),\quad d_1'(a,b)=d(a),$

  $S_2'(k')=\binom{k'+w}{w},\quad d_2'(a,b)=\log b,\quad T_1'(k')=T(k'+w)$

and $T_2'(k')=k'+w$.

Proof. The proof is a direct consequence of (3).


 
Example 3.1. As considered in Lewins and Joanes (1984), suppose that $\pi^*(k)=\binom{r+k-1}{k}p^r(1-p)^k$,
$k\in\mathbb{N}$, $p\in(0,1)$, $r\in\mathbb{N}^*$, with $k_0=0$. That is, $\pi^*$ is the probability function of a negative binomial random
variable. Thus, $\pi(k)=\binom{r+k-1}{k}p^r(1-p)^k/(1-p^r)$, $k\in\mathbb{N}^*$, and the posterior distribution of k given $W(m)=w$
is the distribution of the random variable $w+k'$, where the probability function $\pi'$ of $k'$ is

  $\pi'(k')=\binom{r+k'+w-1}{k'}(1-(1-p)q_0(F,m))^{r+w}((1-p)q_0(F,m))^{k'},$

$k'\in\mathbb{N}$. That is, $k'$ is distributed according to the negative binomial distribution with parameters $r+w$ and
$1-(1-p)q_0(F,m)$, which implies that

  $E[k|F,m,w]=w+\dfrac{(r+w)(1-p)q_0(F,m)}{1-(1-p)q_0(F,m)}$ and $\mathrm{Var}[k|F,m,w]=\dfrac{(r+w)(1-p)q_0(F,m)}{(1-(1-p)q_0(F,m))^2}.$
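As a numerical illustration (ours, not the paper's; the inputs mirror the cottontail analysis of Example 3.3 and Table 1, with $q_0\approx 0.26$ taken as given), the closed-form posterior mean under this truncated negative binomial prior is:

```python
def mean_negbin_prior(w, r, p, q0):
    # Example 3.1: k | W(m) = w is w + k' with k' ~ NegBin(r + w, 1 - (1-p)q0),
    # so E[k | F, m, w] = w + (r + w) b / (1 - b) with b = (1 - p) q0.
    b = (1 - p) * q0
    return w + (r + w) * b / (1 - b)

# w = 76, m = 18; prior NegBin(r = 134, p = 0.5) as in Table 1
mean = mean_negbin_prior(76, 134, 0.5, 0.26)   # about 107
```

This lands at the value reported in the $r=134$, $p=0.5$ column of Table 1 for $\alpha=0.5$, $\beta=1.5$.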

Table 1
Posterior mean and 95% credible interval of k given $\alpha$ and $\beta$

                                                Uniform         Poisson         Negative binomial
 $\alpha$   $\beta$   $\alpha/(\alpha+\beta)$   $q_0(\alpha,\beta,m)$   on $\mathbb{N}^*$   $\lambda=135$   $r=134,\ p=0.5$   $r=1206,\ p=0.9$
 0.1   1.9   0.05   0.77   337(275,408)   180(161,201)   207(180,237)   183(162,205)
 0.3   1.7   0.15   0.45   139(120,162)   137(122,153)   137(120,155)   136(121,152)
 0.5   1.5   0.25   0.26   102(92,115)    111(100,123)   107(96,120)    110(99,122)
 0.7   1.3   0.35   0.14   89(82,97)      95(87,104)     92(84,100)     94(86,103)
 0.9   1.1   0.45   0.07   82(78,88)      86(80,93)      84(79,90)      85(80,91)
 1.1   0.9   0.55   0.04   79(76,83)      81(77,86)      80(77,85)      81(77,86)
 1.3   0.7   0.65   0.02   77(76,80)      78(76,82)      78(76,81)      79(76,82)
 1.5   0.5   0.75   0.01   77(76,78)      77(76,79)      77(76,79)      77(76,80)
 1.7   0.3   0.85   0.00   76(76,77)      76(76,78)      76(76,76)      76(76,76)
 1.9   0.1   0.95   0.00   76(76,77)      76(76,77)      76(76,76)      76(76,76)

Example 3.2. Suppose now that $\pi^*(k)=e^{-\lambda}\lambda^k/k!$, $k\in\mathbb{N}$, $\lambda>0$. That is, $\pi^*$ is the probability function of a
Poisson distribution with parameter $\lambda>0$. Let us suppose that $k_0=0$. Thus $\pi(k)=e^{-\lambda}\lambda^k/(k!(1-e^{-\lambda}))$, $k\in\mathbb{N}^*$,
and the posterior distribution of k given $W(m)=w$, $w\in\mathbb{N}^*$, is the distribution of the random variable $w+k'$,
where the random variable $k'$ is Poisson with parameter $\lambda q_0(F,m)$. This implies that

  $E[k|F,m,w]=w+\lambda q_0(F,m)$ and $\mathrm{Var}[k|F,m,w]=\lambda q_0(F,m).$
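A quick check of this reduction (ours, with the illustrative values $\lambda=135$, $q_0=0.26$, $w=76$ from the cottontail analysis): normalizing $\binom{k}{w}q_0^k\lambda^k/k!$ directly over a long grid recovers the mean $w+\lambda q_0$. The weights are computed in log space to avoid overflow; the truncation constant $1-e^{-\lambda}$ cancels in the normalization.

```python
import math

lam, q0, w = 135.0, 0.26, 76
ks = range(w, w + 400)
# weight(k) proportional to C(k, w) * q0^k * lam^k / k!   (Poisson prior on k)
weights = [math.comb(k, w) * math.exp(k * math.log(lam * q0) - math.lgamma(k + 1))
           for k in ks]
total = sum(weights)
mean = sum(k * v for k, v in zip(ks, weights)) / total
# Example 3.2 closed form: E[k | F, m, w] = w + lam * q0  (= 111.1 here)
```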

Example 3.3. In the following, we consider the cottontail data set presented in Edwards and Eberhardt
(1967), who conducted an 18-occasion trapping study on a confined population of known size 135. The
observed capture frequencies $a_1$ to $a_7$ were 43, 16, 8, 6, 0, 2 and 1; the capture frequencies $a_8$ to $a_{18}$ were all zero.
Observe that the number of distinct individuals captured is $w=76$. For this sample, Burnham and Overton
(1978, 1979) obtain a jackknife estimate of 159 with a 95% confidence interval of (116, 202) and,
further, recommend an improved estimate of 142 with a 95% confidence interval of (112, 172).
Using moment estimators, Chao (1987) estimates the population size as 136 with a 95% interval of
(87, 185). We illustrate Bayesian estimation on the above data set by taking the distribution F of the
capture probabilities to be the beta distribution with parameters $\alpha$ and $\beta$, so that

  $q_0(F,m)=q_0(\alpha,\beta,m)=\prod_{j=0}^{m-1}\dfrac{\beta+j}{\alpha+\beta+j}.$  (10)

We adopt four different prior distributions for k: the improper uniform distribution on the positive integers
$\mathbb{N}^*$, the Poisson distribution with parameter $\lambda=135$, and the negative binomial distribution with parameters
$r=134$, $p=0.5$ and with $r=1206$, $p=0.9$. The posterior distributions of k under these priors are calculated using,
respectively, Proposition 3.1, Example 3.2 and Example 3.1. Table 1 presents the posterior mean and the 95%
credible interval of the posterior distribution of k obtained from the 2.5% and 97.5% quantiles. The results
are presented for various values of $(\alpha,\beta)$. We note that: (1) for each prior $\pi$, as the prior expected individual
capture probability $\alpha/(\alpha+\beta)$ increases, the posterior mean of the number of individuals decreases (a formal
justification for this result is given in the next section), and (2) for high and medium values of $\alpha/(\alpha+\beta)$ the
posterior means of k do not change significantly across the choices of $\pi$, while for low values of $\alpha/(\alpha+\beta)$
the posterior means of k are very sensitive to the different choices of $\pi$. One might think that the sensitivity of
the posterior mean of k to different priors $\pi$ is always higher when the capture probabilities are low, but, as seen from

Table 2
Posterior mean and 95% credible interval of k given the prior conviction A

                         Uniform         Poisson         Negative binomial
 A     $q_0(A,m)$   on $\mathbb{N}^*$   $\lambda=135$   $r=134,\ p=0.5$   $r=1206,\ p=0.9$
 1     0.8084   401(325,486)   185(165,206)   218(189,250)   189(168,211)
 10    0.5219   160(136,188)   146(130,163)   150(131,171)   147(130,164)
 20    0.4480   138(119,161)   136(122,152)   137(120,155)   136(121,152)
 30    0.4152   131(113,151)   132(118,147)   131(115,148)   132(117,147)
 40    0.3966   127(110,146)   130(116,144)   128(113,144)   129(115,144)
 50    0.3846   124(108,142)   128(114,142)   126(111,142)   127(114,142)
 60    0.3762   122(107,140)   127(113,141)   125(117,140)   126(113,141)
 70    0.3700   121(106,139)   126(113,140)   124(109,139)   125(112,140)
 80    0.3652   120(105,138)   125(112,140)   123(109,138)   125(111,139)
 90    0.3614   120(104,137)   125(112,139)   122(108,138)   124(111,138)
 100   0.3583   119(104,136)   124(111,138)   122(108,137)   124(110,138)

Table 2, even with low capture probabilities, the estimates given by the posterior mean do not change very
much if the prior conviction about the capture probabilities is relatively strong. The capture
probabilities for the cottontail data were estimated in Edwards and Eberhardt (1967, p. 93) as 0.06. Using this
information and supposing that $\alpha$ and $\beta$ are such that $\alpha=A\times 0.06$ and $\beta=A\times(1-0.06)$, $A=1,10,20,\dots,100$,
it follows that $\alpha/(\alpha+\beta)=0.06$ and

  $q_0(F,m)=q_0(A,m)=\prod_{j=0}^{m-1}\dfrac{A(1-0.06)+j}{A+j}.$  (11)

The quantity A can be interpreted as a measure of the strength of the prior conviction that the capture
probabilities are distributed according to the beta distribution with mean 0.06. Table 2 lists posterior means
and credible intervals of $\pi(k|A,m,w)$ for $A=1,10,20,\dots,100$. Note that the stronger the prior conviction
A, the smaller the estimates of k (this is formally explained in Proposition 4.2).
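The entries of Table 2 can be reproduced from (11). The sketch below is ours (the function name is hypothetical); the uniform-prior posterior mean uses the closed form of Corollary 3.1 with the cottontail value $w=76$:

```python
def q0_strength(A, m, qbar=0.06):
    # Eq. (11): q0 under a Beta(A*qbar, A*(1-qbar)) prior for the capture
    # probability; A is the strength of the prior conviction, qbar its mean.
    alpha, beta = A * qbar, A * (1 - qbar)
    p = 1.0
    for j in range(m):
        p *= (beta + j) / (alpha + beta + j)
    return p

q0 = q0_strength(20, 18)                  # ~0.448, as in the A = 20 row of Table 2
mean_uniform = 76 + 77 * q0 / (1 - q0)    # uniform-prior posterior mean, ~138
```

That $q_0(A,m)$ decreases as A grows is the monotonicity behind Proposition 4.2.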
The conclusion from this analysis is similar to that of Lewins and Joanes. The model is strongly affected
by variations in the prior of the capture probabilities and appears to be very robust to different choices
of $\pi$. The prior knowledge of the capture probabilities is represented by F; in the next section we show that,
whatever the choice of F, the model is sensitive to variations in F. Indeed, we prove that the posterior
distribution of k is nonincreasing in F.

4. Monotonicity properties of the posterior distribution of k

In this section, we provide formal justification for the empirical findings reported in Example 3.3. We start
with the following definition.

Definition 4.1. If X and Y are random variables with distributions F and G, respectively, then we say that
X is stochastically larger than Y, and write $X\ge_{st} Y$ (or $F\ge_{st} G$), if

  $P[X>x]\ge P[Y>x]$ for all x.

Stochastic ordering relations are used below to show how the posterior distribution of k is affected by the
sample and by the prior opinion about the capture probabilities.

Proposition 4.1. Let $\pi$ be any proper prior distribution for k, or let $\pi$ be the improper uniform distribution
on $\mathbb{N}^*$.
(i) For any distribution functions F and G, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$, $F\le_{st} G$ implies that
  $P[k>x|F,m,w]\ge P[k>x|G,m,w],\quad -\infty<x<\infty.$
(ii) For any distribution function F, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w+1}\ne\emptyset$,
  $P[k>x|F,m,w]\le P[k>x|F,m,w+1],\quad -\infty<x<\infty.$
(iii) For any distribution function F, $m\in\mathbb{N}$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$,
  $P[k>x|F,m+1,w]\le P[k>x|F,m,w],\quad -\infty<x<\infty.$

Proof. Let $x\in\mathbb{R}$, the set of real numbers.

(i) Given m, w and F and G such that $F\le_{st} G$, let g and h be the real functions defined by

  $g(j)=I_{(x,\infty)}(j)$ and $h(j)=\Big(\dfrac{q_0(F,m)}{q_0(G,m)}\Big)^j$

for all $j\in\mathbb{N}^*$. Moreover,

  $F\le_{st} G\ \Leftrightarrow\ \int_0^\infty f(x)\,dF(x)\ge\int_0^\infty f(x)\,dG(x)$

for any nonincreasing function f. Thus, it follows that

  $\int_0^1 (1-x)^m\,dF(x)\ge\int_0^1 (1-x)^m\,dG(x)$

and

  $\dfrac{q_0(F,m)}{q_0(G,m)}=\dfrac{\int_0^1 (1-x)^m\,dF(x)}{\int_0^1 (1-x)^m\,dG(x)}\ge 1,$

which implies that the function h is nondecreasing. Since g also is nondecreasing, it follows that

  $E[g(k)h(k)|G,m,w]\ge E[g(k)|G,m,w]\,E[h(k)|G,m,w],$  (12)

  $E[g(k)h(k)|G,m,w]=\dfrac{C(F,m,w)}{C(G,m,w)}\,E[I_{(k>x)}|F,m,w],$  (13)

  $E[g(k)|G,m,w]=E[I_{(k>x)}|G,m,w]$  (14)

and

  $E[h(k)|G,m,w]=\dfrac{C(F,m,w)}{C(G,m,w)}.$  (15)

The result then follows by combining (12)–(15).

(ii) Given F, m and w, let g and h be functions such that

  $g(n)=I_{(x,\infty)}(n)$ and $h(n)=(n-w)I_{[w,\infty)}(n),$

$n\in\mathbb{N}$. Thus, it follows that

  $\mathrm{Cov}[g(k),h(k)|F,m,w]=(w+1)\dfrac{C(F,m,w+1)}{C(F,m,w)}\big\{E[I_{(k>x)}|F,m,w+1]-E[I_{(k>x)}|F,m,w]\big\},$

which is nonnegative since g and h are real nondecreasing functions; this proves the result.

(iii) Given F, m and w, let g and h be the real functions defined by

  $g(n)=\Big(\dfrac{q_0(F,m+1)}{q_0(F,m)}\Big)^n=\Big(\dfrac{\int_0^1 (1-x)^{m+1}\,dF(x)}{\int_0^1 (1-x)^m\,dF(x)}\Big)^n$ and $h(n)=I_{(x,\infty)}(n),$

$n\in\mathbb{N}$. Since $q_0(F,m+1)/q_0(F,m)\le 1$, g is nonincreasing while h is nondecreasing, so it follows that

  $E[g(k)h(k)|F,m,w]\le E[g(k)|F,m,w]\,E[h(k)|F,m,w].$  (16)

Moreover, some algebraic manipulation yields

  $E[g(k)|F,m,w]=\dfrac{C(F,m+1,w)}{C(F,m,w)}$

and

  $E[g(k)h(k)|F,m,w]=\dfrac{C(F,m+1,w)}{C(F,m,w)}\,E[I_{(k>x)}|F,m+1,w],$

which together with (16) imply that

  $E[I_{(k>x)}|F,m+1,w]\le E[I_{(k>x)}|F,m,w],$

which concludes the proof.
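The three parts of Proposition 4.1 can be illustrated numerically. The sketch below is ours: it assumes the improper uniform prior on k, so that the closed form of Proposition 3.1 applies, and uses the hypothetical choices F = Beta(1,2) and G = Beta(2,1), for which $F\le_{st} G$; small tolerances guard against floating-point ties.

```python
import math

def q0_beta(alpha, beta, m):
    # q0(F, m) = E[(1 - q)^m] for q ~ Beta(alpha, beta)
    p = 1.0
    for j in range(m):
        p *= (beta + j) / (alpha + beta + j)
    return p

def survival(q0, w, x, cutoff=600):
    # P[k > x | F, m, w] under the improper uniform prior on k:
    # by Proposition 3.1, k - w ~ NegBin(w + 1, 1 - q0)
    return sum(math.comb(k, w) * (1 - q0) ** (w + 1) * q0 ** (k - w)
               for k in range(max(w, x + 1), w + cutoff))

m, w = 18, 76
qF, qG = q0_beta(1, 2, m), q0_beta(2, 1, m)   # Beta(1,2) <=st Beta(2,1)
xs = range(60, 160, 10)
# (i) a stochastically smaller F gives a stochastically larger posterior
assert all(survival(qF, w, x) >= survival(qG, w, x) - 1e-12 for x in xs)
# (ii) observing more distinct individuals pushes the posterior up
assert all(survival(qF, w, x) <= survival(qF, w + 1, x) + 1e-12 for x in xs)
# (iii) more occasions with the same w pull the posterior down
assert all(survival(q0_beta(1, 2, m + 1), w, x) <= survival(qF, w, x) + 1e-12 for x in xs)
```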

As a consequence of the above results, the rth posterior moment $M_r(F,m,w)$ follows a similar
ordering.

Corollary 4.1. Let $\pi$ be any proper prior distribution for k, or let $\pi$ be the improper uniform distribution on $\mathbb{N}^*$.
(i) For any distribution functions F and G, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$, $F\le_{st} G$ implies that
$M_r(F,m,w)\ge M_r(G,m,w)$.
(ii) For any distribution function F, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$,
$M_r(F,m,w)\le M_r(F,m,w+1)$.
(iii) For any distribution function F, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$,
$M_r(F,m+1,w)\le M_r(F,m,w)$.

Proof. Notice that $F\le_{st} G$ implies $k|F,m,w\ \ge_{st}\ k|G,m,w$, which implies $f(k)|F,m,w\ \ge_{st}\ f(k)|G,m,w$
for any increasing function f. Thus, (i) can be proved by taking $f(j)=j^r$, $j\in\mathbb{N}^*$. In a similar fashion, (ii)
and (iii) can be proved.

Suppose that the distribution F is a beta distribution with parameters $\alpha$ and $\beta$, and hence $q_0(F,m)$ is given
by (10). Let us denote by $M_r(\alpha,\beta,m,w)$ the rth posterior moment. In this situation we have:

Corollary 4.2. Let $\pi$ be any proper prior distribution for k, or let $\pi$ be the improper uniform distribution on $\mathbb{N}^*$.
Then, for all $\alpha_1>0$, $\beta_1>0$, $\alpha_2>0$, $\beta_2>0$, $m\in\mathbb{N}^*$ and $w\in\mathbb{N}$ such that $A_{\pi,w}\ne\emptyset$, $\alpha_1\le\alpha_2$ and $\beta_1\ge\beta_2$ imply that

  $P[k>x|\alpha_1,\beta_1,m,w]\ge P[k>x|\alpha_2,\beta_2,m,w],\quad -\infty<x<\infty,$

and

  $M_r(\alpha_1,\beta_1,m,w)\ge M_r(\alpha_2,\beta_2,m,w).$

Proof. Let $F_1$ be the distribution function of a random variable with a beta distribution with parameters $\alpha_1$
and $\beta_1$, and $F_2$ the distribution function of a beta random variable with parameters $\alpha_2$ and $\beta_2$. It is known
that $\alpha_1\le\alpha_2$ and $\beta_1\ge\beta_2$ imply that $F_1\le_{st} F_2$ (Stoyan and Daley, 1983), which, by Proposition 4.1(i) and
Corollary 4.1(i), proves the result.

Suppose now that the distribution F is a beta distribution with parameters $\alpha=A\times q$ and $\beta=A\times(1-q)$,
$q\in(0,1)$ and $A>0$. Thus $q_0(F,m)$ is as in (11). Let us denote by $M_r(A,m,w)$ the rth moment of the
posterior distribution. Next we see that the posterior estimates of k decrease with the strength A of this prior on
the capture probabilities.

Proposition 4.2. Let $\pi$ be any proper prior distribution for k, or let $\pi$ be the improper uniform distribution on
$\mathbb{N}^*$. Then $A_1\le A_2$ implies that

  $P[k>x|A_1,m,w]\ge P[k>x|A_2,m,w],\quad -\infty<x<\infty,$

and

  $M_r(A_1,m,w)\ge M_r(A_2,m,w).$

Proof. First we note that $q_0(A,m)$ is nonincreasing in A. Thus, $A_1\le A_2$ implies that $q_0(A_1,m)/q_0(A_2,m)\ge 1$,
and consequently the function $h(j)=\{q_0(A_1,m)/q_0(A_2,m)\}^j$, $j\in\mathbb{N}^*$, is nondecreasing. We define $g(j)=I_{(x,\infty)}(j)$,
$j\in\mathbb{N}^*$, and follow steps (12)–(15) as in the proof of Proposition 4.1(i).

5. Concluding remarks

We have proposed a Bayesian formulation of the capture–recapture model for the estimation of the population
size k. This model is a particular case of models for the estimation of the number of species in a population, as seen
in the review of Bunge and Fitzpatrick (1993). The Bayesian formulation requires a prior distribution F for
the capture probabilities and a prior distribution $\pi$ for k. We discuss how the data and the prior F affect
the Bayesian estimation of k by proving monotonicity properties of the posterior distribution of k. If a prior
distribution F is not available, we would recommend a hierarchical formulation for the capture probabilities,
for which similar properties can be proved.

Acknowledgements

The authors thank the referees for helpful comments and suggestions which substantially improved the
presentation. Partial financial support from CNPq-Brasil is also acknowledged.

References

Bunge, J., Fitzpatrick, M., 1993. Estimating the number of species: a review. J. Amer. Statist. Assoc. 88, 364–373.
Burnham, K.P., Overton, W.S., 1978. Estimation of the size of a closed population when capture probabilities vary among animals.
Biometrika 65, 625–633.
Burnham, K.P., Overton, W.S., 1979. Robust estimation of the size of a closed population when capture probabilities vary among animals.
Ecology 60, 927–936.
Chao, A., 1987. Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43, 783–791.
Edwards, W.R., Eberhardt, L.L., 1967. Estimating cottontail abundance from live trapping data. J. Wildl. Manag. 31, 87–96.
Lewins, W.A., Joanes, D.N., 1984. Bayesian estimation of the number of species. Biometrics 40, 323–328.
Otis, D.L., Burnham, K.P., White, G.C., Anderson, D.R., 1978. Statistical inference from capture data on closed animal populations.
Wildlife Monographs, vol. 62. The Wildlife Society, Washington, DC, USA.
Stoyan, D., Daley, D.J., 1983. Comparison Methods for Queues and Other Stochastic Models. Wiley, New York.
