The Use of The Likelihood Function in The Analysis of Environmental Data


ELOISA DÍAZ-FRANCÉS* AND DAVID A. SPROTT

CIMAT, A.P. 402, Guanajuato, Gto. 36000, Mexico
SUMMARY
Inferential procedures that take into account the entire course of the likelihood function are considered for the analysis of environmental data. The purpose is to obtain quantitative statements of plausibility about the unknown parameters of the model that use all of the parametric information contained in the sample. With due attention paid to the shape of the likelihood function, the methods are applicable to small samples. There are no asymptotic requirements. Skewed densities $f(x; \theta)$ as well as censoring can easily be taken into account. The methods are exemplified with a censored water pollution data set. Copyright © 2000 John Wiley & Sons, Ltd.
KEY WORDS: censoring; detection limit; likelihood-confidence intervals; linear pivotal; profile likelihood; skewness; small samples; water pollution
1. INTRODUCTION
Measurements on lifetime data $x' = (x_1, \ldots, x_n)$, as well as on environmental data such as water pollution samples, may present features that lead to very interesting inferential problems. Examples are the existence of detection limits, which induce censoring; the presence of skewed densities $f(x; \theta)$ and likelihood functions $L(\theta; x) \propto f(x; \theta)$; and the widely varying concentrations of pollutants or other positive variables, spanning several orders of magnitude. The goal is to obtain quantitative statements of plausibility about the unknown parameters $\theta$, or one-to-one functions thereof, of the relevant model $f$ that use all of the parametric information contained in the sample.
Accommodating these features requires inferential methods that make use of the entire likelihood function. This may be justified in many ways. The likelihood function $L$ ranks the evidence in favour of values of $\theta$ in terms of how probable these values make the observed $X = x$. The likelihood therefore supplies a natural order of preference among the possibilities under consideration (Fisher 1973, p. 73; Edwards 1992, chaps. 1–3). Also, the likelihood function is always minimal sufficient, so that for the problems of estimation considered here we need only aim to specify this function (Barnard 1962). It is also important that the likelihood function exists whenever there is a probability model in terms of a well-defined parameter, irrespective of the sample size and the complexity of the model. In particular, the presence of censoring does not impede the use of the likelihood function.
The foregoing is in sharp contrast with the customary use of the maximum likelihood estimator alone, or of other point estimators with optimal, usually only asymptotic, properties such as unbiasedness and minimum variance. These methods centre attention on a neighbourhood of the maximum of the likelihood function and do not necessarily take its shape into account. In finite samples an estimate and its variance are rarely sufficient to specify or reproduce the entire likelihood function.

CCC 1180–4009/2000/010075–23$17.50
Copyright © 2000 John Wiley & Sons, Ltd.
Received 10 April 1999
Accepted 29 May 1999
ENVIRONMETRICS
Environmetrics 2000; 11: 75–97
* Correspondence to: E. Díaz-Francés, CIMAT, A.P. 402, Guanajuato, Gto. 36000, Mexico.
Usually, one of the main objectives when analyzing lifetime or environmental data is the estimation of the quantiles of the relevant distribution, since the probability of exceeding a quality standard set by environmental protection agencies is of particular interest. To fulfil these objectives, a model that represents the data adequately is essential. The inferential procedures described here can then be applied to that model.
The purpose here is to exhibit methods that exploit the full likelihood function to produce quantitative statements of plausibility about the unknown parameters of the model, using all of the parametric information contained in the sample. These methods will be exemplified with a censored water pollution data set.
2. PROPERTIES OF THE RELATIVE LIKELIHOOD FUNCTION
The relative likelihood function is a standardized version of the likelihood function and is defined as
$$R(\theta; x) = \frac{L(\theta; x)}{L(\hat\theta; x)},$$
where $\hat\theta$ is the maximum likelihood estimate (MLE) of $\theta$, so that $0 \le R(\theta; x) \le 1$. Denote the log likelihood by $l(\theta; x) = \ln[L(\theta; x)]$. For a single parameter $\theta$, the observed information is the negative of the second derivative of $l(\theta; x)$ evaluated at $\hat\theta$:
$$I_\theta = -\left.\frac{d^2 l(\theta; x)}{d\theta^2}\right|_{\theta = \hat\theta}.$$
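As a minimal numerical sketch (not the authors' Gauss code), the relative likelihood and the observed information can be computed for any one-parameter log likelihood. Approximating the second derivative by a central finite difference is our choice, not a method from the paper. The example is the binomial log likelihood of the censoring proportion p from Section 5.1 (79 censored values out of 593), because its observed information, $n/[\hat p(1-\hat p)]$, is known in closed form and can serve as a check.

```python
import math

def relative_likelihood(loglik, theta, theta_hat):
    """R(theta; x) = L(theta; x) / L(theta_hat; x), computed on the log scale."""
    return math.exp(loglik(theta) - loglik(theta_hat))

def observed_information(loglik, theta_hat, h=1e-5):
    """I_theta = -d^2 l / d theta^2 at theta_hat, via a central second difference."""
    second = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
    return -second

def binom_loglik(p, k=79, n=593):
    """Binomial log likelihood of p, up to an additive constant."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

p_hat = 79 / 593
I_num = observed_information(binom_loglik, p_hat)
I_exact = 593 / (p_hat * (1 - p_hat))      # closed form for the binomial
print(round(I_num, 1), round(I_exact, 1))
print(relative_likelihood(binom_loglik, 0.10, p_hat))
```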
For a unimodal single-parameter likelihood function, a likelihood interval at the $100c\%$ level is obtained by cutting the graph of $R(\theta; x)$ horizontally at height $c$, that is, by taking the two roots of $R(\theta; x) = c$ (see Kalbfleisch 1985, p. 48). In order to ascribe a coverage probability to this interval, a linear pivotal quantity $u_\theta$ is required that is efficient, in the sense that its density function reproduces $R(\theta; x)$ (Sprott 1990; Viveros and Sprott 1987). A candidate for such a quantity is
$$u_\theta = (\hat\theta - \theta)\sqrt{I_\theta}.$$
If the likelihood intervals are also confidence intervals, they may be called likelihood-confidence intervals (Viveros and Sprott 1987). The likelihood function will then be adequately reproduced by a nested set of a few of these likelihood-confidence intervals at various levels of confidence, such as 99%, 95% and 90%, supplemented by the MLE. These then constitute quantitative statements of plausibility about $\theta$ that use all of the parametric information in the data.
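When $u_\theta$ is approximately standard normal, the relative likelihood is approximately $R = \exp(-u^2/2)$. Each confidence level therefore fixes a likelihood level $c$. The sketch below, which uses Python's standard `statistics.NormalDist`, makes that correspondence explicit. With exact normal quantiles, the 90%, 95% and 99% levels give $c \approx 0.259$, $0.147$ and $0.036$. The tables later in the paper use the rounded values u = 1.64, 1.96 and 2.57.

```python
import math
from statistics import NormalDist

def u_for_confidence(conf):
    """Two-sided standard normal point u with P(-u <= U <= u) = conf."""
    return NormalDist().inv_cdf(0.5 + conf / 2.0)

def likelihood_level(u):
    """Height c = exp(-u^2/2) of a normal relative likelihood at theta_hat +/- u/sqrt(I_theta)."""
    return math.exp(-u * u / 2.0)

for conf in (0.90, 0.95, 0.99):
    u = u_for_confidence(conf)
    print(f"{conf:.0%} confidence: u = {u:.3f}, likelihood level c = {likelihood_level(u):.3f}")
```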
Usually $\theta$ will be a vector. The problem of separate estimation consists in making inferences about subsets of the elements of $\theta$. For example, the location parameter of a location-scale model, or more generally the quantiles, may be of interest in the absence of knowledge of the scale or other remaining parameters. This is not trivial, and may in some cases be impossible (see Edwards 1992, p. 109), since likelihoods, unlike probabilities, are not additive. Special techniques, such as the profile likelihood, have been developed to address this problem.
Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The separate estimation of a parameter of interest $\theta_1$ in the presence of the nuisance parameters $\{\theta_2, \ldots, \theta_k\}$ may be addressed by the profile or maximized likelihood (Sprott and Kalbfleisch 1969; Barndorff-Nielsen and Cox 1994, pp. 89–91), which is defined as
$$R_p(\theta_1; x) = \max_{\theta \mid \theta_1} R(\theta; x) = R[\theta_1, \hat\theta_2(\theta_1), \ldots, \hat\theta_k(\theta_1); x], \qquad (1)$$
the maximum being over all $\{\theta_2, \ldots, \theta_k\}$ that are consistent with the given value of $\theta_1$. That is, for each value of $\theta_1$, the full relative likelihood surface is maximized over the remaining parameters with $\theta_1$ held fixed. These maximizing values of the remaining parameters are usually referred to as the restricted MLE, that is, restricted to $\theta_1$ being specified. The restricted MLE of the nuisance parameters, $\hat\theta_i(\theta_1)$, $i = 2, \ldots, k$, therefore depend on $\theta_1$. They are the values of $\theta_i$, $i = 2, \ldots, k$, that make the observed sample most probable for the fixed value of $\theta_1$.
The profile observed information of a one-dimensional component $\theta_i$, in the absence of knowledge of the remaining parameters, is
$$I_{\theta_i} = \frac{1}{I^{ii}},$$
where $I^{ij}$ is the $(i,j)$th entry of the inverse $I_\theta^{-1}$ of the observed information matrix $I_\theta$,
$$I_\theta = \left[-\left.\frac{\partial^2 l(\theta; x)}{\partial\theta_i\,\partial\theta_j}\right|_{\theta = \hat\theta}\right]$$
(Sprott 1980). This is also valid in the single-parameter case.
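For two parameters the inverse can be written out, which gives $1/I^{11} = I_{11} - I_{12}^2/I_{22}$. In words, the profile information is the full information about $\theta_1$ minus the part lost by not knowing $\theta_2$. The minimal sketch below uses a hypothetical information matrix chosen only for illustration, not values from the chromium analysis.

```python
def profile_information_2x2(I):
    """Profile observed information of theta_1, 1 / (I^{-1})_{11}, for a 2x2 matrix I.

    Since (I^{-1})_{11} = I_22 / det(I), its reciprocal is det(I) / I_22.
    """
    (i11, i12), (i21, i22) = I
    det = i11 * i22 - i12 * i21
    return det / i22

I_hyp = [[4.0, 1.0], [1.0, 2.0]]                      # hypothetical observed information matrix
print(profile_information_2x2(I_hyp))                 # 3.5
print(I_hyp[0][0] - I_hyp[0][1] ** 2 / I_hyp[1][1])   # 3.5, same value
```

When $I_{12} = 0$, the profile information reduces to $I_{11}$. Knowledge of $\theta_2$ then carries no information about $\theta_1$.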
An essential feature of likelihood is that, like probability but unlike probability density, it has the property of functional invariance. The likelihood of any one-to-one function $\phi(\theta)$ can be obtained directly from the likelihood of $\theta$ by algebraic substitution. For instance, likelihood inferences about $\theta$ can be transformed into corresponding inferences about $\phi = \exp(\theta)$ merely by replacing $\theta$ by $\log\phi$. This is important because it opens the possibility of changing the shape of the likelihood function by changing the parameter. The likelihood of a positive parameter $\phi > 0$ can be highly asymmetric, yet the likelihood of $\log\phi$ can be symmetric, which simplifies the inferences. More importantly, the parameter of scientific interest often will not be $\theta$ or $\phi$, but some other one-to-one function of $\theta$. From the likelihood perspective this introduces no difficulties.
These properties and uses of the likelihood function in supplying quantitative inferences about
various parameters of interest will be illustrated on the following data.
3. DESCRIPTION OF THE CHROMIUM DATA
Water samples collected from discharge produced by tanneries in León, an industrial city of Mexico, will serve to exemplify the application of the proposed likelihood methods. A sample of size 593 was gathered and the concentrations of 14 chemical variables were measured. The samples were collected within a 13-month period at different sites (tanneries) in 1991 (Negrete 1992). The industrial processes of tanneries are similar throughout the year. Our goal is to model the overall distribution of chromium concentrations, so time was not considered in the analysis. Of the 14 variables, the two that were censored due to a lower detection limit, chromium and sulfides, will be considered here. The chromium data X, measured in mg/l, had 79 (13%) of the observations censored due to a lower detection limit T = 0.005 mg/l. This lower detection limit arises from the inability of the measuring equipment to detect values smaller than T. That is, censored observations in the interval [0, T] were all recorded as T.
Censored data frequently appear in environmental data. This raises the problem of how to combine observed and censored data in order to model them adequately. One approach is to assume that all observations come from the same distribution, on a suitably transformed scale if necessary (see, for example, Shumway et al. 1989; Stoline 1991; Nakamura and Díaz-Francés 1994). Another approach is to assume that censored data behave differently from observed data. This is a reasonable assumption when a scale can be found on which the censored data are distant from the observed data. In this case, it is reasonable to assume that each group follows a different distribution, that is, that each group belongs to a different population. For the chromium data, uncensored observations may be several orders of magnitude larger than censored data, which suggests this approach.
Figure 1 shows a histogram of the chromium data, both censored and uncensored. The extreme skewness of the data is evident, as is the fact that chromium concentrations span several orders of magnitude. The lower detection limit is T = 0.005 mg/l and differs greatly from the maximum sample value of X = 2313.97 mg/l. Lambert et al. (1991) mention the convenience of taking logarithms of chemical concentration data, since such data typically span several orders of magnitude. Figure 2 shows a histogram of the corresponding log data. The logarithmic scale clearly shows a separation in the chromium data between the uncensored and the censored data, implying that the observed data are several orders of magnitude larger than the censored data. Therefore, the observed data may be associated with highly polluted water samples, whereas the censored data indicate water samples with non-detectable, and thus negligible, levels of this contaminant. Censored observations in the log scale could hardly be thought to form part of the left tail of a unimodal distribution that fitted the uncensored data well. Such a distribution would assign a negligible probability to the interval $(-\infty, \ln(T)]$ because of the large separation between the detection limit and the uncensored data. This would contradict the fact that 13% of the observations were actually censored. The separation between the censored and the uncensored data is very large. The log detection limit is $\ln(T) = -5.29$, which is far from the smallest uncensored observation in the log scale, 0.215.
Figure 1. Complete chromium data
The chromium data set is thus a typical example of environmental data: the values are non-negative, the concentrations vary over more than one order of magnitude, and the distribution is highly skewed.
4. THE MODEL
Let $Y = \ln(X)$ be the chromium log data. The skewness that remains in the uncensored log data (Figure 2) suggests the use of a member of a unimodal skewed location-scale family, such as the log F(a, b) distribution, with density
$$g(u; a, b) = \frac{\Gamma\left(\frac{a+b}{2}\right)}{\Gamma\left(\frac{a}{2}\right)\Gamma\left(\frac{b}{2}\right)}\left(\frac{a}{b}\right)^{a/2} e^{au/2}\left(1 + \frac{a}{b}e^{u}\right)^{-(a+b)/2}, \qquad (2)$$
where $u = (y - \theta)/\sigma$ and $(\theta, \sigma)$ are the location and scale parameters, respectively.
Figure 2. Complete chromium log data
The corresponding density function of the log data $Y$ is
$$h(y; \theta, \sigma) = \frac{1}{\sigma}\,g\!\left(\frac{y - \theta}{\sigma}\right). \qquad (3)$$
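A direct transcription of (2) and (3) is sketched below, with the normalizing constant computed through `math.lgamma` for stability. Two checks are included. The density should integrate to 1, checked here for log F(0.79, 4.12) (one of the members mentioned in Section 5.5) with a simple trapezoid rule. For a = 2 and very large b, it should approach the extreme value density $\exp(u - e^u)$ used below in (5); taking b = 10^6 as a stand-in for b = ∞ is an assumption of the sketch.

```python
import math

def logf_density(u, a, b):
    """Density (2) of the log F(a, b) distribution at u (a, b > 0)."""
    log_c = (math.lgamma((a + b) / 2) - math.lgamma(a / 2) - math.lgamma(b / 2)
             + (a / 2) * math.log(a / b))
    return math.exp(log_c + a * u / 2 - (a + b) / 2 * math.log1p((a / b) * math.exp(u)))

def location_scale_density(y, theta, sigma, a, b):
    """Density (3) of Y: (1/sigma) g((y - theta)/sigma)."""
    return logf_density((y - theta) / sigma, a, b) / sigma

def gumbel_min_density(u):
    """Standard extreme value density exp(u - e^u)."""
    return math.exp(u - math.exp(u))

# Trapezoid check that (2) integrates to about 1 for log F(0.79, 4.12), u in [-60, 20]
step = 0.01
grid = [-60 + step * i for i in range(8001)]
vals = [logf_density(u, 0.79, 4.12) for u in grid]
total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(round(total, 4))
print(logf_density(0.0, 2.0, 1e6), gumbel_min_density(0.0))
```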
An additional parameter to be considered for the whole data set is the censoring proportion $p$, the probability of obtaining a censored observation. Therefore, the likelihood function of $(p, \theta, \sigma)$ based on the chromium log data set is
$$L(p, \theta, \sigma; y) = \underbrace{\binom{593}{79} p^{79}(1 - p)^{514}}_{\text{CENSORED}}\;\underbrace{\prod_{i=1}^{514} f(y_i; \theta, \sigma \mid y_i \ge T^*)}_{\text{UNCENSORED}}, \qquad (4)$$
where $T$ is the common lower detection limit of the data in the original scale and $T^* = \ln(T)$.
A member of the log F(a, b) family of models can be selected for $f(y_i; \theta, \sigma)$ in (4). The log F family includes the extreme value or Gumbel distribution, obtained when $a = 2$, $b = \infty$. The Gumbel or extreme value model adequately captures the relevant features of the uncensored chromium log data. The corresponding density is
$$f(u) = \exp(u - e^{u}), \qquad (5)$$
where
$$u = \frac{y - \theta}{\sigma}.$$
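The likelihood for the uncensored log data is obtained by substituting the Gumbel density into the second factor of (4):
$$L(\theta, \sigma; y) \propto \prod_{i=1}^{514}\frac{1}{\sigma}f(y_i; \theta, \sigma) \propto \frac{1}{\sigma^{514}}\exp\left[\sum_{i=1}^{514}\frac{y_i - \theta}{\sigma} - \sum_{i=1}^{514}\exp\left(\frac{y_i - \theta}{\sigma}\right)\right]. \qquad (6)$$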
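The log of (6) is easy to code. Setting its derivative with respect to θ to zero gives, for fixed σ, the closed form $\hat\theta(\sigma) = \sigma\ln[\frac{1}{n}\sum_i e^{y_i/\sigma}]$. This is a standard consequence of (6), derived here for illustration rather than stated in the paper. The sketch checks it on synthetic Gumbel data simulated with hypothetical values θ = 5.9 and σ = 0.94, not on the chromium measurements.

```python
import math
import random

def gumbel_loglik(theta, sigma, ys):
    """Log of (6): -n ln(sigma) + sum(z_i) - sum(exp(z_i)), z_i = (y_i - theta)/sigma."""
    z = [(y - theta) / sigma for y in ys]
    return -len(ys) * math.log(sigma) + sum(z) - sum(math.exp(v) for v in z)

def theta_hat_given_sigma(sigma, ys):
    """Restricted MLE of theta for fixed sigma, from the score equation of (6).

    Uses a log-sum-exp shift by max(ys) for numerical stability.
    """
    m = max(ys)
    s = sum(math.exp((y - m) / sigma) for y in ys) / len(ys)
    return m + sigma * math.log(s)

# Synthetic data: if W ~ Exp(1), theta + sigma*ln(W) follows the extreme value model (5)
random.seed(1)
ys = [5.9 + 0.94 * math.log(random.expovariate(1.0)) for _ in range(514)]
t_hat = theta_hat_given_sigma(0.94, ys)
print(t_hat, gumbel_loglik(t_hat, 0.94, ys))
```

Because (6) is strictly concave in θ for fixed σ, this stationary point is the restricted maximum.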
Notice that when there is a common lower detection limit $T$ in the original scale, the model for the uncensored log data, $f(y_i; \theta, \sigma)$, is truncated below at $T^* = \ln(T)$. However, for the chromium log data set, the censored observations are so distant from the uncensored data that the untruncated density $f(y_i; \theta, \sigma)$ is approximately the same as the truncated density. For the Gumbel model, for example,
$$\frac{1}{\hat\sigma}\int_{T^*}^{\infty} f(y; \hat\theta, \hat\sigma)\,dy = 0.9999,$$
where $T^* = \ln(0.005\ \text{mg/l})$ and $(\hat\theta, \hat\sigma)$ are the MLE of $\theta$ and $\sigma$, respectively. Therefore, the non-truncated Gumbel distribution will be used here for $f(y_i; \theta, \sigma)$. To obtain inferences about $\theta$ or $\sigma$ in terms of likelihood-confidence intervals, in the absence of knowledge of the other parameter, the profile likelihood of each parameter will be used. The Gauss VM 3.2.6 programming language (Aptech Systems Inc.) was used for all calculations in this work, on a Pentium personal computer. Likelihood maximizations were performed by minimizing the negative of the log likelihood function with the downhill simplex method of Nelder and Mead (1965).
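Under the fitted Gumbel model, the probability above the log detection limit has the closed form $\exp[-\exp\{(T^* - \hat\theta)/\hat\sigma\}]$. The sketch below plugs in the estimates reported later in Section 5 ($\hat\theta = 5.9293$, $\hat\sigma = 0.939$). The result is consistent with the 0.9999 quoted above.

```python
import math

theta_hat, sigma_hat = 5.9293, 0.939   # Gumbel MLEs reported in Section 5
T_star = math.log(0.005)               # log of the detection limit T = 0.005 mg/l

def gumbel_survival(y, theta, sigma):
    """P(Y > y) = exp(-exp((y - theta)/sigma)) for the extreme value model (5)."""
    return math.exp(-math.exp((y - theta) / sigma))

mass_above = gumbel_survival(T_star, theta_hat, sigma_hat)
print(T_star, mass_above)
```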
5. LIKELIHOOD ANALYSIS OF THE CHROMIUM DATA
5.1. Inferences about p
The parameter $p$ may be estimated by maximizing the binomial likelihood of $p$, the first factor in (4):
$$L(p; y) \propto p^{79}(1 - p)^{514}.$$
The maximum likelihood estimate is $\hat p = 79/593 = 0.133$. Figure 3 shows the relative likelihood function of $p$.
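The likelihood interval for $p$ at level $c$ consists of the two roots of $R(p) = c$ on either side of $\hat p$. The minimal sketch below finds them by bisection (our choice of root finder). It uses $c = 0.147$, the likelihood level that corresponds to 95% confidence under a normal likelihood. The resulting interval is slightly longer to the right of $\hat p$, reflecting the mild skewness of this binomial likelihood.

```python
import math

K, N = 79, 593                 # censored count and sample size from Section 5.1
p_hat = K / N

def log_rel_lik(p):
    """log R(p) for the binomial likelihood p^79 (1 - p)^514."""
    return K * math.log(p / p_hat) + (N - K) * math.log((1 - p) / (1 - p_hat))

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def likelihood_interval(c):
    """The two roots of R(p) = c, one on each side of p_hat."""
    g = lambda p: log_rel_lik(p) - math.log(c)
    return bisect_root(g, 1e-9, p_hat), bisect_root(g, p_hat, 1 - 1e-9)

lo, hi = likelihood_interval(0.147)
print(round(p_hat, 3), round(lo, 4), round(hi, 4))
```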
The inferences about the remaining parameters in the model, $\theta$ and $\sigma$, will be made conditional on the uncensored log data. The large separation in the log scale in Figure 2 suggests that there is a fundamental distinction between the censored and uncensored data. For example, the censored data may represent essentially uncontaminated samples and the uncensored data heavily contaminated samples. Therefore, in order to quantify levels of contamination, it might be appropriate to condition on the uncensored data. Thus, the following sections up to Section 5.7 will consider only uncensored data. Section 5.8 presents a brief discussion of how to combine censored and uncensored data to produce inferences based on the whole data set.
Figure 3. Relative likelihood of p
5.2. Inferences about the location parameter $\theta$ conditional on the uncensored data
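In order to make inferences about the location parameter $\theta$, its profile likelihood will be obtained from (6). The profile log likelihood of $\theta$ (eliminating $\sigma$) is
$$l_p(\theta; y) = \ln[L_p(\theta; y)] = \max_{\sigma \mid \theta} l(\theta, \sigma; y) = \sum_{i=1}^{514}\frac{y_i - \theta}{\hat\sigma(\theta)} - \sum_{i=1}^{514}\exp\left(\frac{y_i - \theta}{\hat\sigma(\theta)}\right) - 514\ln\hat\sigma(\theta). \qquad (7)$$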
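Equation (7) has no closed form, but it is straightforward to evaluate numerically: for each fixed θ, maximize (6) over σ. The sketch below does the inner maximization with golden-section search. That is our choice; the paper used the Nelder–Mead simplex in Gauss. It runs on synthetic Gumbel data with hypothetical values θ = 5.9 and σ = 0.94, then scans a grid of θ values to obtain the relative profile likelihood. The search bracket [0.1, 5] for σ, and unimodality of (6) in σ within it, are assumptions.

```python
import math
import random

def gumbel_loglik(theta, sigma, ys):
    """Log of (6)."""
    z = [(y - theta) / sigma for y in ys]
    return -len(ys) * math.log(sigma) + sum(z) - sum(math.exp(v) for v in z)

def golden_max(f, lo, hi, tol=1e-7):
    """Maximizer of a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                      # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = f(c)
        else:                            # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

def profile_loglik(theta, ys, s_lo=0.1, s_hi=5.0):
    """l_p(theta) of (7): maximize the log likelihood over sigma for fixed theta."""
    s_hat = golden_max(lambda s: gumbel_loglik(theta, s, ys), s_lo, s_hi)
    return gumbel_loglik(theta, s_hat, ys)

random.seed(2)
ys = [5.9 + 0.94 * math.log(random.expovariate(1.0)) for _ in range(514)]
grid = [5.6 + 0.01 * i for i in range(61)]     # theta from 5.6 to 6.2
lp = [profile_loglik(t, ys) for t in grid]
lp_max = max(lp)
R_p = [math.exp(v - lp_max) for v in lp]       # relative profile likelihood on the grid
theta_best = grid[lp.index(lp_max)]
print(round(theta_best, 2))
```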
This function is obtained by maximizing (6) over $\sigma$ for each fixed value of $\theta$. There is no closed-form expression for this function, but it can be obtained numerically for all values of $\theta$. Figure 4 shows the relative profile likelihood of $\theta$. It is clearly symmetric and can be fairly well approximated by a normal likelihood. The approximate linear pivotal quantity for making inferences about $\theta$ is $u_\theta = (\hat\theta - \theta)\sqrt{I_\theta}$, and the inferences about $\theta$ can be expressed as follows:
$$\theta = \hat\theta \pm \frac{u}{\sqrt{I_\theta}} = 5.9293 \pm 0.04352\,u, \qquad u \sim N(0, 1). \qquad (8)$$
Figure 4. Relative profile likelihood of $\theta$
A nested set of three likelihood-confidence intervals, obtained for three specific values of $u$, is shown in Table I. These intervals are marked in Figure 4. Notice how their endpoints outline the full profile likelihood function of $\theta$.
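Equation (8) turns each value of $u$ into an interval. The short sketch below recomputes the intervals of Table I from the reported $\hat\theta = 5.9293$ and $1/\sqrt{I_\theta} = 0.04352$. It also prints the relative likelihood $\exp(-u^2/2)$ at the endpoints.

```python
import math

theta_hat = 5.9293                 # MLE of theta, from (8)
se = 0.04352                       # 1/sqrt(I_theta), from (8)

def likelihood_confidence_interval(est, se, u):
    """Endpoints est -/+ u*se implied by the normal approximation, as in (8)."""
    return est - u * se, est + u * se

table_I = {1.64: (5.858, 6.001), 1.96: (5.844, 6.014), 2.57: (5.817, 6.041)}
for u, (lo_t, hi_t) in table_I.items():
    lo, hi = likelihood_confidence_interval(theta_hat, se, u)
    c = math.exp(-u * u / 2)       # relative likelihood at both endpoints
    print(u, (round(lo, 3), round(hi, 3)), "table:", (lo_t, hi_t), "R =", round(c, 3))
```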
5.3. Inferences about the quantiles conditional on the uncensored data
The quantiles may also be considered as location parameters. In location-scale models with density $f(y; \theta, \sigma)$, there is a direct relationship between the location and scale parameters and the quantile $Q_{Y,\alpha}$ with probability $\alpha$. The $\alpha$ quantile $Q_{Y,\alpha}$ is defined by
$$P[Y \le Q_{Y,\alpha}] = \alpha = F_Y\left(\frac{Q_{Y,\alpha} - \theta}{\sigma}\right),$$
where $F_Y(\cdot)$ is the distribution function of $U = (Y - \theta)/\sigma$. If $F_Y^{-1}(\cdot)$ is its inverse, the above can be solved for $Q_{Y,\alpha}$. Let $k_{Y,\alpha} = F_Y^{-1}(\alpha)$; then
$$Q_{Y,\alpha} = \theta + k_{Y,\alpha}\,\sigma. \qquad (9)$$
The parameter $\theta$ is itself a quantile, namely the special case $k_{Y,\alpha} = 0$.
The likelihood may then be reparametrized in terms of $Q_{Y,\alpha}$ and $\sigma$ instead of $(\theta, \sigma)$. A one-to-one transformation from $(\theta, \sigma)$ to $(Q_{Y,\alpha}, \sigma')$ is made, where $\sigma' = \sigma$. The inferential procedure remains the same, but is now in terms of the quantile $Q_{Y,\alpha}$. The quantity $Q_{Y,\alpha} - k_{Y,\alpha}\sigma$ is substituted for $\theta$ in (6). In the case of the Gumbel distribution, a closed expression for $k_{Y,\alpha}$ in terms of the probability $\alpha$ may be obtained:
$$k_{Y,\alpha} = \ln[-\ln(1 - \alpha)].$$
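Thus, the profile log likelihood of the quantile $Q_{Y,\alpha}$ for the Gumbel model is
$$l_p(Q_{Y,\alpha}) = \sum_{i=1}^{514}\left[\frac{y_i - Q_{Y,\alpha}}{\hat\sigma(Q_{Y,\alpha})} + k_{Y,\alpha}\right] - \sum_{i=1}^{514}\exp\left[\frac{y_i - Q_{Y,\alpha}}{\hat\sigma(Q_{Y,\alpha})} + k_{Y,\alpha}\right] - 514\ln[\hat\sigma(Q_{Y,\alpha})]. \qquad (10)$$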
The observed information matrix $I_{Q_{Y,\alpha},\sigma}$ can easily be obtained from $I_{\theta,\sigma}$ (see Kalbfleisch 1985, Section 10.4).
It is important that, by functional invariance of likelihoods, inferences about the quantiles $Q_{X,\alpha}$ can be obtained from those about $Q_{Y,\alpha}$ by direct algebraic substitution. For example, in the present case interest resides in the original scale, $X = \exp(Y)$ mg/l. The quantiles for $X$ are therefore $Q_{X,\alpha} = \exp(Q_{Y,\alpha})$ mg/l.
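For the Gumbel model, (9) and the closed form for $k_{Y,\alpha}$ give point estimates of any quantile, and invariance carries them to mg/l. The sketch below plugs in the MLEs $\hat\theta = 5.9293$ and $\hat\sigma = 0.939$ reported elsewhere in the paper. It should agree, up to rounding of those inputs, with the estimates quoted in Section 5.4: 5.585 and 266.5 mg/l for the median, and 6.959 for the 0.95 quantile.

```python
import math

theta_hat, sigma_hat = 5.9293, 0.939     # Gumbel MLEs from Section 5

def gumbel_k(alpha):
    """k_{Y,alpha} = F_Y^{-1}(alpha) = ln[-ln(1 - alpha)] for the extreme value model (5)."""
    return math.log(-math.log(1.0 - alpha))

def quantile_log_scale(alpha, theta, sigma):
    """Q_{Y,alpha} = theta + k_{Y,alpha} * sigma, equation (9)."""
    return theta + gumbel_k(alpha) * sigma

for alpha in (0.01, 0.5, 0.95):
    q_y = quantile_log_scale(alpha, theta_hat, sigma_hat)
    q_x = math.exp(q_y)                  # invariance: Q_{X,alpha} = exp(Q_{Y,alpha}) in mg/l
    print(alpha, round(gumbel_k(alpha), 3), round(q_y, 3), round(q_x, 1))
```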
Table I. Likelihood-confidence intervals for $\theta$
u      Confidence level (%)   Likelihood level (%)   Interval for $\theta$
1.64   90                     25.8                   (5.858, 6.001)
1.96   95                     14.7                   (5.844, 6.014)
2.57   99                     3.7                    (5.817, 6.041)
5.4. Inferences about $Q_{Y,0.5}$, the median, and $Q_{Y,0.95}$, conditional on the uncensored data
For the Gumbel model, $k_{Y,0.5} = F_Y^{-1}(0.5) = \ln[\ln(2)] = -0.366$. The profile log likelihood of the median may be obtained by substituting this value of $k_{Y,0.5}$ in (10). Figure 5 shows the profile likelihood of the median $Q_{Y,0.5}$, as well as its corresponding normal approximation. The MLE of the median can be obtained by substituting $\hat\theta$, $\hat\sigma$ and $k_{Y,0.5}$ in (9), and is $\hat Q_{Y,0.5} = 5.585$. The corresponding observed information, calculated as described above, is $I_{Q_{Y,0.5}} = 425.2$.
Once again, the normal approximation describes the profile likelihood well, and the inferential statements about $Q_{Y,0.5}$ may be given in the following simple way:
$$Q_{Y,0.5} = \hat Q_{Y,0.5} \pm \frac{u}{\sqrt{I_{Q_{Y,0.5}}}}, \qquad u \sim N(0, 1).$$
Thus, the nested set of approximate likelihood-confidence intervals for the median is
$$Q_{Y,0.5} = 5.585 \pm 0.048\,u \iff Q_{X,0.5} = 266.5\,\exp(\pm 0.048\,u)\ \text{mg/l}, \qquad u \sim N(0, 1). \qquad (11)$$
Table II exhibits specific nested sets of likelihood-confidence intervals for $Q_{Y,0.5}$ and $Q_{X,0.5}$.
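The sketch below recomputes Table II from the reported $\hat Q_{Y,0.5} = 5.585$ and $I_{Q_{Y,0.5}} = 425.2$. It first forms the symmetric intervals on the log scale and then exponentiates their endpoints, which is the invariance step. The resulting mg/l intervals are no longer symmetric about $\exp(\hat Q_{Y,0.5})$.

```python
import math

q_hat, info = 5.585, 425.2          # MLE and observed information of Q_{Y,0.5}
se = 1 / math.sqrt(info)            # about 0.0485, the 0.048 in (11)

rows = []
for u in (1.64, 1.96, 2.57):
    lo_y, hi_y = q_hat - u * se, q_hat + u * se
    # Functional invariance: map the endpoints to mg/l through X = exp(Y)
    rows.append((u, lo_y, hi_y, math.exp(lo_y), math.exp(hi_y)))

for u, lo_y, hi_y, lo_x, hi_x in rows:
    print(f"u={u}: Q_Y in ({lo_y:.3f}, {hi_y:.3f}), Q_X in ({lo_x:.1f}, {hi_x:.1f}) mg/l")
```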
The same procedure is applicable to other quantiles. For instance, the corresponding results for the 0.95 quantile are
$$Q_{Y,0.95} = 6.959 \pm 0.047\,u \iff Q_{X,0.95} = 1052.9\,\exp(\pm 0.047\,u)\ \text{mg/l}, \qquad u \sim N(0, 1). \qquad (12)$$
Figure 5. Relative profile likelihood of $Q_{Y,0.5}$
Notice that the normal approximations are applied to the symmetric likelihoods arising from the log data $Y$. The resulting inferences can be transformed back to the observed scale of mg/l. The intervals obtained in this scale are no longer symmetric about the MLE ($\hat Q_{X,0.5} = 266.5$ and $\hat Q_{X,0.95} = 1052.9$). Thus this procedure accounts for the asymmetry of the likelihoods arising from the data $X$. Table III exhibits specific nested sets of likelihood-confidence intervals for $Q_{Y,0.95}$ and $Q_{X,0.95}$.
Figures 6 and 7 show the relative profile likelihoods of $Q_{Y,0.95}$ and $Q_{X,0.95}$. Figure 6 shows how the endpoints of the likelihood-confidence intervals of Table III outline the corresponding likelihood function of $Q_{Y,0.95}$. Note that the likelihood of $Q_{X,0.95}$ shown in Figure 7 is asymmetric. Because of this asymmetry, the normal approximation ($Q_{X,0.95} = 1052.9 \pm 49.39\,u$ mg/l, where $I_{Q_{X,0.95}} = 0.0004$) does not adequately reproduce the observed likelihood function of $Q_{X,0.95}$. The asymmetry should be taken into account in estimating $Q_{X,0.95}$, or the resulting inferences will be very misleading. In particular, they will emphasize small values of $Q_{X,0.95}$, which are very implausible, and will ignore large values of $Q_{X,0.95}$, which are highly plausible, thus underestimating the magnitude of $Q_{X,0.95}$. However, the endpoints of the likelihood-confidence intervals obtained for $Q_{Y,0.95}$ can be transformed to the $X$ scale. They then provide likelihood-confidence intervals for $Q_{X,0.95}$ that account for this asymmetry (note that these intervals are not symmetric about $\hat Q_{X,0.95}$).
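The contrast can be made concrete at the 99% level (u = 2.57). The sketch below compares the back-transformed interval from (12) with the naive normal interval on the mg/l scale. Both use only numbers quoted above. The naive interval is shifted towards small values, as described in the text.

```python
import math

q_y_hat, se_y = 6.959, 0.047      # (12): Q_{Y,0.95} = 6.959 +/- 0.047u
q_x_hat = math.exp(q_y_hat)       # about 1053 mg/l
se_x_naive = 49.39                # naive normal approximation on the mg/l scale

u = 2.57                          # 99% level
back = (math.exp(q_y_hat - u * se_y), math.exp(q_y_hat + u * se_y))   # invariance
naive = (q_x_hat - u * se_x_naive, q_x_hat + u * se_x_naive)

print([round(v, 1) for v in back])    # asymmetric about q_x_hat
print([round(v, 1) for v in naive])   # symmetric, shifted towards small values
```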
5.5. Robustness in estimation
An important question is whether the parameter estimation presented here is sensitive to slight
changes in the model. This section addresses this issue.
A member of the log F(a, b) family was selected to model the uncensored data. The selected Gumbel model is the member with $a = 2$, $b = \infty$, and was found to describe the data adequately. However, in addition to the Gumbel model, there is a range of log F models that fit the data reasonably well. Examples are the log F family member with $a = 0.79$, $b = 4.12$, the log F(1, 4) model and the log F(1, 3) model. For simplicity, the Gumbel model was used here.
Since these log F models are not scaled to have mean $\theta$ and standard deviation $\sigma$, it is not surprising that the inferences about the parameters should depend considerably upon the values of $a$ and $b$. This is certainly true for $\sigma$, where the inferences depend crucially on the values of $a$ and $b$. Figure 8 exhibits the sharp contrast between the relative profile likelihood of $\sigma$ based on the Gumbel model and the corresponding likelihood based on the log F(0.79, 4.12) model. The MLE of $\sigma$ is $\hat\sigma_{LF} = 0.447$ for the log F(0.79, 4.12) model and $\hat\sigma_G = 0.939$ for the Gumbel model.

Table II. Likelihood-confidence intervals for $Q_{Y,0.5}$ and $Q_{X,0.5}$
u      Confidence level (%)   Likelihood level (%)   Interval for $Q_{Y,0.5}$   Interval for $Q_{X,0.5}$ (mg/l)
1.64   90                     25.8                   (5.506, 5.665)             (246.100, 288.525)
1.96   95                     14.7                   (5.490, 5.680)             (242.310, 293.037)
2.57   99                     3.7                    (5.461, 5.710)             (235.248, 301.835)

Table III. Likelihood-confidence intervals for $Q_{Y,0.95}$ and $Q_{X,0.95}$
u      Confidence level (%)   Likelihood level (%)   Interval for $Q_{Y,0.95}$   Interval for $Q_{X,0.95}$ (mg/l)
1.64   90                     25.8                   (6.882, 7.036)              (974.905, 1137.127)
1.96   95                     14.7                   (6.867, 7.051)              (960.374, 1154.332)
2.57   99                     3.7                    (6.839, 7.080)              (933.273, 1187.853)
However, the same is not true for the estimation of the location parameters $\theta + k\sigma$, the quantiles. This might be thought surprising, particularly since the quantiles are linear functions of $\sigma$. For the above models, the inferences about the location parameter $\theta$ and the quantiles depend very little on the values of $a$ and $b$, as illustrated in Figure 5. For instance, the MLE of $\theta$ is $\hat\theta_{LF} = 5.999$ for the log F(0.79, 4.12) model and $\hat\theta_G = 5.929$ for the Gumbel model. Figure 5 compares the relative profile likelihood of the median under these two models. Note how much the two curves overlap, in contrast with the separation exhibited in Figure 8 for the relative profile likelihoods of $\sigma$.
Thus, the estimation of the location parameter and the quantiles is robust under these models, in the sense that it does not matter which model is used. This implies that the estimation of the product $k_{Y,\alpha}\sigma$ is robust, while the estimation of $\sigma$ and $k_{Y,\alpha}$ individually is not. Table IV shows this for several probability values $\alpha$ under the Gumbel and log F(0.79, 4.12) models. Bear in mind that when $a$ and $b$ change, $\sigma$ and $k_{Y,\alpha}$ both change while their product remains relatively constant, so that the model compensates through $k_{Y,\alpha}$ for the difference in $\sigma$.
Under these two models, inference was performed for quantiles such as $Q_{Y,0.01}$, $Q_{Y,0.05}$, $Q_{Y,0.10}$, $Q_{Y,0.25}$, $Q_{Y,0.75}$ and $Q_{Y,0.99}$. In all these cases, the normal likelihoods induced by $u_\psi = (\hat\psi - \psi)\sqrt{I_\psi}$ (where $\psi$ stands for each of these quantiles) reproduced the corresponding observed profile likelihoods well for both models. Figure 6 shows this for $Q_{Y,0.95}$.
The estimates are very similar for the log F(0.79, 4.12) and the Gumbel models in almost all cases, as shown in Table V. The exceptions are quantiles with probability smaller than 0.055.
Table V also compares the empirical quantiles of the uncensored log data with the maximum likelihood estimates of these quantiles, $Q_{Y,\alpha}$, under the two models considered. The estimated values are clearly very close to the empirical ones, except for small quantiles such as $Q_{Y,0.01}$ and $Q_{Y,0.05}$.
Table IV. Adaptive robustness of $k_{Y,\alpha}\sigma$ under the log F and Gumbel models
$\alpha$   Log F(0.79, 4.12): $k_{Y,\alpha}$   $k_{Y,\alpha}\hat\sigma_{LF}$ ($\hat\sigma_{LF} = 0.447$)   Gumbel: $k_{Y,\alpha}$   $k_{Y,\alpha}\hat\sigma_G$ ($\hat\sigma_G = 0.939$)
0.01       −10.769                             −4.809                                                      −4.600                   −4.318
0.05       −6.737                              −3.009                                                      −2.970                   −2.788
0.10       −4.998                              −2.232                                                      −2.250                   −2.112
0.50       −0.830                              −0.371                                                      −0.366                   −0.344
0.95       2.055                               0.918                                                       1.097                    1.030
0.99       3.075                               1.373                                                       1.527                    1.434
This is further illustrated in Figure 9, a QQ plot in which the ordered observations on the vertical axis are compared with the estimated quantiles under the Gumbel model on the horizontal axis. Notice how $Q_{Y,0.01}$ is the point that deviates most from its empirical counterpart. As mentioned, the profile likelihood of this quantile is symmetric under both the log F and the Gumbel models. Using normal approximations, the likelihood-confidence interval at the 99% confidence level for this quantile is [1.18, 2.04] for the Gumbel model and [0.68, 1.70] for the selected log F model. The Gumbel interval does not include the empirical $Q_{Y,0.01}$, which is 0.73, whereas the log F interval does. However, the two intervals overlap and share the region [1.18, 1.70], which could contain the true value of $Q_{Y,0.01}$. That is, neither model is contradicted by the observed data, even for small quantiles, and both seem to model all the remaining, larger quantiles adequately.
Figure 8. Relative profile likelihood of $\sigma$

Table V. Empirical quantiles of the uncensored log data versus estimated quantiles under the log F and Gumbel models
$\alpha$   Empirical Q   Log F $\hat Q_{Y,\alpha}$   Gumbel $\hat Q_{Y,\alpha}$
0.01       0.73          1.18                        1.61
0.05       2.75          2.99                        3.14
0.10       3.88          3.77                        3.82
0.25       4.83          4.80                        4.76
0.50       5.60          5.63                        5.58
0.75       6.25          6.23                        6.23
0.95       6.88          6.92                        6.96
0.99       7.47          7.37                        7.36
5.6. Inferences about the probability of exceeding specified values conditional on the uncensored data
The probability $\tau(Z)$ of exceeding a quality standard or other fixed quantity $Z$ (in logarithms of mg/l) can be another parameter of interest. For the chromium log data, it is given by
$$\tau(Z) = \Pr[Y \ge Z \mid Y \ge T^*] = 1 - F_Y\left(\frac{Z - \theta}{\sigma}\right), \qquad (13)$$
where $T^* = \ln(T) = -5.29$ (with $T$ in mg/l) and $F_Y(\cdot)$ is the distribution function of $U = (Y - \theta)/\sigma$. For the Gumbel model,
$$\tau(Z) = \exp\left[-\exp\left(\frac{Z - \theta}{\sigma}\right)\right]. \qquad (14)$$
Figure 9. QQ Plot, log chromium data, Gumbel model
A reparametrization in terms of $(\tau, \sigma)$ is achieved by making a one-to-one transformation from $(\theta, \sigma)$ to $(\tau, \sigma')$, using
$$\theta = Z - \sigma\ln[-\ln(\tau)], \qquad (15)$$
which follows from definition (13), and $\sigma' = \sigma$. The profile likelihood of $\tau(Z)$ may be obtained by reparametrizing (7) in terms of $\tau(Z)$, using (15).
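Equations (14) and (15) are inverse maps between θ and τ for fixed σ and Z. The sketch below evaluates (14) at the standard Z = ln 6 with the reported MLEs $\hat\theta = 5.9293$ and $\hat\sigma = 0.939$, which gives the $\hat\tau = 0.988$ quoted below. It then checks that (15) recovers $\hat\theta$.

```python
import math

theta_hat, sigma_hat = 5.9293, 0.939      # Gumbel MLEs
Z = math.log(6.0)                         # log of the 6 mg/l standard

def tau(z, theta, sigma):
    """Exceedance probability (14): exp[-exp((z - theta)/sigma)]."""
    return math.exp(-math.exp((z - theta) / sigma))

def theta_from_tau(t, z, sigma):
    """Inverse map (15): theta = z - sigma * ln[-ln(t)]."""
    return z - sigma * math.log(-math.log(t))

t_hat = tau(Z, theta_hat, sigma_hat)
print(round(t_hat, 3))                    # 0.988
print(theta_from_tau(t_hat, Z, sigma_hat))  # recovers theta_hat up to rounding
```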
For chromium, the quality standard given by the technical ecological norm NTE-CCA-021/88 of the Mexican legislation is $Z = \ln(6\ \text{mg/l})$. The maximum likelihood estimate of $\tau(Z)$ is obtained by substituting $Z = \ln(6)$ and the MLE of $\theta$ and $\sigma$ in (14), and is $\hat\tau = 0.988$. The observed information of $\tau$ is $I_\tau = 233{,}600.71$. Figure 10 exhibits the profile likelihood of $\tau(\ln 6)$ for the chromium log data. This likelihood function is asymmetric, so the corresponding normal approximation is not appropriate for constructing likelihood-confidence intervals for $\tau(\ln 6)$.
A suitable reparametrization can sometimes help to symmetrize the likelihood function. A symmetrized likelihood is more likely to admit a normal approximation, and inferences about the parameter of interest can then be expressed in a very simple form, such as (8), (11) or (12). This suggests the use of the log odds (logit) transformation in the present case,
$$\beta(Z) = \ln\left[\frac{\tau(Z)}{1 - \tau(Z)}\right], \quad\text{or}\quad \tau(Z) = \frac{e^{\beta(Z)}}{1 + e^{\beta(Z)}}, \qquad (16)$$
in order to symmetrize the corresponding profile likelihood function. The advantage of this reparametrization is that the range of $\beta(Z)$ is $(-\infty, \infty)$. This makes a normal approximation more plausible than for a parameter like $\tau$, which is restricted to $[0, 1]$. When it is not possible to find a reparametrization that symmetrizes the likelihood function, the procedures in Viveros and Sprott (1987) might be helpful.
Figure 10. Relative profile likelihood of $\tau$
The maximum likelihood estimate of $\beta(\ln 6)$ is obtained by substituting the maximum likelihood estimate of $\tau(\ln 6)$ in (16), and is $\hat\beta = 4.401$. The observed information of $\beta(\ln 6)$ is
$$I_\beta = I_\tau\left(\frac{d\tau}{d\beta}\right)^2 = 33.458,$$
where $d\tau/d\beta = \tau(1 - \tau)$ is evaluated at $\hat\beta$. The profile likelihood of $\beta(\ln 6)$ is indeed symmetric for the chromium log data, and is described well by the corresponding normal approximation, as shown in Figure 11. That is, the observed likelihood is well approximated by the normal likelihood induced by the quantity
$$u_\beta = (\hat\beta - \beta)\sqrt{I_\beta} = (4.401 - \beta)\sqrt{33.458}.$$
Figure 11. Relative profile likelihood of $\beta$
Using the invariance property of the likelihood function, the nested set of likelihood-confidence intervals for $\beta(Z)$ may be transformed back to the scale of $\tau(Z)$ by transforming their endpoints. The inferential statements about these parameters for $Z = \ln 6$ may thus be given as follows:
$$\text{for } \beta(\ln 6):\ \beta = 4.401 \pm 0.173\,u; \qquad \text{for } \tau(\ln 6):\ \frac{e^{\beta_1(u)}}{1 + e^{\beta_1(u)}} \le \tau \le \frac{e^{\beta_2(u)}}{1 + e^{\beta_2(u)}}, \qquad (17)$$
where $\beta_1(u)$ and $\beta_2(u)$ are the left and right endpoints of the likelihood-confidence interval of $\beta(Z)$ associated with a specific value of $u$, and therefore with a corresponding confidence and likelihood level. Nested sets of these intervals reproduce well both the symmetric likelihood function of $\beta(\ln 6)$ and the asymmetric likelihood function of $\tau(\ln 6)$. Some explicit intervals for both parameters are shown in Table VI.
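The sketch below checks the delta-method relation $I_\beta = I_\tau(d\tau/d\beta)^2$ with the reported $I_\tau$. It then rebuilds Table VI from $\hat\beta = 4.401$ and $I_\beta = 33.458$: each symmetric β interval is mapped endpoint by endpoint through the logistic function in (16).

```python
import math

I_tau = 233600.71                   # observed information of tau, from the text
beta_hat, I_beta = 4.401, 33.458    # MLE and observed information of beta(ln 6)

def logistic(b):
    """Inverse of the log odds map (16): tau = e^b / (1 + e^b)."""
    return 1.0 / (1.0 + math.exp(-b))

# Delta-method check of I_beta = I_tau (d tau/d beta)^2, with d tau/d beta = tau(1 - tau)
t_hat = logistic(beta_hat)
I_beta_check = I_tau * (t_hat * (1.0 - t_hat)) ** 2

se = 1.0 / math.sqrt(I_beta)        # about 0.173, the coefficient of u in (17)
rows = []
for u in (0.68, 1.64, 1.96, 2.57):
    b1, b2 = beta_hat - u * se, beta_hat + u * se
    rows.append((u, b1, b2, logistic(b1), logistic(b2)))   # endpoints mapped by invariance

print(round(I_beta_check, 2))
for u, b1, b2, t1, t2 in rows:
    print(f"u={u}: beta in ({b1:.3f}, {b2:.3f}), tau in ({t1:.3f}, {t2:.3f})")
```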
The extremely high probabilities t(ln 6) indicated in Table VI clearly show that the standard
set by the Mexican government is being exceeded without doubt by most tanneries in Leo n, a fact
that is immediately obvious from Figure 1.
Note that inferences for $\tau$, given $\eta$ specified, equally provide inferences for $\eta$, given $\tau$ specified, merely by interchanging $\tau$ and $\eta$. Since $\eta$ is the quantile $Q_{Y,1-\tau}$, inferences about $\eta$ can be obtained as described in Section 5.3.
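The interchange can be illustrated with a minimal Python sketch. For illustration only, it assumes the minimum-type extreme value form $F(z) = 1 - \exp(-e^{z})$ for the standardized log concentration, together with hypothetical values of $\theta$ and $\sigma$; the identity $\eta = Q_{Y,1-\tau(\eta)}$ itself does not depend on this choice.

```python
import math

# Assumed for illustration only: F(z) = 1 - exp(-e^z) (minimum-type extreme value)
# and hypothetical parameter values, not estimates from the chromium data.
THETA, SIGMA = 5.0, 0.8


def exceedance(eta: float) -> float:
    """tau(eta) = Pr[Y > eta] = 1 - F((eta - theta)/sigma)."""
    z = (eta - THETA) / SIGMA
    return math.exp(-math.exp(z))


def quantile(alpha: float) -> float:
    """Q_{Y,alpha}: the value with Pr[Y <= Q] = alpha, i.e. the inverse of F."""
    return THETA + SIGMA * math.log(-math.log(1.0 - alpha))


eta = math.log(6.0)             # log of the 6 mg/l standard
tau = exceedance(eta)
eta_back = quantile(1.0 - tau)  # eta recovered as Q_{Y, 1 - tau}
print(f"tau = {tau:.4f}, eta = {eta:.4f}, Q_(Y,1-tau) = {eta_back:.4f}")
```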
5.7 Inferences about the expected value of the concentration variable $\mu_X = E(X)$ conditional on the uncensored data
A possible parameter of interest in the analysis of pollutant concentrations is the expected pollutant level $\mu_X$ on the original scale of mg/l. For example, the US Environmental Protection Agency requires that risks be characterized in terms of the mean contaminant concentration (see El-Shaarawi and Viveros 1997). The likelihood-based methods proposed here allow inferences about this parameter to be made in a straightforward way when the parameter is finite.
If the log data $Y$ can be modeled with a location-scale model with location and scale parameters $(\theta, \sigma)$, respectively, then the expected value of the data $X = \exp(Y)$ mg/l is

$$\mu_X = E(X) = E[\exp(Y)] = \int_{-\infty}^{\infty} e^{y}\,\frac{1}{\sigma}\,g\!\left(\frac{y-\theta}{\sigma}\right) dy = e^{\theta} M(\sigma), \qquad (18)$$

where $M(\sigma)$ is the moment generating function of the standard location-scale distribution of $Y$ (that is, of $(Y-\theta)/\sigma$), evaluated at $\sigma$.
Table VI. Likelihood-confidence intervals for $\beta(\ln 6)$ and $\tau(\ln 6)$, for $\eta = \ln(6\ \text{mg/l})$

Confidence level (%)   Likelihood level (%)   u      Interval for $\beta(\ln 6)$   Interval for $\tau(\ln 6)$
50                     79.3                   0.68   (4.283, 4.519)                (0.986, 0.989)
90                     25.8                   1.64   (4.117, 4.685)                (0.984, 0.991)
95                     14.7                   1.96   (4.062, 4.740)                (0.983, 0.991)
99                     3.7                    2.57   (3.957, 4.845)                (0.981, 0.992)
For the chromium data, since the log data $Y$ were modeled with a Gumbel distribution, the following holds:

$$\mu_X = \left[\exp(\theta)\,\Gamma(1+\sigma)\right] \text{ mg/l}, \qquad (19)$$

where

$$\Gamma(1+\sigma) = \int_{0}^{\infty} t^{\sigma} e^{-t}\, dt.$$
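A numerical check of (18) and (19) is sketched below in Python. It assumes that the Gumbel model is the minimum-type (log-Weibull) extreme value density $g(z) = \exp(z - e^{z})$, whose moment generating function is $\Gamma(1+\sigma)$; this form, the integration limits and the parameter values are illustrative assumptions, not taken from the chromium analysis.

```python
import math


def gumbel_min_density(z: float) -> float:
    """Assumed standard density g(z) = exp(z - e^z) (minimum-type extreme value)."""
    return math.exp(z - math.exp(z))


def mean_by_integration(theta: float, sigma: float,
                        lo: float = -40.0, hi: float = 6.0, n: int = 20_000) -> float:
    """Trapezoidal approximation of the integral in (18).

    Substituting z = (y - theta)/sigma turns the integrand into
    exp(theta + sigma*z) * g(z), because dy = sigma dz cancels the 1/sigma.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(theta + sigma * z) * gumbel_min_density(z)
    return total * h


def mean_closed_form(theta: float, sigma: float) -> float:
    """e^theta * M(sigma), with M(sigma) = Gamma(1 + sigma) for this density."""
    return math.exp(theta) * math.gamma(1.0 + sigma)


for theta, sigma in [(0.0, 0.5), (5.9, 0.5), (1.0, 1.3)]:  # hypothetical values
    print(theta, sigma, mean_by_integration(theta, sigma), mean_closed_form(theta, sigma))
```

The tails beyond the chosen limits are negligible for these values of $\sigma$, so the quadrature and the closed form agree to many digits.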
Using (19), one can reparametrize the problem from $(\theta, \sigma)$ to $(\mu_X, \sigma)$ and proceed as in the previous sections. That is, inferences about $\mu_X$ can be obtained in the same way as for $\theta$, using the profile likelihood of $\mu_X$. The MLE is $\hat\mu_X = 366.74$ mg/l and the observed information is $I_{\mu_X} = 0.004$. [To obtain $I_{\mu_X \sigma}$, the approximation to the digamma function given in Abramowitz and Stegun (1972, p. 259, formula 6.3.18) was used.]
Figure 12 shows the relative profile likelihood of $\mu_X$ for the chromium data and the corresponding normal approximation. Once again, the normal approximation describes the profile likelihood well, and the inferential statements about $\mu_X$ may be given in the following simple way:

$$\mu_X = \left[\hat\mu_X \pm \frac{u}{\sqrt{I_{\mu_X}}}\right] \text{ mg/l}, \qquad u \sim N(0, 1).$$

Figure 12. Relative profile likelihood of $E(X)$
Thus, approximate likelihood-confidence intervals for the expected value on the original scale are easily obtained:

$$\mu_X = [366.74 \pm 15.19u] \text{ mg/l}, \qquad u \sim N(0, 1).$$
A specific nested set of approximate likelihood-confidence intervals for $\mu_X$ is given in Table VII. The endpoints of these intervals outline the profile likelihood function shown in Figure 12. These results reinforce the preceding results in suggesting that the Mexican Government standard of 6 mg/l is being substantially exceeded.
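Because the likelihood of $\mu_X$ is approximately normal, each row of Table VII follows from $\hat\mu_X \pm 15.19u$, and the likelihood level of each interval is approximately the relative normal likelihood $\exp(-u^2/2)$ at its end points. The Python sketch below, using only the values quoted in the text, makes this explicit; the function names are illustrative.

```python
import math
from statistics import NormalDist

MU_HAT = 366.74  # MLE of mu_X in mg/l, reported in the text
SE = 15.19       # 1/sqrt(I_mu_X) as it appears in the interval statement


def u_for_confidence(level: float) -> float:
    """Two-sided N(0, 1) quantile u with Pr(|U| <= u) = level."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def mu_interval(u: float) -> tuple[float, float]:
    """Approximate likelihood-confidence interval mu_hat -/+ SE * u (mg/l)."""
    return MU_HAT - SE * u, MU_HAT + SE * u


def relative_likelihood(u: float) -> float:
    """Relative normal likelihood at the interval end points: exp(-u^2 / 2)."""
    return math.exp(-0.5 * u * u)


for u in (1.64, 1.96, 2.57):  # u values used in Table VII
    lo, hi = mu_interval(u)
    print(f"u = {u:.2f}: R = {relative_likelihood(u):.3f}, mu_X in ({lo:.2f}, {hi:.2f}) mg/l")
```

The tables use $u$ rounded to two decimals, so likelihood levels recomputed from the exact normal quantiles can differ slightly from the tabulated ones.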
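In general, if the log data $Y$ can be modeled with a log $F(a, b)$ location-scale model as in (2), the following holds:

$$\mu_X = E(X) = E[\exp(Y)] = \left(\frac{b}{a}\right)^{\sigma} e^{\theta}\, \frac{\Gamma\!\left(\frac{a}{2} + \sigma\right)\Gamma\!\left(\frac{b}{2} - \sigma\right)}{\Gamma\!\left(\frac{a}{2}\right)\Gamma\!\left(\frac{b}{2}\right)}. \qquad (20)$$

This expectation is finite only when $\sigma < b/2$; otherwise the integral defining $E[\exp(Y)]$ diverges.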
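A minimal Python sketch of (20) follows, assuming that model (2) is $Y = \theta + \sigma \log F$ with $F \sim F(a, b)$, which is the form that (20) implies. Two checks are included: at $\sigma = 1$ the formula must return the F-distribution mean $b/(b-2)$, and with $a = 2$ and $b \to \infty$ it should tend to $e^{\theta}\Gamma(1+\sigma)$, the minimum-type extreme value case.

```python
import math


def logf_mean(theta: float, sigma: float, a: float, b: float) -> float:
    """E[exp(Y)] for Y = theta + sigma * log F, F ~ F(a, b), as in (20).

    Computed on the log scale with lgamma for numerical stability;
    returns inf when sigma >= b/2, where the expectation does not exist.
    """
    if sigma >= b / 2.0:
        return math.inf
    log_mean = (theta + sigma * math.log(b / a)
                + math.lgamma(a / 2.0 + sigma) + math.lgamma(b / 2.0 - sigma)
                - math.lgamma(a / 2.0) - math.lgamma(b / 2.0))
    return math.exp(log_mean)


# Check 1: sigma = 1 gives E(F) = b / (b - 2), the known F-distribution mean.
print(logf_mean(0.0, 1.0, 4.0, 10.0), 10.0 / 8.0)
# Check 2: a = 2 and very large b approach exp(theta) * Gamma(1 + sigma).
print(logf_mean(0.3, 0.5, 2.0, 1e6), math.exp(0.3) * math.gamma(1.5))
```

Working with `math.lgamma` avoids overflow of the gamma functions when $a$ or $b$ is large.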
5.8 Marginal inferences incorporating censored data
Inferences about the conditional quantiles $Q_{Y,\alpha}$ based on the uncensored data in Section 5.3 can be converted to inferences about the marginal quantiles $Q^{*}_{Y,\gamma}$ that incorporate the censored data.
Let $Q_{Y,\alpha} = Q^{*}_{Y,\gamma}$; the probabilities attached to these quantiles, which are numerically the same quantity, are related via

$$\gamma = p + (1 - p)\alpha, \qquad (21)$$

where $p$ is the probability that an observation is censored (that is, falls below the detection limit), $\alpha$ is the probability associated with the conditional quantile $Q_{Y,\alpha}$, based on the uncensored data, and $\gamma$ is the marginal probability associated with the same quantity considered as the marginal quantile $Q^{*}_{Y,\gamma}$, based on all the data. That is, considering all of the data, the following holds:

$$\Pr\left[Y \le Q^{*}_{Y,\gamma} = Q_{Y,\alpha}\right] = \gamma = p + (1 - p)\alpha. \qquad (22)$$
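Relation (21) and its inverse are simple enough to code directly. In the hedged Python sketch below, the censoring probability `p` is a hypothetical value; in practice it would be replaced by its estimate from the data. The sketch finds the conditional level $\alpha$ whose quantile $Q_{Y,\alpha}$ is the marginal 0.95 quantile $Q^{*}_{Y,0.95}$.

```python
def marginal_prob(alpha: float, p: float) -> float:
    """gamma = p + (1 - p) * alpha, relation (21)."""
    return p + (1.0 - p) * alpha


def conditional_prob(gamma: float, p: float) -> float:
    """Inverse of (21): alpha = (gamma - p) / (1 - p).

    Only marginal probabilities gamma >= p correspond to quantiles of the
    uncensored part of the distribution.
    """
    if not p <= gamma <= 1.0:
        raise ValueError("gamma must lie in [p, 1]")
    return (gamma - p) / (1.0 - p)


p = 0.2                            # hypothetical censoring probability
alpha = conditional_prob(0.95, p)  # conditional level giving the marginal 0.95 quantile
print(alpha, marginal_prob(alpha, p))
```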
This can be applied similarly to obtain the probability of exceeding a quality standard or any other fixed quantity, incorporating the censored data. Let $\eta \ge \ln(T) = -5.29$ ln(mg/l), and let $\pi(\eta)$ be the probability of exceeding the value $\eta$, incorporating the censored data. Then, by (22) and (13),

$$\begin{aligned}
\pi(\eta) &= \Pr[Y \ge \eta] = 1 - \Pr[Y \le \eta] \\
&= 1 - p - (1 - p)\,F_Y\!\left(\frac{\eta - \theta}{\sigma}\right) \\
&= (1 - p)\left[1 - F_Y\!\left(\frac{\eta - \theta}{\sigma}\right)\right] = (1 - p)\,\tau(\eta).
\end{aligned} \qquad (23)$$

Table VII. Likelihood-confidence intervals for $\mu_X$

Confidence level (%)   Likelihood level (%)   u      Interval for $\mu_X$ (mg/l)
90                     25.8                   1.64   (341.83, 391.65)
95                     14.7                   1.96   (336.97, 396.52)
99                     3.7                    2.57   (327.71, 405.78)
When $\eta = \ln(6\ \text{mg/l})$, the log quality standard for chromium given by Mexican environmental regulations, the maximum likelihood estimate of $\pi(\ln 6)$ is $\hat\pi = (1 - \hat p)\hat\tau = 0.856$. Its likelihood function inherits the asymmetric shape of the likelihood function of $\tau(\ln 6)$, since (23) is a linear transformation for a fixed value of $p$. However, once again using the invariance property of the likelihood function, the inferential statements (17) for $\beta$ may be used. The endpoints of the approximate likelihood-confidence intervals for $\beta$ are first transformed back to the scale of $\tau(\eta)$, and then to the scale of $\pi(\eta)$. The nested set of approximate likelihood-confidence intervals for $\pi(\eta)$ is then

$$\left[\frac{e^{\beta_1(u)}}{1 + e^{\beta_1(u)}}\right](1 - \hat p) \le \pi \le \left[\frac{e^{\beta_2(u)}}{1 + e^{\beta_2(u)}}\right](1 - \hat p),$$

where $\beta_1(u)$ and $\beta_2(u)$ are the left and right end points of the likelihood-confidence interval of $\beta$ associated with a specific value of $u$ in (17). Table VIII presents some likelihood-confidence intervals for $\pi(\ln 6)$ and shows that the probability of exceeding the quality standard, when all of the log chromium data (including the censored observations) are taken into account, is indeed very high.
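The two-step transformation can be sketched in Python from the reported values alone. Because $\hat p$ is not quoted in this section, the sketch assumes $1 - \hat p = \hat\pi/\hat\tau$, back-calculated from (23) with $\hat\pi = 0.856$ and $\hat\tau = e^{\hat\beta}/(1 + e^{\hat\beta})$; the helper names are illustrative.

```python
import math

BETA_HAT, INFO_BETA = 4.401, 33.458  # reported for beta(ln 6)
PI_HAT = 0.856                       # reported MLE of pi(ln 6)


def inv_logit(b: float) -> float:
    """tau = e^b / (1 + e^b)."""
    return math.exp(b) / (1.0 + math.exp(b))


TAU_HAT = inv_logit(BETA_HAT)
# Assumption: 1 - p_hat is back-calculated from pi_hat = (1 - p_hat) * tau_hat in (23).
ONE_MINUS_P = PI_HAT / TAU_HAT


def pi_interval(u: float) -> tuple[float, float]:
    """Transform the beta interval to tau, then multiply by (1 - p_hat)."""
    half_width = u / math.sqrt(INFO_BETA)
    return (ONE_MINUS_P * inv_logit(BETA_HAT - half_width),
            ONE_MINUS_P * inv_logit(BETA_HAT + half_width))


for u in (1.64, 1.96, 2.57):
    lo, hi = pi_interval(u)
    print(f"u = {u:.2f}: pi(ln 6) in ({lo:.3f}, {hi:.3f})")
```

The printed end points agree with Table VIII to three decimals.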
6. DISCUSSION
The objective of this work is not to validate extensively any particular model within the location-scale family for the chromium data set presented here; that would require the examination of a number of similar data sets. The goal is to show how inferential procedures based on the full likelihood function can be applied to multiparameter problems, such as censored water pollution samples or other environmental data, and yield straightforward, informative results in terms of likelihood-confidence intervals for the parameters of interest. These inferences can be made irrespective of the model under consideration. For instance, the sulfides data set mentioned in Section 3 was analyzed in a similar way to the chromium data. On the logarithmic scale, the censored observations were again distant from the uncensored data, as occurred with chromium, and a similar log F(1, 3) model adequately described the uncensored observations. This suggests that it is reasonable to believe that a reproducible log F model underlies this type of phenomenon, and that the observed data are repeatable. For brevity, the specific sulfides results are not presented here.
Table VIII. Likelihood-confidence intervals for $\pi(\ln 6)$

Confidence level (%)   Likelihood level (%)   u      Interval for $\pi(\ln 6)$
90                     25.8                   1.64   (0.853, 0.859)
95                     14.7                   1.96   (0.852, 0.859)
99                     3.7                    2.57   (0.850, 0.860)
There is a large literature on maximum likelihood estimation. In practice its application is
usually restricted to calculating the maximum likelihood estimate, sometimes supplemented with
the observed information matrix, the inverse of which is unfortunately interpreted as an
approximate variance, and is used to obtain estimated standard errors. No attention is paid to the
shape of the full likelihood function. This procedure has valid practical application only if the
observed likelihood function is approximately normal.
There is also a large literature on the location-scale family, and on the extreme value model in particular. Fisher (1934) developed the exact analysis of the entire location-scale family conditional on the observed likelihood function. This analysis yields exact likelihood-confidence intervals (in Fisher's interpretation, likelihood-fiducial intervals).
The analysis presented here approximates Fisher's exact conditional analysis in the widespread case of an approximately normal likelihood. Likelihood can be applied whenever the probability of the observations can be expressed in terms of a well-defined parameter; in particular, the presence of censoring does not impede its use. Inferences about the parameters of interest are provided in the form of likelihood-confidence intervals, which thus take into account the shape of the full likelihood function, not just its MLE and second derivative.
Note, however, that the parameter in terms of which the likelihood is normal may not be the parameter of interest. This is not a problem if the parameter of interest is a one-to-one function of this normal parameter, since inferences based on the likelihood function can be transformed algebraically into inferences about any one-to-one parametric function. This is particularly important in Sections 5.3–5.5, in which the inferences about the quantiles can be expressed in the scale $X$ of interest via $Q_{X,\alpha} = \exp(Q_{Y,\alpha})$ mg/l, and in Section 5.6 for the probabilities of exceeding quality standards. However, the same is not true for inferences about means or expectations in Section 5.7, since $\mu_X = E(X) \neq \exp[E(Y)] = \exp(\mu_Y)$. Thus approximate normality of the likelihood of $\mu_Y$ cannot similarly be used to account for any asymmetry of the likelihood function of $\mu_X$. For this reason, quantiles should perhaps be considered a more natural measure of concentration than means. Fortunately, in the present case the sample is sufficiently large that the likelihood function of $\mu_X$ is itself approximately normal, so this problem does not arise. In cases where it does arise (usually with smaller samples), more sophisticated approximations that allow for residual skewness and kurtosis can often be used (Viveros and Sprott 1987). It is very important to take the asymmetry of the likelihood function into account when it is present, since otherwise the inferences could be misleading. This was the case for the likelihood of $Q_{X,0.95}$, and the consequences of ignoring its asymmetry were described in detail in Section 5.4.
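The point that $\mu_X \neq \exp(\mu_Y)$ can be made concrete. Assuming, for illustration only, the minimum-type extreme value model $g(z) = \exp(z - e^{z})$, for which $E(Y) = \theta - \gamma_E\sigma$ ($\gamma_E$ being Euler's constant) and $E(X) = e^{\theta}\Gamma(1+\sigma)$, the Python sketch below shows that the ratio $\mu_X/\exp(\mu_Y) = \Gamma(1+\sigma)e^{\gamma_E\sigma}$ exceeds 1 and grows with $\sigma$, so $\mu_X$ is not a one-to-one function of $\mu_Y$ alone.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def mean_x(theta: float, sigma: float) -> float:
    """E(X) = e^theta * Gamma(1 + sigma) under the assumed extreme value model."""
    return math.exp(theta) * math.gamma(1.0 + sigma)


def exp_mean_y(theta: float, sigma: float) -> float:
    """exp(E(Y)), with E(Y) = theta - EULER_GAMMA * sigma for the same model."""
    return math.exp(theta - EULER_GAMMA * sigma)


for sigma in (0.25, 0.5, 1.0):
    # The ratio does not depend on theta but grows with sigma, so mu_X is not
    # determined by mu_Y alone.
    print(sigma, mean_x(0.0, sigma) / exp_mean_y(0.0, sigma))
```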
In summary, the proposed inferential procedure takes account of the full course of the
likelihood function and so ensures the complete utilization of all of the sample parametric
information in skewed lifetime and environmental data.
ACKNOWLEDGEMENTS
This work was partially supported by CONACYT grants 0328P-E and I29869-E. We should like
to thank Professors Raymond Carroll and Roman Viveros for having read a preliminary version
of the paper and for fruitful comments. We also wish to thank Dolores Negrete, who collected
the chromium and sulfides data sets, for giving us access to these data.
REFERENCES
Abramowitz, M. and Stegun, I. A. (eds.) (1972). Handbook of Mathematical Functions, with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications.
Barnard, G. A. (1962). 'Contribution to the discussion of the paper "Apparent Anomalies and Irregularities in M. L. Estimation" by C. R. Rao (1962)'. Sankhya, Series A, 24.
Barndorff-Nielsen, O. and Cox, D. R. (1994). Inference and Asymptotics. London: Chapman and Hall.
Edwards, A. W. F. (1992). Likelihood, expanded edn. Baltimore: The Johns Hopkins University Press.
El-Shaarawi, A. H. and Viveros, R. (1997). 'Inference about the mean in log-regression with environmental applications'. Environmetrics 8, 569–582.
Fisher, R. A. (1934). 'Two new properties of mathematical likelihood'. Proceedings of the Royal Society of London, Series A 144, 285–307.
Fisher, R. A. (1973). Statistical Methods and Scientific Inference. New York: Hafner.
Kalbfleisch, J. G. (1985). Probability and Statistical Inference. Vol. 2: Statistical Inference, 2nd edn. New York: Springer-Verlag.
Lambert, D., Peterson, B. and Terpenning, I. (1991). 'Nondetects, detection limits, and the probability of detection'. Journal of the American Statistical Association 86(414), 266–277.
Nakamura, M. and Díaz-Francés, E. (1994). 'Transformation to symmetry for censored data caused by a detection limit'. Environmetrics 5, 399–416.
Negrete Hernández, D. (1992). Determinación de la calidad del agua residual proveniente de las tenerías de la ciudad de León, Gto. [Determination of the quality of the wastewater from the tanneries of the city of León, Gto.] Thesis, Chemistry Faculty, University of Guanajuato.
Nelder, J. A. and Mead, R. (1965). 'A simplex method for function minimization'. Computer Journal 7, 308–313.
Shumway, R. H., Azari, A. S. and Johnson, P. (1989). 'Estimating mean concentrations under transformation for environmental data with detection limits'. Technometrics 31, 347–356.
Sprott, D. A. (1980). 'Maximum likelihood in small samples: Estimation in the presence of nuisance parameters'. Biometrika 67, 515–523.
Sprott, D. A. (1990). 'Inferential estimation, likelihood, and linear pivotals'. The Canadian Journal of Statistics 18, 1–15.
Sprott, D. A. and Kalbfleisch, J. D. (1969). 'Examples of likelihoods and comparison with point estimates and large sample approximations'. Journal of the American Statistical Association 64, 468–484.
Stoline, M. R. (1991). 'An examination of the lognormal and Box and Cox family of transformations in fitting environmental data'. Environmetrics 2, 85–106.
Viveros, R. and Sprott, D. A. (1987). 'Allowance for skewness in maximum-likelihood estimation with application to the location-scale model'. The Canadian Journal of Statistics 15(4), 349–361.
