
THRESHOLD METHODS FOR SAMPLE EXTREMES

Richard L. Smith

Department of Mathematics, Imperial College, London SW7

The aim of this paper is to bring together two previously


unrelated areas of the statistical analysis of extremes, and to
suggest how they might form the basis of a general technique of
extreme value analysis of time series.

The first is the class of Peaks Over Threshold methods


developed by hydrologists. These methods are based on seemingly
ad hoc statistical models, but they are flexible enough to cope
with seasonality and serial dependence.

The second area is the analysis of the tail of a distribution


based on extreme order statistics. Such techniques have been
developed only for i.i.d. observations, and there remain open
questions about the sampling properties of the estimators.

These two areas of work may be regarded as contributions to


the same general problem, i.e. the modelling of the extreme
characteristics of a series in terms of its exceedances over a
high threshold level. We discuss here a general method based on
(a) modelling the exceedance times as a point process and (b) use
of the Generalised Pareto Distribution for the exceedance values.
As an example, the method is applied to an analysis of wave
heights off the coast of Britain.

1. INTRODUCTION

This paper is about methods for estimating return values and


related quantities from series of observations. For some of the
discussion we shall assume that the observations are independent
and identically distributed (i.i.d.), though we are ultimately
interested in techniques which will also handle seasonality and
serial dependence, which are present in many environmental series.

J. Tiago de Oliveira (ed.), Statistical Extremes and Applications, 621-638.
© 1984 by D. Reidel Publishing Company.

We shall start with a brief outline of the Peaks Over
Threshold (POT) method developed by hydrologists. The method is
described in some detail in the English Flood Studies Report [18];
subsequent developments are due to Todorovic [26,27], North [19]
and Revfeim [22,23], amongst others. The analysis is based on
peaks over a high threshold level. The models which have been
adopted are somewhat arbitrary, and the justification for them
is mainly empirical.

A parallel, but entirely distinct, development has been the


use of methods based on extreme order statistics. Weiss [30]
appears to have been the first to propose such a method; Pickands
[21], Hill [13] and Weissman [31-33] have been among the subsequent
contributors. These procedures have been developed from a more
rigorous mathematical standpoint than the POT procedures, but they
have been restricted entirely to the case of i.i.d. observations
and there remain some open questions even in that case.

In Section 3 we shall propose a slight modification of the


extreme order statistics method which leads naturally into an
extension of the POT method based on all exceedances (rather than
just the peaks) over the threshold. The most general model of
this form has not been developed but in Section 4 we shall present
a particular model which combines some of the main features of the
POT and extreme order statistics methods. In Section 5 we discuss
the connections with Leadbetter's theory of extremes in stationary
sequences. Finally in Section 6 we present an example of these
techniques and propose some areas for further investigation.

2. THE PEAKS OVER THRESHOLD METHOD


The analysis is based on the sequence of peaks of a series
over a specified high threshold level. A peak is defined as the
maximum value achieved during any sequence of consecutive
observations above the threshold. The times at which peaks occur are
called peak times, and the corresponding excesses over the threshold
are called peak values. The data may be summarised as
$\{(T_i, Y_i),\ i = 1, 2, \ldots\}$, where $T_i$ is the i'th peak
time and $Y_i$ the i'th peak value. The most flexible model in the
literature appears to be that of [19]:

1. The peak times $\{T_i\}$ are taken to be a nonhomogeneous Poisson
process with intensity function

\[ \lambda_p(t) = \lambda_p \exp\Big[\sum_{m=1}^{r} \beta_{pm} \sin(m\omega_0 t + \theta_{pm})\Big], \]

where $\omega_0$ is the fundamental frequency (corresponding to a period
of one year in most environmental applications) and r the number
of significant cycles.

2. Conditionally on $\{T_i\}$, the peak values $\{Y_i\}$ are independent
and exponentially distributed:

\[ P\{Y_j > y \mid \{T_i\}\} = \exp[-\lambda_c(T_j)\, y], \]

where

\[ \lambda_c(t) = \lambda_c \exp\Big[\sum_{m=1}^{r} \beta_{cm} \sin(m\omega_0 t + \theta_{cm})\Big]. \]

The Poisson nature of the peak times and the conditional indepen-
dence of peak values are believed to be valid assumptions provided
the threshold is sufficiently high; this is supported by empirical
evidence and by asymptotic arguments as in [27]. Note, however,
that it is essential to restrict attention to peaks for this
assumption to be reasonable. It would not be appropriate to
analyse the full sequence of exceedances in this manner because of
the clustering of high values that arises in practice. The
assumption that peak values are exponentially distributed appears
to be entirely ad hoc. The form of the functions $\lambda_p$ and $\lambda_c$ is, of
course, intended to allow for seasonal variation.
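The two-part model above can be sketched in a few lines of simulation code. This is only an illustrative sketch, not the procedure of [19]: the thinning algorithm, the function names and all numerical parameter values below are assumptions chosen for the example.

```python
import math
import random


def seasonal_rate(t, base, betas, thetas, omega0):
    """base * exp(sum_m beta_m * sin(m * omega0 * t + theta_m)), m = 1..r."""
    s = sum(b * math.sin(m * omega0 * t + th)
            for m, (b, th) in enumerate(zip(betas, thetas), start=1))
    return base * math.exp(s)


def simulate_pot(horizon, lam_p, beta_p, theta_p, lam_c, beta_c, theta_c,
                 omega0, rng):
    """Simulate (peak time, peak value) pairs on (0, horizon]."""
    # The intensity never exceeds lam_p * exp(sum |beta_pm|), so a homogeneous
    # Poisson process at that rate can be thinned down to lambda_p(t).
    lam_max = lam_p * math.exp(sum(abs(b) for b in beta_p))
    peaks = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)      # next candidate time
        if t > horizon:
            break
        keep_prob = seasonal_rate(t, lam_p, beta_p, theta_p, omega0) / lam_max
        if rng.random() < keep_prob:
            # Peak value: exponential with seasonally varying rate lambda_c(t).
            rate_y = seasonal_rate(t, lam_c, beta_c, theta_c, omega0)
            peaks.append((t, rng.expovariate(rate_y)))
    return peaks


# Hypothetical parameter values, time measured in days, one cycle per year.
rng = random.Random(1)
omega0 = 2.0 * math.pi / 365.0
peaks = simulate_pot(3650.0, 0.01, [0.8], [0.0], 2.0, [0.3], [1.0], omega0, rng)
print(len(peaks), peaks[:2])
```

Simulated series of this kind are one way to see how the seasonal terms concentrate peaks in part of the year.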

In subsequent sections we shall focus on two aspects of the


model which seem capable of improvement. The first is the
arbitrary assumption of exponential peak values. It will be argued,
following Pickands [21], that the Generalised Pareto Distribution
is a more natural assumption. The exponential distribution is a
special case of this. The second aspect is the restriction to
peak times. It seems better to consider the point process of all
exceedances over the threshold, rather than just the peaks. To
analyse such a process, however, we shall need to employ models
which allow for clustering.

3. METHODS BASED ON EXTREME ORDER STATISTICS

Let $X_1,\ldots,X_n$ denote i.i.d. observations with unknown
distribution function F. Inference about the upper (or lower) tail of
F may be based on the largest (or smallest) r order statistics,
for some value of r. Several authors have treated procedures of
this form, starting with Weiss [30], Pickands [21] and Hill [13].
A good exposition of the methods was given by Weissman [31].
Weissman treated r as a fixed constant, and showed how to construct
statistical procedures based on the asymptotic distribution of the
r largest order statistics. This distribution, when it exists, is
determined by the asymptotic distribution of the sample maximum.

The case of two unknown parameters was dealt with in [31]; more
complex three-parameter problems have been considered in [33],
[12] and [25]. In the three-parameter case the classical asymptotic
results of maximum likelihood theory (as $r \to \infty$) fail; a substantial
contribution to this problem was made by Hall [12] but many
questions remain unanswered. We now propose an alternative approach
to these procedures. Let some threshold level, which we denote
by u, be fixed. The procedure is based on all exceedances of u;
the number r of such exceedances is random and has, obviously, a
binomial distribution with parameters n and 1-F(u). Such a
procedure has been considered by Weissman [32]. Our proposal is
that subsequent inference should be based on the conditional
distribution of the exceedance values given r. The theoretical
justification for this is that r is an ancillary statistic for the
distribution of the exceedance values, i.e. the distribution of
r depends only on F(u) and not otherwise on {F(x), x>u}. There
is also a practical advantage in that, conditionally on r, the
exceedance values are independent.

The conditional distribution of X-u given X>u is given by the
distribution function

\[ F_u(y) = \frac{F(u+y) - F(u)}{1 - F(u)}, \qquad 0 \le y < x_0 - u, \]

where $x_0 \le \infty$ is the upper boundary point of the distribution.

At this point we introduce the Generalised Pareto Distribution


(henceforth referred to as GPD) given by the distribution function
\[ G(y \mid \sigma, k) = \begin{cases} 1 - (1 - ky/\sigma)^{1/k}, & k \neq 0, \\ 1 - \exp(-y/\sigma), & k = 0, \end{cases} \]

where $\sigma > 0$ and $-\infty < k < \infty$; in the case k>0 the range of y is
$0 \le y \le \sigma/k$, otherwise the range is $0 \le y < \infty$. The importance
of the GPD is given by Pickands' [21] result: for fixed k, we have

\[ \lim_{u \to x_0}\ \inf_{0 < \sigma < \infty}\ \sup_{0 \le y < \infty} |F_u(y) - G(y \mid \sigma, k)| = 0 \]

if and only if there exist $a_n > 0$ and $b_n$ such that

\[ F^n(a_n y + b_n) \to \begin{cases} \exp[-(1 - ky)^{1/k}], & k \neq 0, \\ \exp[-e^{-y}], & k = 0, \end{cases} \]

for all y with ky<1. Essentially, $F_u$ is well approximated by the
GPD (with some fixed k) as $u \to x_0$ if and only if F lies in the
maximum domain of attraction of an extreme value distribution (with
the same k).

This result motivates our next step: as an approximation, we
assume that $F_u$ is the GPD with unknown parameters $\sigma$ and k. We
shall not consider here how this assumption might be checked in
practice; it is an important question, however, and we refer to
Davison's [7] paper in this Proceedings for discussion of it.

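The density function of the GPD is

\[ g(y \mid \sigma, k) = \begin{cases} \sigma^{-1}(1 - ky/\sigma)^{1/k - 1}, & k \neq 0, \\ \sigma^{-1}\exp(-y/\sigma), & k = 0, \end{cases} \]

defined when $y > 0$, $ky/\sigma < 1$. The elements of the Fisher information
matrix are given by

\[ E\Big\{\Big(\frac{\partial}{\partial\sigma} \ln g(Y \mid \sigma,k)\Big)^2\Big\} = 1/\{\sigma^2(1-2k)\}, \]

\[ E\Big\{\Big(\frac{\partial}{\partial\sigma} \ln g(Y \mid \sigma,k)\Big)\Big(\frac{\partial}{\partial k} \ln g(Y \mid \sigma,k)\Big)\Big\} = -1/\{\sigma(1-k)(1-2k)\}, \]

\[ E\Big\{\Big(\frac{\partial}{\partial k} \ln g(Y \mid \sigma,k)\Big)^2\Big\} = 2/\{(1-k)(1-2k)\}, \]

provided k<1/2; for $k \ge 1/2$ these expectations are infinite.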

For k<1/2, it can be shown that the classical asymptotic
results for maximum likelihood estimation are valid: for a sample
of size r, there exists a solution $(\hat\sigma, \hat k)$ of the likelihood
equations which as $r \to \infty$ is consistent, efficient and normal with
covariance matrix $r^{-1}M$, where

\[ M = \sigma^2(1-k)^2 \begin{pmatrix} 2/(1-k) & 1/\{\sigma(1-k)\} \\ 1/\{\sigma(1-k)\} & 1/\sigma^2 \end{pmatrix} \]

is the inverse of the Fisher information matrix.
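As a numerical check on the formulas above, the following sketch builds the Fisher information matrix and M in the order $(\sigma, k)$ and multiplies them. The function names are hypothetical, and the parameter values are simply taken from the u=1.0 column of Table 1 for illustration.

```python
import math


def fisher_information(sigma, k):
    """Per-observation Fisher information of the GPD in (sigma, k), k < 1/2."""
    if k >= 0.5:
        raise ValueError("the expectations are infinite for k >= 1/2")
    d = 1.0 - 2.0 * k
    i_ss = 1.0 / (sigma ** 2 * d)
    i_sk = -1.0 / (sigma * (1.0 - k) * d)
    i_kk = 2.0 / ((1.0 - k) * d)
    return [[i_ss, i_sk], [i_sk, i_kk]]


def asymptotic_covariance(sigma, k):
    """The matrix M; the covariance of the m.l.e. is M / r."""
    c = sigma ** 2 * (1.0 - k) ** 2
    off = c / (sigma * (1.0 - k))
    return [[c * 2.0 / (1.0 - k), off], [off, c / sigma ** 2]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


sigma, k, r = 0.17, 0.12, 81
M = asymptotic_covariance(sigma, k)
product = matmul2(fisher_information(sigma, k), M)   # should be the identity
se_sigma = math.sqrt(M[0][0] / r)
se_k = math.sqrt(M[1][1] / r)
print(product, se_sigma, se_k)
```

With these inputs the implied standard errors are roughly 0.025 for $\sigma$ and 0.098 for k, the same order as the values reported in Table 1.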

For $k \ge 1/2$, this result is false and the correct result is very
complicated. For k=1/2 we have

\[ \hat k - k = O_p(r^{-1/2}), \qquad \hat\sigma - \sigma = O_p((r \ln r)^{-1/2}), \]

with the terms asymptotically independent and normal. For 1/2<k<1
we have

\[ \hat k - k = O_p(r^{-1/2}), \qquad \hat\sigma - \sigma = O_p(r^{-k}), \]

and the asymptotic distribution of $\hat\sigma$ is the same as that obtained
by Woodroofe [34], which is not normal. For k>1 and r sufficiently
large, the likelihood has no local maximum but has a singularity
at the endpoint of the distribution, i.e. when $\max(Y_i) = \sigma/k$.


Proofs of these are in [24]. Fortunately, in practice it usually
appears to be the case that k<1/2, so that classical maximum
likelihood is applicable. Davison [7] considers extensions to
cases where $\sigma$ and k are not fixed parameters but depend on
covariates.

We may summarise the method as follows. First, a high
threshold u is chosen and all exceedances above the threshold noted.
The number r of such exceedances has a binomial distribution from
which inferences about the exceedance probability p=1-F(u) may be
made. Conditionally on r, the exceedance values are independent
with a distribution which we take to be GPD, and whose parameters
$\sigma$ and k may be estimated by maximum likelihood, at least so long
as k<1/2. From these assumptions, the return level $Q_N$ defined by
the equation $F(Q_N) = 1-1/N$ is given by

\[ Q_N = \begin{cases} u + \sigma\{1 - (Np)^{-k}\}/k, & k \neq 0, \\ u + \sigma \ln(Np), & k = 0, \end{cases} \tag{3.1} \]

provided $Q_N > u$. The m.l.e. $\hat Q_N$ may therefore be constructed by
substituting the estimated values $\hat p$, $\hat k$ and $\hat\sigma$ in (3.1), and its
variance may be estimated approximately as $V^T \Sigma V$, where
$V = (\partial Q_N/\partial p,\ \partial Q_N/\partial\sigma,\ \partial Q_N/\partial k)^T$, evaluated at
$(\hat p, \hat\sigma, \hat k)$, is the gradient of $Q_N$ at the m.l.e.,
and $\Sigma$ is the estimated covariance matrix of $(\hat p, \hat\sigma, \hat k)$.
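The summary above translates into a short calculation. The sketch below is illustrative only: the function names, the numerical gradient and the example values are assumptions, and the covariance matrix is taken to be block diagonal (binomial variance for $\hat p$, M/r for $(\hat\sigma, \hat k)$), which is a simplification made here rather than a statement from the text.

```python
import math


def return_level(N, u, p, sigma, k):
    """Q_N from (3.1); requires N*p > 1 so that Q_N lies above u."""
    if N * p <= 1.0:
        raise ValueError("N*p must exceed 1 for the return level to exceed u")
    if abs(k) < 1e-9:
        return u + sigma * math.log(N * p)
    return u + sigma * (1.0 - (N * p) ** (-k)) / k


def tail_probability(x, u, p, sigma, k):
    """P{X > x} for x > u: exceedance probability times the GPD survival."""
    y = x - u
    if abs(k) < 1e-9:
        return p * math.exp(-y / sigma)
    return p * (1.0 - k * y / sigma) ** (1.0 / k)


def delta_method_variance(N, u, p, sigma, k, cov, h=1e-6):
    """V^T Sigma V with a central-difference gradient in (p, sigma, k)."""
    theta = [p, sigma, k]
    grad = []
    for j in range(3):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grad.append((return_level(N, u, *up) - return_level(N, u, *down))
                    / (2.0 * h))
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(3) for j in range(3))


# Hypothetical example: n observations, r exceedances of u.
n, r = 2000, 100
u, sigma, k = 1.0, 0.2, 0.1
p = r / n
c = sigma ** 2 * (1.0 - k) ** 2
off = c / (sigma * (1.0 - k)) / r
cov = [[p * (1.0 - p) / n, 0.0, 0.0],
       [0.0, c * 2.0 / (1.0 - k) / r, off],
       [0.0, off, c / sigma ** 2 / r]]
q = return_level(10000, u, p, sigma, k)
se = math.sqrt(delta_method_variance(10000, u, p, sigma, k, cov))
print(q, se)
```

An analytic gradient could replace the finite differences; the numerical version is used here only to keep the sketch short.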

4. THRESHOLD METHODS FOR GENERAL TIME SERIES

Our intention now is to suggest ways in which the method of


Section 3 might be adapted to general time series sampled at
regular intervals. Examples are series of river levels, wave
heights and wind speeds, which are typically sampled at hourly
or three-hourly intervals.

The POT method is based just on peaks, rather than all
exceedances, over the threshold. This is open to the objection
that it may not just be the peaks which are important; in some
applications (e.g. assessing the effects of high winds on
buildings) the duration of high-level activity may be more important
than the peak value attained. We therefore propose to analyse
the series consisting of all exceedances over the threshold and
the associated exceedance times. To distinguish this from the
POT method, we shall refer to it as the Exceedances Over Threshold
or EOT method. However, this idea will not be carried to its
logical conclusion here: although we model the exceedance times
by a clustered point process, the exceedance values will be analysed
as in the POT method, by considering cluster peaks.

For the remainder of the paper we shall model the exceedance
times $\{T_i\}$ as a very simple doubly stochastic process [6,11] whose
supplementary process is a two-state Markov chain. Explicitly,
$\Lambda(t)$ is assumed to be a Markov chain with states 0 and 1 and jump
rates $\lambda$ (from 0 to 1) and $\mu$ (from 1 to 0); conditionally on
$\Lambda(t)$ (unobserved), the exceedance times form a Poisson process
with rate $\phi\Lambda(t)$. The unknown parameters are $\phi$, $\lambda$ and $\mu$.

The inter-event times $Z_1 = T_1$, $Z_i = T_i - T_{i-1}$ (i>1) are
independent with a mixed exponential distribution with p.d.f.

\[ f(z) = \frac{\phi}{\alpha-\beta}\{(\mu+\phi-\beta)e^{-\alpha z} - (\mu+\phi-\alpha)e^{-\beta z}\}, \quad z > 0, \tag{4.1} \]

where $\alpha$, $\beta$ are the roots (positive and distinct) of the quadratic
equation

\[ x^2 - (\lambda+\mu+\phi)x + \lambda\phi = 0. \tag{4.2} \]

This follows from formulae in [11], pages 40-44. Maximum likelihood
estimation of $\phi$, $\lambda$ and $\mu$ is easily performed (numerically)
using these equations.
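A minimal sketch of the ingredients of that numerical fit is given below: the roots of (4.2), the density (4.1) and the resulting log likelihood. The function names and the example data are hypothetical, and the maximisation itself (by any general-purpose optimiser) is left out.

```python
import math


def interval_roots(phi, lam, mu):
    """Roots alpha < beta of x^2 - (lam + mu + phi) x + lam * phi = 0."""
    s = lam + mu + phi
    disc = math.sqrt(s * s - 4.0 * lam * phi)   # positive whenever mu > 0
    return (s - disc) / 2.0, (s + disc) / 2.0


def interval_density(z, phi, lam, mu):
    """Mixed exponential p.d.f. (4.1) of the inter-event times."""
    a, b = interval_roots(phi, lam, mu)
    c = mu + phi
    return phi / (a - b) * ((c - b) * math.exp(-a * z)
                            - (c - a) * math.exp(-b * z))


def log_likelihood(intervals, phi, lam, mu):
    """Log likelihood of independent inter-event times under (4.1)."""
    return sum(math.log(interval_density(z, phi, lam, mu)) for z in intervals)


# Hypothetical inter-event times, in units of the sampling interval.
intervals = [1.0, 1.0, 2.0, 35.0, 1.0, 3.0, 120.0, 1.0, 1.0, 60.0]
print(log_likelihood(intervals, phi=0.5, lam=0.01, mu=0.2))
```

The log likelihood can be passed (with its sign reversed) to a numerical minimiser over positive $\phi$, $\lambda$, $\mu$; working with the logarithms of the parameters is one simple way to keep them positive.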

We also need some means of identifying clusters (so that we
can pick out cluster maxima). Intuitively, $T_{i-1}$ and $T_i$ belong
to the same cluster if $\Lambda(t) = 1$ for $T_{i-1} < t \le T_i$. Under this
definition, the mean number of clusters per unit time is
$\lambda\mu\phi/\{(\lambda+\mu)(\mu+\phi)\}$. This cannot be applied directly because
$\Lambda$ is unobserved, so we propose the following empirical rule: assign
$T_{i-1}$ and $T_i$ to the same cluster if and only if $Z_i = T_i - T_{i-1} < z^*$.
Under this rule, the mean number of clusters per unit time is
$\int_{z^*}^\infty f(z)\,dz / \int_0^\infty z f(z)\,dz$. (These results for mean numbers
of clusters follow from elementary renewal theory.) The proposed procedure
is to choose $z^*$ so that the mean numbers of clusters under the
two rules are the same; this is then determined by setting

\[ \int_{z^*}^\infty f(z)\,dz = \Big\{\int_0^\infty z f(z)\,dz\Big\}\, \lambda\mu\phi/\{(\lambda+\mu)(\mu+\phi)\} = \mu/(\mu+\phi). \tag{4.3} \]


The proposed method is therefore as follows:

1. From the exceedance times $\{T_i\}$, the intervals $\{Z_i\}$ are
obtained and the parameters $\phi$, $\lambda$, $\mu$ estimated by maximum likelihood.
2. The cutoff point $z^*$ is determined by (4.3) and the peak
values, i.e. maximum values of $Y_i$ within each cluster, obtained.
3. The GPD is fitted to the peak values and estimates of k, $\sigma$
derived.
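Step 2 can be sketched as follows, assuming the parameters have already been estimated. The survival function is obtained by integrating (4.1) term by term, the cutoff is found by bisection on (4.3), and clusters are formed from gaps shorter than the cutoff. Function names and the toy data are hypothetical; the parameter values are the u=1.0 estimates from Table 1.

```python
import math


def interval_survival(z, phi, lam, mu):
    """P{Z > z} for the mixed exponential density (4.1)."""
    s = lam + mu + phi
    disc = math.sqrt(s * s - 4.0 * lam * phi)
    a, b = (s - disc) / 2.0, (s + disc) / 2.0
    c = mu + phi
    return phi / (a - b) * ((c - b) * math.exp(-a * z) / a
                            - (c - a) * math.exp(-b * z) / b)


def cutoff(phi, lam, mu, tol=1e-10):
    """Solve (4.3), i.e. P{Z > z*} = mu / (mu + phi), by bisection."""
    target = mu / (mu + phi)
    lo, hi = 0.0, 1.0
    while interval_survival(hi, phi, lam, mu) > target:
        hi *= 2.0                      # expand until the root is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if interval_survival(mid, phi, lam, mu) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cluster_peaks(times, values, z_star):
    """Largest value in each cluster; a gap >= z_star starts a new cluster."""
    peaks = []
    for i, (t, y) in enumerate(zip(times, values)):
        if i > 0 and t - times[i - 1] < z_star:
            peaks[-1] = max(peaks[-1], y)
        else:
            peaks.append(y)
    return peaks


z_star = cutoff(phi=0.46, lam=0.0067, mu=0.18)
toy_peaks = cluster_peaks([1, 2, 3, 10, 11, 30],
                          [0.1, 0.5, 0.2, 0.3, 0.9, 0.4], 2.5)
print(z_star, toy_peaks)
```

The toy call uses an arbitrary cutoff of 2.5 purely to show the grouping; in an application the value returned by `cutoff` would be used instead.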

Note that there is no allowance for seasonality in this model.


Ways of handling that are discussed in Section 6. Also the
clustering model is very simple, e.g. it may well be unreasonable to
assume the $Z_i$'s independent, in which case a more complicated model
would be needed.

Suppose now we want to calculate the return period associated
with some high level q>u. We define this to be the mean time
between successive peaks above the level q. The mean number of
clusters per unit time, as already noted, is $\lambda\mu\phi/\{(\lambda+\mu)(\mu+\phi)\}$.
Therefore, if T is large and $\lambda \ll \mu$, the mean number of clusters
in time [0,T] is approximately $T\lambda\phi/(\mu+\phi)$. Within each cluster,
the probability of a peak value above q is $1 - G(q-u \mid \sigma,k)$. Thus
the mean number of exceedances of q in [0,T] is approximately
$T\lambda\phi(1-k(q-u)/\sigma)^{1/k}/(\mu+\phi)$, from which it follows that the return
period is $(\mu+\phi)(\lambda\phi)^{-1}(1-k(q-u)/\sigma)^{-1/k}$. This argument has
deliberately been given in a loose and heuristic manner but is
easily made rigorous using renewal theory.

The return level $q_T$ associated with a given return period T
is given by

\[ q_T = u + \frac{\sigma}{k}\Big[1 - \Big\{\frac{T\lambda\phi}{\mu+\phi}\Big\}^{-k}\Big]. \tag{4.4} \]

For k=0, the last expression simplifies to

\[ q_T = u + \sigma \ln\Big\{\frac{T\lambda\phi}{\mu+\phi}\Big\}. \tag{4.5} \]

Finally, since maximum likelihood estimators of the five
unknown parameters, and the approximate variances and covariances
of these estimators, are available, it is possible to estimate $q_T$
and to obtain approximate confidence limits.
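The return period and return level formulas can be checked against each other numerically. In the sketch below the function names are hypothetical and the inputs are the u=1.2 estimates from Table 1 with k=0; the result refers to the residual series Z of Section 6, not to the wave heights themselves.

```python
import math


def cluster_rate(phi, lam, mu):
    """Approximate mean number of clusters per unit time when lam << mu."""
    return lam * phi / (mu + phi)


def return_period(q, u, sigma, k, phi, lam, mu):
    """Mean time between successive peaks above the level q > u."""
    y = q - u
    if abs(k) < 1e-9:
        survival = math.exp(-y / sigma)
    else:
        survival = (1.0 - k * y / sigma) ** (1.0 / k)
    return 1.0 / (cluster_rate(phi, lam, mu) * survival)


def return_level(T, u, sigma, k, phi, lam, mu):
    """q_T from (4.4), or (4.5) when k = 0."""
    c = T * cluster_rate(phi, lam, mu)
    if abs(k) < 1e-9:
        return u + sigma * math.log(c)
    return u + sigma * (1.0 - c ** (-k)) / k


# 50 years of three-hourly observations, 2922 observations per year.
T = 50 * 2922
pars = dict(u=1.2, sigma=0.12, k=0.0, phi=0.47, lam=0.0025, mu=0.27)
q = return_level(T, **pars)
print(q, return_period(q, **pars))
```

Substituting the return level back into the return period formula should recover T, which is a convenient consistency check on any implementation.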

5. EXTREMES OF STATIONARY SEQUENCES

The asymptotic theory of extremes in stationary sequences


was introduced by Watson [29], Berman [2] and Loynes [17], and
extensively developed in a series of papers by Leadbetter; see [16].
As the subject has been surveyed elsewhere in this Proceedings [15],
we give only a brief review here.

For a stationary sequence $\{\xi_n\}$ and an increasing sequence of
threshold levels $\{u_n\}$, Leadbetter introduced two general conditions,
$D(u_n)$ and $D'(u_n)$. Condition $D(u_n)$ is a "tail mixing" condition which
states that extreme values far apart in the sequence are approximately
independent. Condition $D'(u_n)$ is a "short range independence"
condition asserting that the probability of two extreme
observations occurring close to each other is asymptotically negligible.
If D and D' are satisfied for suitable $\{u_n\}$ then the asymptotic
distribution of extreme values from the stationary sequence is
the same as that from an i.i.d. sequence with the same marginal
distribution. Moreover, the two-dimensional point process of
exceedance times and exceedance values converges weakly, under
the appropriate renormalisation, to a Poisson process in this case.

Unfortunately, D' is not a reasonable condition for most
environmental time series. Indeed, the clustering of high values
is a major practical problem. In the case where D is satisfied
but not D', Leadbetter [14] has introduced the extremal index
$\theta$, one definition being that $1/\theta$ is the mean number of exceedances
per cluster ($0 < \theta \le 1$). The idea, though not the name, first appeared
in [20].

These ideas provide some justification for considering the


sequence of exceedance times as a clustered Poisson process. The
doubly stochastic model we have examined is approximately of this
form if $\lambda \ll \mu$. For this model, the mean number of exceedances per
unit time is $\lambda\phi/(\lambda+\mu) \approx \lambda\phi/\mu$. The asymptotic mean number of peaks
per unit time was calculated in Section 4 as $\lambda\phi/(\mu+\phi)$. The
extremal index is the ratio of these, i.e.

\[ \theta = \mu/(\mu+\phi). \tag{5.1} \]
Our procedure therefore provides an estimator of the extremal
index for this model.
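As a small worked example, the sketch below evaluates (5.1) from the rounded estimates reported in Table 1 (Section 6) and sets it beside the ratio of the number of peaks to the number of exceedances. That ratio is used here only as a rough empirical counterpart; it is an assumption of this illustration, not an estimator proposed in the text.

```python
# Rounded estimates from Table 1: threshold -> (phi, mu, N_e, N_p).
table1 = {
    1.0: (0.46, 0.18, 295, 81),
    1.1: (0.52, 0.23, 140, 43),
    1.2: (0.47, 0.27, 74, 27),
    1.3: (0.50, 0.22, 36, 11),
}


def extremal_index(phi, mu):
    """Extremal index of the doubly stochastic model, equation (5.1)."""
    return mu / (mu + phi)


results = {}
for u, (phi, mu, n_exceed, n_peaks) in table1.items():
    theta_model = extremal_index(phi, mu)
    theta_counts = n_peaks / n_exceed      # peaks per exceedance
    results[u] = (theta_model, theta_counts)
    print(u, round(theta_model, 3), round(theta_counts, 3))
```

Small differences from the tabulated extremal index are to be expected because the inputs here are rounded to two decimal places.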

6. AN APPLICATION

The data were recorded by the Seven Stones light vessel in


the English Channel. They consist of recordings of significant
wave heights, in metres, taken every three hours by a Shipborne
wave recorder. They were collected between 1968 and 1977 but,
because of several gaps in the data, the total period spanned by
the observations is only seven years. Previous analyses of these
data have been made in [3,4,5,8]. The analysis here is an
extension of an earlier threshold analysis by Turner [28].

The raw data displayed an obvious annual cycle. To study the
nature of the seasonal variation, the data were split into blocks
of ten days and within-block means and variances calculated. From
the application of standard time-series techniques to these series
Turner [28] concluded that the data {X(t), t=1,2,...} followed
the model

\[ \ln X(t) = M + A\cos(\omega t + \delta) + Z(t) \tag{6.1} \]

with {Z(t)} a stationary series. The constants M and A
(recomputed for the present study) are found to be 0.710 and 0.402, and
$\omega = 2\pi/2922$ is the frequency corresponding to a cycle of period one
year (2922 observations). The analysis which follows is based on
the assumption that the model (6.1) is correct. We therefore fit
a threshold model to the series Z.

The next problem is to choose a threshold. This should be
high enough to justify the assumptions of the model but low enough
to capture a reasonable number of peaks. In the Flood Studies
Report [18], a number of thresholds are considered so that the
number of peaks varies between one and five a year. Our analysis
was repeated for several thresholds so as to examine the
sensitivity of the conclusions to choice of threshold.

For four thresholds u, the model of Section 4 was fitted to


the exceedances of Z over u. Estimated parameter values and their
standard errors are given in Table 1.

TABLE 1

ESTIMATED PARAMETER VALUES FOR FOUR THRESHOLDS

Parameter                      Threshold u
                  1.0         1.1         1.2         1.3
$N_e$ (i)         295         140         74          36
$N_p$ (ii)        81          43          27          11
$\phi$            0.46        0.52        0.47        0.50
                  (0.04)      (0.06)      (0.08)      (0.11)
$\lambda$         0.0067      0.0036      0.0025      0.0009
                  (0.0003)    (0.0006)    (0.0005)    (0.0003)
$\mu$             0.18        0.23        0.27        0.22
                  (0.02)      (0.04)      (0.06)      (0.07)
$\theta$ (iii)    0.28        0.31        0.37        0.31
                  (0.03)      (0.04)      (0.06)      (0.08)
k                 0.12        0.16        -0.02       -0.05
                  (0.10)      (0.13)      (0.20)      (0.32)
$\sigma$ (iv)     0.17        0.17        0.12        0.12
                  (0.02)      (0.03)      (0.03)      (0.05)
$\sigma$ (v)      0.15        0.15        0.12        0.12
                  (0.02)      (0.02)      (0.02)      (0.04)

Standard errors are shown in parentheses beneath the estimates.
The unit of time for $\phi$, $\lambda$ and $\mu$ is the interval between
observations, i.e. three hours.

(i) $N_e$ = Number of exceedances
(ii) $N_p$ = Number of peaks
(iii) Extremal index $\theta = \mu/(\mu+\phi)$
(iv) Assuming k unknown
(v) Assuming k known and equal to zero

It can be seen that $\hat k$ is close to zero in each case, and the
question arises as to whether the exponential distribution of
exceedance values (k=0) is appropriate in place of the GPD. This
was tested by a likelihood ratio test. For u=1.1, twice the
difference in log likelihoods under the two models k=0, k
unrestricted, is 0.92. Since this statistic has a limiting $\chi^2_1$
distribution when k=0, this value is not significant. Similar results
were obtained for the other thresholds. Therefore it seems
reasonable to assume k=0 for this data, though this is a point to
which we return later. The estimates and standard errors of $\sigma$,
under the assumption k=0, are also shown in Table 1.
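The significance level behind this statement is easy to reproduce. The sketch below uses the identity $P\{\chi^2_1 > x\} = \mathrm{erfc}(\sqrt{x/2})$; the function name is hypothetical and only the statistic 0.92 comes from the text.

```python
import math


def chi2_1_upper_tail(x):
    """P{chi-squared with 1 d.f. > x}, via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


statistic = 0.92                     # twice the log likelihood difference, u=1.1
p_value = chi2_1_upper_tail(statistic)
print(p_value)
```

The resulting tail probability is roughly one third, far from conventional significance levels; the 5% critical value of the $\chi^2_1$ distribution is about 3.84.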

We now turn to the computation of return values from the fitted


model. Since we are really interested in the extremes of the
series X rather than Z, the method given in Section 4 is not
adequate.

Suppose that in a single year (of length $2\pi/\omega$) the exceedance
times are denoted $T_1,\ldots,T_N$ (where N is random). We calculate the
probability

\[ H(q) = P\{\ln X(T_i) < q \text{ for } i=1,\ldots,N\} \tag{6.2} \]

for q>M+A+u assuming (6.1) and that the exceedances of Z over u
are given by the model of Section 4. Thus H is approximately the
d.f. of the annual maximum of the series ln X.

Assuming the GPD for excess values we have, for q>M+A+u,

\[ P\{\ln X(T_i) \le q,\ 1 \le i \le N \mid T_1,\ldots,T_N\} = \prod_{i=1}^{N} \big[1 - \{1 - k(q-M-u-A\cos(\omega T_i+\delta))/\sigma\}^{1/k}\big]. \]

But the process of peak times is approximately Poisson with
intensity $\lambda\phi/(\mu+\phi)$. Therefore, the conditional distribution of
$T_1,\ldots,T_N$ given N is approximately the distribution of uniform
order statistics over $[0, 2\pi/\omega]$, so that

\[ P\{\ln X(T_i) < q,\ 1 \le i \le N \mid N\} = \Big[1 - \int_0^{2\pi/\omega} \{1 - k(q-M-u-A\cos(\omega t))/\sigma\}^{1/k}\, (\omega/2\pi)\,dt\Big]^N. \tag{6.3} \]

Taking expectations in (6.3) assuming the Poisson distribution for
N gives

\[ H(q) = \exp\Big[-(2\pi/\omega)(\lambda\phi/(\mu+\phi)) \int_0^{\pi} \{1 - k(q-M-u-A\cos t)/\sigma\}^{1/k}\,dt/\pi\Big]. \tag{6.4} \]

For k=0, the integral in (6.4) simplifies to

\[ \int_0^{\pi} \exp(-(q-M-u-A\cos t)/\sigma)\,dt/\pi = \exp(-(q-M-u)/\sigma)\, I_0(A/\sigma), \]

where $I_0$ is the modified Bessel function of order zero ([1], equation
9.6.16). In this case, the expression for H(q) reduces to the
d.f. of the Gumbel (Type I Extreme Value) distribution. If $k \neq 0$,
the corresponding expression is not one of the classical extreme
value distributions (except when A=0, corresponding to the case
of no seasonal variation) but a mixture of them.

In the case k=0, the n-year return level $q_n$, which we take
to be defined by $-\ln H(q_n) = 1/n$, is given explicitly by

\[ q_n = M + u + \sigma \ln\{n(2\pi/\omega)(\lambda\phi/(\mu+\phi))\, I_0(A/\sigma)\}. \tag{6.5} \]

This expression may be used to construct an estimator $\hat q_n$ of $q_n$.
Its standard error may be estimated (approximately) using the
procedure described at the end of Sections 3 and 4. In obtaining
the derivatives of $q_n$ with respect to the unknown parameters,
we use the relation $I_0'(x) = I_1(x)$ ([1], equation 9.6.27).

For $k \neq 0$, there is no closed form expression for $q_n$, which
must therefore be found by numerical interpolation in (6.4).
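The closed form for k=0 can be evaluated directly. The sketch below computes $I_0$ from its power series and applies (6.5) with the rounded u=1.2 estimates from Table 1 (taking $\sigma$ from the k=0 row) and the constants M and A quoted above. The function names are hypothetical, and because the inputs are rounded the result should only be expected to lie near the published estimate.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series."""
    total, term = 1.0, 1.0
    quarter_sq = x * x / 4.0
    for j in range(1, terms):
        term *= quarter_sq / (j * j)   # (x^2/4)^j / (j!)^2
        total += term
    return total


def seasonal_return_level(n_years, M, A, u, sigma, phi, lam, mu, obs_per_year):
    """n-year return level of ln X from (6.5), valid for k = 0."""
    # obs_per_year plays the role of 2*pi/omega (the length of one year).
    yearly_peak_rate = obs_per_year * lam * phi / (mu + phi)
    return M + u + sigma * math.log(n_years * yearly_peak_rate
                                    * bessel_i0(A / sigma))


q50 = seasonal_return_level(50, M=0.710, A=0.402, u=1.2, sigma=0.12,
                            phi=0.47, lam=0.0025, mu=0.27, obs_per_year=2922)
print(q50)
```

Setting A=0 removes the Bessel factor and recovers the non-seasonal formula (4.5) shifted by M.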

Estimates of $q_{50}$ are shown in Table 2 for each of the
threshold levels.

TABLE 2

VARIOUS ESTIMATES OF $q_{50}$

Source                   Estimate    Standard Error
This paper, u=1.0        2.92        0.05
This paper, u=1.1        2.89        0.07
This paper, u=1.2        2.77        0.08
This paper, u=1.3        2.78        0.12
[28], u=1.1 (i)          2.74        0.12
[28], u=1.2 (i)          2.73        0.16
[3], (ii)                2.65
[3], (iii)               2.52
[3], (iv)                2.69
[5], (v)                 2.73
[8], (vi)                2.56

Estimates are for the natural logarithm of the wave height in metres.
(i) Based on 2.5 years of data
(ii) Gumbel method applied to monthly maximum for March
(iii) Gumbel method applied to annual maxima
(iv) Based on product of estimated c.d.f.'s for monthly maxima
(v) Monthly maxima with sinusoidal location parameter
(vi) Fitted distribution to all values
(Partly adapted from [28])

In estimating the standard errors, M and A were treated as known
constants, since their sampling errors are negligible compared
with those of the other parameters. Also shown for comparison are
the point estimates obtained in previous studies of this data set.
Turner's [28] analysis differed from the present one in three
respects: (i) he used only two and a half years of the data, (ii)
he used only the exponential distribution and not the GPD, (iii)
the peak times were modelled by a simple Poisson process.

Table 1 includes estimates of the extremal index, which
appears to be around 0.3, i.e. the mean cluster size is between
3 and 4.

The GPD did not play a very important role in this analysis
as we took k=0 in the final calculations. It is, however, worth
examining the sensitivity of the final conclusions to this
assumption. Using u=1.2, we recomputed $q_{50}$ for several values of k
within two estimated standard errors of zero, keeping the other
parameters fixed (Table 3). It can be seen that statistically
insignificant variations in k produce considerable variations in
$q_{50}$, especially when k is negative.

TABLE 3

POINT ESTIMATES OF $q_{50}$ FOR VARIOUS VALUES OF k

(Other parameters held fixed)

k         $q_{50}$
0.4       2.52
0.3       2.56
0.2       2.60
0.1       2.67
0         2.77
-0.1      2.92
-0.2      3.18
-0.3      3.59
-0.4      4.24

This highlights what may well be the principal difficulty of
this kind of analysis, namely the sensitivity of the conclusions
to the assumptions of the model, especially when return periods
much longer than the length of the data are involved. It is
difficult to say what should be done about this, but we do suggest
that it is important to analyse different models and to compare
the conclusions they lead to in any analysis of this sort.

Another important question is whether the very simple model
(6.1) is really appropriate or whether some more complicated
procedure for handling seasonality is needed. First, let us
remark that additional sinusoidal terms in (6.1) may be included
without any change in the methodology. The analysis is unchanged
up to (6.4) in which the additional sinusoidal terms must be
included. In that case, when k=0 the integral no longer reduces
to a Bessel function but must be integrated numerically. A more
fundamental problem is whether the residual process Z can really
be assumed stationary - the standard time series tests based on
the autocovariances are only tests of second-order stationarity.
The sensitivity of the conclusions to the stationarity assumption
can be examined by testing for seasonality in the series of
exceedance times and exceedance values. As a useful diagnostic
tool, we propose the "total excess statistic" $\sum (Z(t)-u)_+$,
calculated for each month of the data. That is, for each month we add
up all the exceedance values over the threshold. For u=1.0, this
was calculated and the spectral density of the resulting sequence
plotted (Figure 1). It is consistent with a low-order stationary
model, and in particular there is no evidence of residual season-
ality, as might be indicated by a peak near the frequency 1/12
or one of its multiples.
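The total excess statistic itself is simple to compute. In the sketch below a "month" is approximated by a fixed block of observations (240 three-hourly values would be about 30 days); this fixed block length, the function name and the toy series are assumptions of the illustration, and the spectral analysis of the resulting sequence is not shown.

```python
def total_excess_by_block(series, u, block_length):
    """Sum of (z - u)+ over consecutive blocks of block_length observations."""
    totals = []
    for start in range(0, len(series), block_length):
        block = series[start:start + block_length]
        # Only values above the threshold contribute, each by its excess.
        totals.append(sum(max(z - u, 0.0) for z in block))
    return totals


# Toy residual series with a block length of 3 for readability.
toy_series = [0.5, 1.2, 1.5, 0.9, 1.1, 2.0]
toy_totals = total_excess_by_block(toy_series, u=1.0, block_length=3)
print(toy_totals)
```

Applied to the residual series Z with calendar months as blocks, this gives the sequence whose spectral density is examined in the text.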

Another important issue is the assumed independence of peak


values. A correlogram of peak values showed no evidence of
correlation, though this is one aspect which could be taken further.

In conclusion, if we accept k=0 then, allowing for the diff-


erent estimates in Table 2 and their standard errors, we might
conservatively conclude that $q_{50}$ lies between 2.5 and 3.0. This
is a very wide range, probably too wide to be of much practical
use. Taking the variation of k into account would lead to an even
wider range. It is noticeable that the estimated return values
are well above those of Carter and Challenor. Challenor (personal
communication) has suggested that this may be an artefact of the
logarithmic transformation.

Although we have discussed a number of aspects of threshold


models it is clear that there are other aspects needing further
study. Two of these which we have touched on are the choice of
threshold and the handling of seasonal series. Other questions
are the handling of multivariate series and the incorporation of
subsidiary information as might be contained in covariates.

Finally, it may often be the case that calculated return


levels are very sensitive to the precise model adopted, especially
where significant extrapolation is involved. In view of this, it
may be necessary to repeat the analysis under different model
assumptions in order to obtain a realistic assessment of the
accuracy of the conclusions.
[Figure 1. Spectral density of the total excesses for threshold u = 1.0.]

Acknowledgements The wave height data were produced by the
Institute of Oceanographic Sciences. I am grateful to Peter
Challenor for supplying me with the data, and to Anthony Davison
and Martyn Turner for many suggestions. The work was carried out
during the tenure of an SERC award.

REFERENCES

1. Abramowitz, M. and Stegun, I. 1964. Handbook of Mathematical
Functions. National Bureau of Standards, U.S.A.

2. Berman, S.M. 1962. Limiting distributions of the maximum term
in a sequence of dependent random variables. Ann. Math.
Statist. 33, pp. 894-908.

3. Carter, D.J.T. and Challenor, P.G. 1981. Estimating return
values of environmental parameters. Quart. J. Roy. Met.
Soc. 107, pp. 259-266.

4. Carter, D.J.T. 1981. Analysis of Seven Stones wave height
data 1968-1978. Institute of Oceanographic Sciences,
personal communication.

5. Challenor, P.G. 1982. A new distribution for annual extremes
of environmental variables. Quart. J. Roy. Met. Soc.
108, pp. 975-980.

6. Cox, D.R. 1955. Some statistical methods connected with series
of events. J. Roy. Statist. Soc. B 17, pp. 129-164.

7. Davison, A.C. 1984. Modelling excesses over high thresholds,
with an application, in Statistical Extremes and
Applications, this volume.

8. Fortnum, B.C.H. and Tann, H.M. 1977. Waves at Seven Stones
Light Vessel. Institute of Oceanographic Sciences,
personal communication.

9. Gnedenko, B.V. 1943. Sur la distribution limite du terme
maximum d'une série aléatoire. Ann. Math. 44, 423-453.

10. Grandell, J. 1972. Statistical inference for doubly stochastic
Poisson processes. In Stochastic Point Processes:
Statistical Analysis, Theory and Applications, ed. P.A.W.
Lewis, Wiley, New York.

11. Grandell, J. 1976. Doubly Stochastic Poisson Processes.
Lecture Notes in Mathematics No. 529, Springer, Berlin.

12. Hall, P. 1982. On estimating the endpoint of a distribution.
Ann. Statist. 10, pp. 556-568.

13. Hill, B.M. 1975. A simple general approach to inference about
the tail of a distribution. Ann. Statist. 3, pp. 1163-1174.

14. Leadbetter, M.R. 1982. Extremes and local dependence in
stationary sequences. Preprint 1982 no. 5, University of
Copenhagen.

15. Leadbetter, M.R. 1984. Statistical Extremes and Applications,
this volume.

16. Leadbetter, M.R., Lindgren, G. and Rootzen, H. 1983. Extremes
and Related Properties of Random Sequences and Processes.
Springer, New York.

17. Loynes, R.M. 1965. Extreme values in uniformly mixing
stationary stochastic processes. Ann. Math. Statist. 36, pp.
993-999.

18. NERC 1975. Flood Studies Report, Vol. 1. Natural Environment
Research Council, London.

19. North, M. 1980. Time-dependent stochastic models of floods.
J. Hydraulics Division, A.S.C.E., pp. 649-655.

20. O'Brien, G.L. 1974. The maximum term of uniformly mixing
stationary stochastic processes. Z. Wahr. verw. Geb. 30.

21. Pickands, J. 1975. Statistical inference using extreme order
statistics. Ann. Statist. 3, pp. 119-131.

22. Revfeim, K.J.A. 1982. Seasonal patterns in extreme 1-hr
rainfalls. Wat. Res. Res.

23. Revfeim, K.J.A. 1983. On the analysis of extreme rainfalls.
J. Hydrology.

24. Smith, R.L. 1983. Maximum likelihood estimation in a class of
non-regular cases. Paper in preparation.

25. Smith, R.L. and Weissman, I. 1983. Maximum likelihood
estimation of the lower tail of a probability distribution.
Submitted for publication.

26. Todorovic, P. 1978. Stochastic models of floods. Wat. Res.
Res. 14, pp. 345-356.

27. Todorovic, P. 1979. A probabilistic approach to analysis and
prediction of floods. I.S.I. Proceedings, Vol. 1, pp. 113-124.

28. Turner, M.J. 1982. Estimation of the fifty year return wave
height at Seven Stones. M.Sc. and D.I.C. Report, Department
of Mathematics, Imperial College.

29. Watson, G.S. 1954. Extreme values from samples in m-dependent
stationary processes. Ann. Math. Statist. 25, pp. 798-800.

30. Weiss, L. 1971. Asymptotic inference about a density function
at the end of its range. Nav. Res. Log. Quart. 18, pp.
111-114.

31. Weissman, I. 1978. Estimation of parameters and large
quantiles, based on the k largest observations. J. Amer.
Statist. Assoc. 73, 812-815.

32. Weissman, I. 1980. Estimation of tail parameters under Type
I censoring. Commun. Statist. Theor. Meth. A9 (11),
pp. 1165-1175.

33. Weissman, I. 1982. Confidence intervals for the threshold
parameter II: Unknown shape parameter. Commun. Statist.
Theor. Meth. A11, pp. 2451-2474.

34. Woodroofe, M. 1974. Maximum likelihood estimation of
translation parameter of a truncated distribution II. Ann.
Statist. 2, 474-488.
