
Analysis of Extreme Values in Education

J Beirlant, Katholieke Universiteit Leuven, Heverlee, Belgium


G Dierckx, Hogeschool-Universiteit Brussel, Brussel, Belgium
K Luwel and P Onghena, Katholieke Universiteit Leuven, Heverlee, Belgium
ã 2010 Elsevier Ltd. All rights reserved.

Goals of Extreme Value Theory

Extreme value methodology has been constructed with the specific aim to estimate extreme quantiles and/or small exceedance probabilities corresponding to rare events. Classical applications are the calculation of return periods of floods or other catastrophic events, or the prediction of large claims in insurance portfolios. In educational studies applications are less prominent, but here too the statistical modeling of rare events can be of interest. We present an application to the modeling of reaction times in a numerosity test when applying the choice/no-choice method. This method involves testing participants under two types of conditions: (1) a choice condition in which participants can freely choose which strategy to use and (2) several no-choice conditions in which participants are required to apply a given strategy on all problems. Asking all participants to use a given strategy on all trials in the no-choice conditions excludes possible selection effects and/or individual preferences, resulting in unbiased estimates of the speed and accuracy of the strategies under consideration. It also becomes possible to investigate participants' strategy adaptiveness by comparing their actual strategy choices under the choice condition with their ideal strategy choices based on the no-choice performance data.

The application is taken from Luwel et al. (2003), where the authors wanted to extend the applicability of the choice/no-choice method toward the domain of numerosity judgment. They aimed at developing a reliable and valid method for measuring the adaptiveness of numerosity judgment strategies, based on the procedure for strategy identification in the individual response-time patterns. Participants were instructed to determine, as quickly and accurately as possible, different numerosities of colored blocks that were presented in a 7-by-7 grid on a computer screen. This task allows for two main strategies: an addition strategy in which the different colored blocks in the grid are added, and a subtraction strategy in which the number of empty squares is determined and then subtracted from the total number of squares in the grid. In contrast to the addition strategy, which only involves one solution step, the subtraction strategy is a two-step strategy and therefore cognitively more demanding. The experiment was performed under three conditions: one choice (C) condition and two different no-choice (NC) conditions: (1) in the C-condition participants were free to use either the addition or the subtraction strategy, (2) in one NC-condition (i.e., the forced addition or FA-condition) all trials had to be solved by means of the addition strategy, and (3) in the other NC-condition (the forced subtraction or FS-condition) the use of the subtraction strategy was required on all trials. Thirty-seven students at the University of Leuven (Belgium) participated in this study. Their mean age was 21 years, with ages ranging from 18 to 26 years. Each participant was tested individually and ran three different sessions. The presentation order of the different conditions was counterbalanced over participants with the important restriction that the C-condition was always presented first. As a consequence, all participants were randomly divided over one of the two presentation orders: C/FA/FS or C/FS/FA.

In Figure 1 the reaction times are plotted as a function of the number of blocks (from 1 to 49 blocks) for each of the three conditions: (1) FA-condition, (2) FS-condition, and (3) C-condition. Some participants show extremely long reaction times. Whereas some statisticians would qualify these as outliers, the right skewness of the reaction-time distribution (as a function of the block size) is always apparent, and appears stronger for larger block sizes in the FA- and C-conditions, and for smaller block sizes in the FS-condition. Extreme value methodology provides models that allow one to model this right-skewness and the largest reaction times. More specifically, it tries to answer questions such as: what is, for a given number of blocks, the extreme reaction-time level that occurs only once in 100 trials?


The Extreme Value Modeling Approach

The Basic Model

Here we present the extreme value theory (EVT) approach for pure random samples X1, X2, …, Xn. The population distribution is assumed to satisfy the domain of attraction condition, which expresses that for large sample sizes n the statistical behavior of the maximum of such samples can be well approximated by a nondegenerate distribution, which then is shown to be necessarily an extreme value distribution with cumulative distribution function


Figure 1 The reaction time (RT) as a function of the number of blocks for 37 participants: (a) addition, (b) subtraction, and (c) choice.

exp{−(1 + γy)^{−1/γ}}   [1]

The parameter γ, called the extreme value index (EVI), characterizes the tail decay of the underlying distribution F: a positive value of γ corresponds to heavy-tailed or Pareto-type distributions for which the density decays as a power law, while γ = 0 typically indicates an exponentially decreasing tail. Finally, γ < 0 indicates a distribution F with a finite right endpoint. This general condition is quite broad, so that it does not really lead to restrictions in practice.

It can be shown that this framework is equivalent to assuming the following:

The POT–GPD assumption
For large enough threshold values t, the excesses, or peaks over threshold (POTs), X − t (with X > t) approximately follow a generalized Pareto distribution (GPD) with cumulative distribution function 1 − (1 + γy/σ)^{−1/γ}.

In case γ = 0 the GPD is to be interpreted as the exponential distribution with cumulative distribution function 1 − e^{−y/σ} for y > 0. This discussion makes it clear that the goal of tail estimation is carried out under a nonstandard model which involves the choice of a threshold t. Of course, the estimation of the EVI parameter γ is an important step toward the goal of tail estimation.


Tail Estimation

The POT approach to estimating γ and σ > 0 simply consists of using, for instance, the maximum likelihood approach when fitting the GPD to the excesses X − t for those data X which are larger than t. Quite often, the maximum likelihood estimation procedure leads to appropriate estimates γ̂_k and σ̂_k which show a stable behavior as a function of the threshold t. We refer to this method as the GP-POT approach.

Most commonly, t is chosen as one of the data points itself, say as the (k + 1)-largest observation, and then one lets k vary between 2 and the full sample size n. In Figure 2 the POT estimators for the extreme value index of the RT values in our case study are shown for each of the three conditions as a function of k = 2, …, 37 with the number of blocks equal to 30. Remark that in case of the addition strategy the estimated values are clearly positive, while for the other two conditions values around zero are predominant. In Figure 3 the POT estimators of γ obtained with k = 17 are shown for each condition as a function of the number of blocks. Comparing with Figure 1, we remark that the appearance of high outliers in reaction times leads to predominantly positive EVI estimates. Figure 4 shows the corresponding maximum likelihood estimates of the scale parameter σ. While the POT estimates of the extreme value index do not seem to show trends with the number of blocks, the σ estimates grow linearly for the addition strategy, while

Figure 2 POT estimator for the extreme value index of the reaction times as a function of k = 2, …, 37 (37 participants) for 30 blocks for: (a) addition, (b) subtraction, and (c) choice.
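A stability plot like the one behind Figure 2 can be reproduced in outline: for each k, fit the GPD to the k excesses over the (k + 1)-largest observation by maximum likelihood and record the estimated EVI. The following is a minimal sketch in Python, assuming scipy is available; the synthetic heavy-tailed sample is a hypothetical stand-in for the 37 reaction times, which are not reproduced here.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 37 reaction times at one block count.
rt = np.sort(rng.pareto(4.0, size=37) + 1.0)     # increasing order

gamma_hat = {}
for k in range(5, len(rt)):                      # k = 5, ..., n - 1
    t = rt[-(k + 1)]                             # threshold: (k+1)-largest
    excesses = rt[-k:] - t                       # the k excesses over t
    # ML fit of the GPD to the excesses; scipy's shape c plays the
    # role of the extreme value index gamma (location fixed at 0).
    c, _, scale = genpareto.fit(excesses, floc=0)
    gamma_hat[k] = c
```

In practice one then looks for a range of k over which the estimates γ̂_k are stable, as in the flat stretches of Figure 2.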

Figure 3 POT estimator of γ using the k = 17 largest RTs out of 37 observations for the extreme value index of the reaction times as a function of the number of blocks for: (a) addition, (b) subtraction, and (c) choice.

Figure 4 POT estimator of the scale parameter σ using the k = 17 largest RTs out of 37 observations of the reaction times as a function of the number of blocks for: (a) addition, (b) subtraction, and (c) choice.

they decline linearly for the subtraction strategy. For the C-condition the scale estimates again follow the pattern of the original data plot as discussed in Figure 1, with a change point at about 25 blocks.

Given estimators γ̂_k and σ̂_k, how can one now estimate the exceedance probability p_x of obtaining an observation larger than a given large value x? To this end the conditional probability of an excess being larger than x (where we restrict attention to the excess data, and which is computed along the postulated GPD excess model) is to be multiplied by the probability that an observation is larger than the threshold. For the latter we assume that the threshold is situated deep enough in the sample so that the sample proportion can be used here as an estimate. This fraction equals k/n if the threshold t equals the (k + 1)th largest observation. This leads to

p̂_x = (k/n) (1 + γ̂ (x − t)/σ̂)^{−1/γ̂}   [2]

Moreover, an extreme quantile q_p, which denotes the outcome level which is exceeded with a chosen probability p (typically smaller than 1/n), can be estimated by equating [2] to p and solving for x:

q̂_p = t + σ̂ ((k/(np))^{γ̂} − 1)/γ̂   [3]


Pareto-Type Distributions

In the special case γ > 0, the underlying distribution is a member of the class of Pareto-type distributions. In this case the POT–GPD assumption is equivalent to the assumption that the excesses X/t (rather than X − t, X > t) for large enough t approximately follow a simple Pareto distribution (PD) with cumulative distribution function 1 − y^{−1/γ}, y > 1. Again, using the (k + 1)-largest observation as the threshold t, this leads to the following tail estimators that can be used in case γ > 0:

p̂⁺_x = (k/n) (x/t)^{−1/γ̂}   [4]

Remark that this is completely similar to [2], with the GPD replaced by the simple PD and the excess x − t replaced by x/t. Also

q̂⁺_p = t (k/(np))^{γ̂}   [5]

Here maximum likelihood estimation fitting the strict PD based on the excesses X/t (X > t) leads to the Hill (1975) estimator, which equals the average of the logarithms of the excesses.
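Once γ̂ and σ̂ are available, the tail estimators [2]–[5] and the Hill estimator are direct to compute. The following sketch in Python implements them; the synthetic sample is a hypothetical stand-in for the reaction-time data, and the γ and σ values passed to the GP-POT version are illustrative placeholders.

```python
import numpy as np

def gp_pot_tail(x_sorted, k, gamma, sigma, x=None, p=None):
    """GP-POT tail estimates: exceedance probability [2] and extreme
    quantile [3].  x_sorted must be in increasing order."""
    n = len(x_sorted)
    t = x_sorted[-(k + 1)]                      # threshold: (k+1)-largest
    out = {}
    if x is not None:                           # estimator [2]
        out["p_x"] = (k / n) * (1 + gamma * (x - t) / sigma) ** (-1 / gamma)
    if p is not None:                           # estimator [3]
        out["q_p"] = t + sigma * ((k / (n * p)) ** gamma - 1) / gamma
    return out

def hill_tail(x_sorted, k, x=None, p=None):
    """Hill (1975) estimator with the Pareto-type tail estimators [4]-[5]."""
    n = len(x_sorted)
    t = x_sorted[-(k + 1)]
    # Hill estimator: average log of the k largest observations over t
    gamma = float(np.mean(np.log(x_sorted[-k:] / t)))
    out = {"gamma": gamma}
    if x is not None:                           # estimator [4]
        out["p_x"] = (k / n) * (x / t) ** (-1 / gamma)
    if p is not None:                           # estimator [5]
        out["q_p"] = t * (k / (n * p)) ** gamma
    return out

# Illustrative calls on a synthetic heavy-tailed sample (not the RT data):
rng = np.random.default_rng(1)
sample = np.sort(rng.pareto(4.0, size=37) + 1.0)
res = hill_tail(sample, k=17, x=5.0, p=0.01)
res2 = gp_pot_tail(sample, k=17, gamma=0.25, sigma=1.0, x=5.0, p=0.01)
```

Note that [3] and [5] answer exactly the question posed in the case study: with p = 0.01, q̂_p estimates the reaction-time level exceeded only once in 100 trials.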
Analysis of Extreme Values in Education 29

The use of the GPD in the general case γ ∈ ℝ, respectively the PD in case γ > 0, as a consequence of theoretical probabilistic results concerning the distribution of maxima, should of course be validated in practice. One can validate the use of the fitted parametric models through goodness-of-fit methods. For this purpose, for instance, quantile plotting techniques can be used.


Quantile Plotting

An alternative view of the above material, which allows one to support an extreme value analysis graphically, consists of plotting the ordered data against the corresponding theoretical quantiles of the postulated model. For instance, in the specific case of Pareto-type distributions (γ > 0), the model can be evaluated through the ultimate linearity of a Pareto quantile plot, where the logarithm of the j-th smallest observation is plotted against the standard exponential theoretical quantile −log(1 − j/(n + 1)) (1 ≤ j ≤ n).

In Figure 5 the Pareto quantile plots for the RTs to count 30 blocks at each of the three conditions are shown. These plots confirm the Pareto-type hypothesis for the corresponding reaction times, as indeed above a certain level linearity becomes apparent. In Figure 6 for each condition the Hill estimators for γ > 0 at k = 17 are given as a function of the block size.

From Figures 1–6 the problems arising with the application of extreme value methods become apparent. First, the choice of k is quite important. The higher the value of k (or, the lower the threshold t), the higher the bias of the estimator, since the models are only valid for threshold values approaching the endpoint of the underlying distribution. The smaller the k, the larger the variance of the estimators, because then these are based on fewer data. Also, theoretically one can show that optimal k values are different for different estimators (such as the GP-POT and the Hill estimator). Moreover, when only few data are available, as is the case here with only 37 observations per number of blocks and per condition, the methods can be quite sensitive and yield different conclusions. In fact, the GP-POT approach here indicates a more nuanced picture concerning the signs and values of the EVI estimates, while the positive Hill estimates taken at k = 17 indicate EVI values around 0.2 for all numbers of blocks and all conditions. The GP-POT approach also offers a more flexible framework, with two parameters in order to model the exceedances. In Figure 7 we offer the GP quantile plots for the RTs corresponding to the counts of 30 blocks based on k = 17 exceedances for each of the three conditions.

Figure 5 Pareto QQ plots for the reaction times to count 30 blocks. The data are based on results for 37 participants: (a) addition, (b) subtraction, and (c) choice.
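The coordinates behind a Pareto quantile plot such as Figure 5 require no fitting at all. A minimal sketch in Python (numpy only), using the same axis convention as the figure:

```python
import numpy as np

def pareto_qq(sample):
    """Coordinates of a Pareto quantile plot, as in Figure 5:
    log X_{n-j+1,n} (log of the j-th largest observation) against
    log((n+1)/j), for j = 1, ..., n.  Ultimate linearity with slope
    gamma supports a Pareto-type (gamma > 0) tail."""
    x = np.sort(np.asarray(sample, dtype=float))   # increasing order
    n = len(x)
    j = np.arange(1, n + 1)
    theoretical = np.log((n + 1) / j)              # exponential quantiles
    empirical = np.log(x[::-1])                    # j-th largest, j = 1..n
    return theoretical, empirical

# Sanity check with exact Pareto quantiles (gamma = 0.5): the plot is
# then perfectly linear through the origin with slope 0.5.
n = 37
j = np.arange(1, n + 1)
exact = (j / (n + 1)) ** -0.5
theo, emp = pareto_qq(exact)
```

An estimate of γ can then be read off as the slope of the plot above a chosen anchor point, which is how the Hill estimator can be interpreted graphically.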
30 Statistics

Figure 6 Hill estimator values when using the k = 17 largest RTs out of 37 observations for the extreme value index of the reaction times as a function of the number of blocks: (a) addition, (b) subtraction, and (c) choice.
Figure 7 Generalized Pareto QQ plots for the excesses of the reaction times to count 30 blocks over the threshold corresponding to k = 17. The estimated γ and σ values for k = 17 are used: (a) addition (γ̂ = 0.395, σ̂ = 4.186), (b) subtraction (γ̂ = 0.131, σ̂ = 2.048), and (c) choice (γ̂ = 0.125, σ̂ = 2.214).
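The computation underlying a GP quantile plot like Figure 7 can be sketched as follows in Python (scipy assumed). The sample below is a deterministic stand-in rather than the actual reaction times; the γ and σ values are the addition-condition estimates reported in the caption of Figure 7.

```python
import numpy as np
from scipy.stats import genpareto

def gp_qq(sample, k, gamma, sigma):
    """Generalized Pareto QQ plot coordinates, as in Figure 7: ordered
    excesses over the threshold (the (k+1)-largest observation) against
    fitted-GPD quantiles at plotting positions i/(k+1), i = 1, ..., k.
    Points close to the 45-degree line support the GP fit."""
    x = np.sort(np.asarray(sample, dtype=float))
    t = x[-(k + 1)]
    excesses = x[-k:] - t                       # ordered excesses over t
    probs = np.arange(1, k + 1) / (k + 1)       # plotting positions
    fitted = genpareto.ppf(probs, gamma, loc=0, scale=sigma)
    return fitted, excesses

# Illustrative call on a deterministic stand-in sample of size 37.
sample = np.arange(1.0, 38.0)
fitted, exc = gp_qq(sample, k=17, gamma=0.395, sigma=4.186)
```

Strong departures of the largest points from the 45-degree line, as visible in Figure 7, flag where the fitted GP model breaks down.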

From this we infer that the GP model appears accurate apart from the largest observation for each of the cases.


Extensions

Our case study motivates the use of regression models which allow one to retain the flexible modeling of the underlying distribution tails as proposed by the maximum domain of attraction framework. This then allows one to use more data points in order to estimate only a limited number of parameters, and hence keep a decent number of degrees of freedom. Davison and Smith (1990) provide an interesting regression modeling approach within the GP-POT modeling framework.

In the case of the addition strategy, we used maximum likelihood estimation to fit the GP distribution with parameters γ and σ = σ0 + σ1 × (the number of blocks). For each number of blocks the 17 largest data points were used, so that we used 49 × 17 excess data. This leads to the estimates γ̂ = 0.096, σ̂0 = 0.167, and σ̂1 = 0.098. Then using the GPD as an excess model with σ substituted by the estimated regression function allows one to estimate a high quantile curve as a function of the number of blocks for the addition condition. In Figure 8 the estimated linear σ function is plotted with the original simple σ estimates as shown in Figure 4, next to the extreme quantile curve [3] corresponding with p = 0.01 with the original data under the addition strategy.

Figure 8 (a) The estimated linear σ-curve and (b) the 0.99 quantile curve based on regression modeling with the original data for the addition strategy.


A Literature Review

In the last few decades, extreme value methods in regression, multivariate, and time series settings have been developed and studied. More refined models designed to account for temporal dependence and multivariate data in extreme value analysis include Poisson process models, Markov chain models, and extreme value copulas. Another important issue is the choice of the threshold defining the excesses to which to fit the (generalized) PD. Recently developed techniques offer a solution through stabilizing the estimates over a large range of thresholds.

See also: Exploratory Data Analysis; Goodness-of-Fit Testing; Nonparametric Statistical Methods; Order Statistics.


Bibliography

Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds (with comments). Journal of the Royal Statistical Society, Series B 52, 393–442.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.
Luwel, K., Verschaffel, L., Onghena, P., and De Corte, E. (2003). Analysing the adaptiveness of strategy choices using the choice/no-choice method: The case of numerosity judgement. European Journal of Cognitive Psychology 15, 511–537.


Further Reading

Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004). Statistics of Extremes – Theory and Applications. Wiley Series in Probability and Statistics. New York: Wiley.
Coles, S. G. (2001). An Introduction to Statistical Modelling of Extreme Values. London: Springer.
de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. New York: Springer.
Reiss, R. D. and Thomas, M. (2001). Statistical Analysis of Extreme Values. Basel: Birkhäuser Verlag.
Weissman, I. (1978). Estimation of parameters and large quantiles based on the k-largest observations. Journal of the American Statistical Association 73, 812–815.