Journal Soche Vol 3.1

Volume 3 - Number 1 - April 2012
Volume 3 - Number 1 - April 2012

V
o
l
u
m
e

3

(
1
)

A
p
r
i
l

2
0
1
2

Contents
Reinaldo Arellano-Valle
A message from the editor-in-chief 1
Carmen Batanero and Carmen Daz
Training school teachers to teach probability: refections and challenges 3
Alessandra Guglielmi, Francesca Ieva, Anna Maria Paganoni
and Fabrizio Ruggeri
A Bayesian random-effects model for survival probabilities after acute
myocardial infarction 15
Christophe Chesneau and Nargess Hosseinioun
On the wavelet estimation of a function in a density model with
non-identically distributed observations 31
Rahim Mahmoudvand and Mohammad Zokaei
On the singular values of the Hankel matrix with application in singular
spectrum analysis 43
Luis Gustavo Bastos Pinho, Juvncio Santos Nobre
and Slvia Maria de Freitas
On linear mixed models and their infuence diagnostics
applied to an actuarial problem 57
Lutemberg Florencio, Francisco Cribari-Neto and Raydonal Ospina
Real estate appraisal of land lots using GAMLSS models 75
Arabin Kumar Dey and Debasis Kundu
Discriminating between the bivariate generalized exponential
and bivariate Weibull distributions 93
Aims
The Chilean Journal of Statistics (ChJS) is an ocial publication of the Chilean Statistical Society (www.soche.cl).
The ChJS takes the place of Revista de la Sociedad Chilena de Estadstica, which was published during 1984-2000.
The ChJS covers a broad range of topics in statistics, including research, survey and teaching articles, reviews, and
material for statistical discussion. The ChJS editorial board plans to publish one volume per year, with two issues
in each volume. On some occasions, certain events or topics may be published in one or more special issues prepared
by a guest editor.
Editor-in-Chief
Reinaldo Arellano-Valle Ponticia Universidad Cat olica de Chile.
Executive Editor
Vctor Leiva Universidad de Valparaso, Chile.
Coordinating Editors
Wilfredo Palma Ponticia Universidad Cat olica de Chile.
Ronny Vallejos Universidad Tecnica Federico Santa Mara, Chile.
Founding Editor
Guido del Pino Ponticia Universidad Cat olica de Chile.
Production Editor
Felipe Osorio Universidad Tecnica Federico Santa Mara, Chile.
Advisory Editors
Narayanaswamy Balakrishnan McMaster University, Canada.
George Christakos San Diego State University, USA.
Carles Cuadras Universidad de Barcelona, Spain.
Eduardo Gutierrez-Pe na Universidad Nacional Autonoma de Mexico.
Fabrizio Ruggeri CNR-IMATI, Italy.
Pranab K. Sen University of North Carolina at Chapel Hill, USA.
Associate Editors
Munir Ahmad National College of Business Administration & Economics, Pakistan.
Jose M. Angulo Universidad de Granada, Spain.
Carmen Batanero Universidad de Granada, Spain.
Ionut Bebu Georgetown University Medical Center, USA.
Jose M. Bernardo Universidad de Valencia, Spain.
Heleno Bolfarine Universidade de Sao Paulo, Brazil.
Marcia Branco Universidade de Sao Paulo, Brazil.
Oscar Bustos Universidad Nacional de Cordoba, Argentina.
Alicia Carriquiry Iowa State University, USA.
Yogendra Chaubey Concordia University, Canada.
Gauss Cordeiro Universidade Rural de Pernambuco, Brazil.
Francisco Cribari-Neto Universidade Federal de Pernambuco, Brazil.
Francisco Cysneiros Universidade Federal de Pernambuco, Brazil.
Mario de Castro Universidade de Sao Paulo, Sao Carlos, Brazil.
Jose A. Daz-Garca Universidad de Granada, Spain. Univ. Autonoma Agraria Antonio Narro, Mexico.
Jose Garrido Concordia University, Canada.
Marc G. Genton Texas A&M University, USA.
Patricia Gimenez Universidad Nacional de Mar del Plata, Argentina.
Graciela Gonzalez Faras CIMAT, Mexico.
Nikolai Kolev Universidade de Sao Paulo, Brazil.
Brunero Liseo Universit` a di Roma, Italy.
Shuangzhe Liu University of Canberra, Australia.
Rosangela H. Loschi Universidade Federal de Minas Gerais, Brazil.
Ana C. Monti Universit` a degli Studi del Sannio, Italy.
Leo Odongo Kenyatta University, Kenya.
Carlos D. Paulino Instituto Superior Tecnico, Portugal.
Fernando Quintana Ponticia Universidad Cat olica de Chile.
Josemar Rodrigues Universidade Federal de Sao Carlos, Brazil.
Jose M. Sarabia Universidad de Cantabria, Spain.
Chilean Journal of Statistics
Volume 3, Number 1
April 2012
ISSN: 0718-7912 (print)/ISSN 0718-7920 (online)
c Chilean Statistical Society Sociedad Chilena de Estadstica
http://www.soche.cl/chjs
Chilean Journal of Statistics Volume 3 Number 1 April 2012
Contents
Reinaldo Arellano-Valle
A message from the editor-in-chief 1
Carmen Batanero and Carmen Daz
Training school teachers to teach probability: reections and challenges 3
Alessandra Guglielmi, Francesca Ieva, Anna Maria Paganoni and Fabrizio Ruggeri
A Bayesian random-eects model for survival probabilities after acute
myocardial infarction 15
Christophe Chesneau and Nargess Hosseinioun
On the wavelet estimation of a function in a density model with
non-identically distributed observations 31
Rahim Mahmoudvand and Mohammad Zokaei
On the singular values of the Hankel matrix with application in singular
spectrum analysis 43
Luis Gustavo Bastos Pinho, Juvencio Santos Nobre and Slvia Maria de Freitas
On linear mixed models and their inuence diagnostics
applied to an actuarial problem 57
Lutemberg Florencio, Francisco Cribari-Neto and Raydonal Ospina
Real estate appraisal of land lots using GAMLSS models 75
Arabin Kumar Dey and Debasis Kundu
Discriminating between the bivariate generalized exponential
and bivariate Weibull distributions 93
Vol. 3, No. 1, April 2012, 12
FIFTH ISSUE
A message from the Editor-in-Chief
The fth issue of the Chilean Journal of Statistics contains the following seven articles
that address various interesting topics related to statistics and probability:
(i) The rst article is authored by Carmen Batanero and Carmen Daz. Dr. Batanero
has a wide recognition around the world for her contributions in the eld of statistics
education. In this article, the authors introduce some reections on the importance
of training school teachers to teach probability. They analyze the reasons why
the teaching of probability is dicult for mathematics teachers. In addition, they
describe the contents needed in the preparation of teachers to teach probability and
suggest possible activities to carry out this training.
(ii) The second article is authored by Alessandra Guglielmi, Francesca Ieva, Anna
M. Paganoni and Fabrizio Ruggeri. Dr. Ruggeri has an extensive experience in
the analysis of survival and reliability and he is internationally recognized due to
his outstanding contributions in Bayesian statistics. In this article, the authors
propose a Bayesian model of random eects for survival probabilities after acute
myocardial infarction. They present a case-study by applying a Bayesian hierarchical
generalized linear model to analyze a real data set on patients with myocardial
infarction diagnosis. In particular, they obtain posterior estimates of the model
parameters (regression and random eects parameters) through a MCMC algorithm.
Some issues about model tting are also discussed through the use of predictive tail
probabilities and Bayesian residuals.
(iii) The third article is authored by Christophe Chesneau and Nargess Hosseinioun. Dr.
Chesneau is a highlighted researcher on the topic of the article. In this article, the
authors conduct an interesting study on nonparametric density estimation. They
investigate the estimation of a common but unknown function associated with a
density function of non-identically distributed observations by means of the powerful
tool of the wavelet analysis. In order to do this, they construct a new linear wavelet
estimator and study its performance for independent and dependent data. Then,
in the independent case, they develop a new adaptive hard thresholding wavelet
estimator and prove that it attains a sharp rate of convergence.
(iv) The fourth article is authored by Rahim Mahmoudvand and Mohammad Zokaei.
In this article, the authors provide applications of the Hankel matrix in spectral
analysis. The aim of their work is to obtain some theoretical properties of the
singular values of the Hankel matrix that can be used directly for choosing proper
values of the two parameters of the singular spectrum analysis.
ISSN: 0718-7912 (print)/ISSN: 0718-7920 (online)
2 R.B. Arellano-Valle
(v) The fth article is authored by Luis Gustavo Bastos Pinho, Ju vencio Santos Nobre
and Slvia Maria de Freitas. In this article, the authors consider linear mixed models
and diagnostic tools for statistical analysis of some practical actuarial problems.
Their idea is based on the fact that the linear mixed models are an alternative to
traditional credibility models. Thus, considering that the main advantage of linear
mixed models is the use of diagnostic methods, they consider that these methods
may help to improve the model choice and to identify outliers or inuential subjects,
which deserve better attention by the insurer.
(vi) The sixth article is authored by Lutemberg Florencio, Francisco Cribari-Neto and
Raydonal Ospina. Dr. Cribari-Neto is a very recognized Brazilian researcher with
excellent academic credentials. In this article, the authors perform real estate
appraisal using a class of statistical models called generalized additive models for
location, scale and shape (GAMLSS). By means of an empirical analysis, they show
that the GAMLSS models seem to be more appropriate for estimation of the hedonic
prices function than the regression models currently used to that end.
(vii) The seventh article of this issue is due to Arabin Kumar Dey and Debasis Kundu. Dr.
Kundu is a highly productive Indian researcher. In this article, the authors study the
discrimination between the bivariate generalized exponential and bivariate Weibull
distributions. In order to do this, they use the dierence of the respective maximized
log-likelihood functions, for which they determine the asymptotic distribution of the
corresponding test statistic and calculate the associated asymptotic probability of
correct selection. Their work is nished with numerical illustrations of the eective-
ness of the propose methodology.
Finally, I take this opportunity to give my most sincere thanks to our Executive Editor,
Professor Vctor Leiva, for his constant, profound/deep and noble commitment with the
edition of each issue of our journal.
Reinaldo B. Arellano-Valle
1
Editor-in-Chief
1
Departamento de Estadstica, Facultad de Matematicas, Ponticia Universidad Catolica de Chile, Apartado Codigo
Postal: 7820436, Santiago, Chile. Email: reivalle@mat.puc.cl
Vol. 3, No. 1, April 2012, 313
Statistics Education
Teaching Paper
Training school teachers to teach probability:
reections and challenges
Carmen Batanero
1,
and Carmen Daz
2
1
Departamento de Didactica de la Matematica, Universidad de Granada, Spain
2
Departamento de Psicologa, Universidad de Huelva, Spain
(Received: 31 August 2009 Accepted in nal form: 01 November 2009)
Abstract
Although probability is today part of the mathematics curricula for primary and secon-
dary schools in many countries, the specic training to teach probability is far from
being an universal component of pre-service courses for mathematics teachers responsi-
ble of this training. In this paper, we analyse the reasons why the teaching of probability
is dicult for mathematics teachers. In addition, we describe the contents needed in the
preparation of teachers to teach probability and suggest possible activities to carry out
this training.
Keywords: Teaching probability School level Training of mathematics teachers.
Mathematics Subject Classication: Primary 97A99 Secondary 60A05.
1. Introduction
The reasons for including probability in schools have been repeatedly highlighted over the
past years; see, e.g., Gal (2005), Franklin et al. (2005) and Jones (2005). These reasons
are related to the usefulness of probability for daily life, its instrumental role in other dis-
ciplines, the need for a basic stochastic knowledge in many professions, and the important
role of probability reasoning in decision making. Students will meet randomness not only
in the mathematics classroom, but also in biological, economic, meteorological, political
and social activities (games and sports) settings. All these reasons explain why probability
has recently been included in the primary school curriculum in many countries since very
early ages and why the study of probability continues later through secondary and high
school and universities studies.
Corresponding author. Departamento de Didactica de la Matematica. Facultad de Ciencias de la Educacion. Uni-

versidad de Granada. Campus de Cartuja. 18071 Granada. Espa na. Phone: (34)(958)243949. Fax (34)(958)246359.
Email: batanero@ugr.es (C. Batanero), carmen.diaz@dpsi.uhu.es (C. Daz)
4 C. Batanero and C. Daz
Changes in what is expected in the teaching of probability and statistics do not just
concern the age of learning or the amount of material, but also the approach to teaching.
Until recently, the school stochastic (statistics and probability) curriculum was reduced
to a formula-based approach that resulted in students who were ill prepared for tertiary
level statistics and adults who were statistically illiterate. The current tendency even for
primary school levels is towards a data-orientated teaching of probability, where students
are expected to perform experiments or simulations, formulate questions or predictions,
collect and analyse data from these experiments, propose and justify conclusions and
predictions that are based on data; see, e.g., NCTM (2000), Parzysz (2003) and MEC
(2006a,b). As argued in Batanero et al. (2005), these changes force us to reect on the
teaching of chance and probability.
The importance of developing stochastic thinking and not just stochastic knowledge
in the students is being emphasized in many curricula. Indeed, some authors argue that
stochastic reasoning is dierent from mathematical reasoning, both of them being essential
to modern society and complementing each other in ways that strengthen the overall
mathematics curriculum for students; see Scheaer (2006).
Changing the teaching of probability in schools will depend on the extent to which we
can convince teachers that this is one of the most useful themes for their students, as well
as on the correct preparation of these teachers. Unfortunately, several authors agree that
many of the current programmes do not yet train teachers adequately for their task to
teach statistics and probability; see, e.g., Begg and Edwards (1999), Franklin and Mew-
born (2006), Borim and Coutinho (2008) and Chick and Pierce (2008). Even when many
prospective secondary teachers have a major in mathematics, they usually study only
theoretical (mathematical) statistics and probability in their training. Few mathemati-
cians receive specic training in applied statistics, designing probability investigations or
simulations, or analysing data from these investigations. These teachers also need some
training in the pedagogical knowledge related to the teaching of probability, where general
principles that are valid for geometry, algebra or other areas of mathematics cannot always
be applied. The situation is even more challenging for primary teachers, few of whom have
had suitable training in either theoretical or applied probability, and traditional introduc-
tory statistics courses will not provide them with the didactical knowledge they need; see
Franklin and Mewborn (2006).
Research in statistics education shows that textbooks and curriculum documents pre-
pared for primary and secondary teachers might not oer enough support. Sometimes
they present too narrow a view of concepts (for example, only the classical approach to
probability is shown). In addition, applications are at other times restricted to games of
chance. Finally, in some of them the denitions of concepts are incorrect or incomplete;
see Ca nizares et al. (2002). There are also exceptional examples and experiences of courses
specically directed to train teachers in dierent countries some of them based on theo-
retical models of how this training should be carried out; see, e.g., Kvatinsky and Even
(2002), Batanero et al. (2004) and Gareld and Everson (2009). Although evaluation of
the success of such courses is in general based on small samples or subjective data, they
do provide examples and ideas for teacher educators.
The aim of this paper is to reect on some specic issues and challenges regarding the
education of teachers to teach probability at school level. In Section 2, we analyse the
specic features of probability and the dierent meanings of this concept that should be
taken into account when teaching probability at school level. In Section 3, we summarise the
scarce research related to teachers beliefs and knowledge as regards probability, which are
not always adequate. In Section 4, we then discuss possible activities that can contribute
to the education of teachers to teach probability. Finally, in Section 5, we conclude this
work with some personal recommendations.
Chilean Journal of Statistics 5
2. The Nature of Probability
A main theme in preparing teachers is discussing with them some epistemological prob-
lems linked to the emergence of probability, because this reection can help teachers to
understand the students conceptual diculties in problem solving. Probability is a young
area and its formal development was linked to a large number of paradoxes, which show
the disparity between intuition and mathematical formalization in this eld; see Borovcnik
and Peard (1996). Counterintuitive results in probability are found even at very elemen-
tary levels (for example, the fact that having obtained a run of four consecutive heads
when tossing a coin does not aect the probability that the following coin will result in
heads is counterintuitive). Diculty at higher levels is also indicated by the fact that even
though the Kolmogorov axioms were generally accepted in 1933, professional statisticians
still debate about dierent views of probability and dierent methodologies of inference.
Another dierence is reversibility. In arithmetic or geometry, an elementary operation
(like addition) can be reversed and this reversibility can be represented with concrete
materials. This is very important for young children, who still are very linked to concrete
situations in their mathematical thinking. For example, when joining a group of two apples
with another group of three apples, a child always obtain the same result (ve apples).
However, if separating the second set from the total, he/she always returns to the original
set, no matter how many times this operation is repeated. These experiences are very
important to help children progressively abstract the mathematical structure behind them.
In the case of random experiment, we obtain dierent results each time the experiment is
carried out and the experiment cannot be reversed (we cannot get the rst result again
when repeating the experiment). This makes the learning of probability comparatively
harder for children.
Of particular relevance for teaching probability are the informal ideas that children and
adolescents assign to chance and probability before instruction and that can aect subse-
quent their learning. As an example, Truran (1995) found substantial evidence that young
children do not see random generators such as dice or marbles in urns as having constant
properties and consider a random generator has a mind of its own or may be controlled
by outside forces. There is also evidence that people at dierent ages maintain probability
misconceptions that are hard to eradicate with only a formal teaching of the topic; see
Jones et al. (2007). Even though simulation or experimentation with random generators,
such as dice and coins have a very important function in stabilizing childrens intuition
and in materializing probabilistic problems, these experiences do not provide the key to
how and why the problems are solved. It is only with the help of combinatorial schemes
or tools like tree diagrams that children start to understand the solution of probabilistic
problems; see Fischbein (1975). This indicates the complementary nature of classical and
frequentist approaches to probability.
Another reason for diculty in the eld of probability is that the meaning of some
concepts is sometimes too tied to applications. For instance, although independence is
mathematically reduced to the multiplicative rule, this denition does not include all the
causality problems that subjects often relate to independence nor always serve to decide
if there is independence in a particular experiment.
2.1 Meanings of probability
Dierent meanings also are linked to the concept of probability, which depends on the
applications of this concept in real situations and that are relevant in the teaching of the
topic, such as:
Intuitive meaning. Ideas related to chance and probability and games of chance
emerged very early in history in many dierent cultures, as they appear in young
children. These conceptions were too imprecise, so that we need the fundamental idea
of assigning numbers to uncertain events to be able to compare their likelihood and
thus being able to apply mathematics to the wide world of uncertainty. Hacking (1975)
indicated that probability had a dual character since its emergence. A statistical side
was concerned with the stochastic rules of random processes, while the epistemic side
views probability as a degree of belief. This duality was present by many of the authors
who helped to develop the theory of probability. For example, while Pascals solution
to games of chance reects an objective probability, his argument to support belief in
the existence of God presents probability as a personal degree of belief.
Classical meaning. The rst probability problems were linked to games of chance,
where equiprobability is natural in many cases. For this reason, Laplace suggested
reducing all the events in a random experiment to a certain number of equally probable
possible cases and considered that probability is simply a fraction whose numerator is the
number of favourable cases and whose denominator is the number of all cases possible;
see Laplace (1985, p. 28). This denition was criticised because although equiprobability
is clear when playing a chance game, it can hardly be found apart from games of chance.
Frequentist meaning. The convergence of frequencies for an event, after a large
number of identical trials of random experiments, had been observed over several cen-
turies. Bernoullis proof that the stabilized value approaches the classical probability,
was interpreted as a conrmation that probability was an objective feature of random
events. Given that stabilized frequencies were observable, they can be considered as
approximated physical measures of this probability. However, the frequentist approach
does not provide the exact value of the probability for an event and we cannot nd
an estimate of the same when it is impossible to repeat an experience a very large
number of times. It is also dicult to decide how many trials are needed to get a good
estimation for the probability of an event; see Batanero et al. (2005).
Subjective meaning. Bayes formula permitted the nding of the probabilities of
various causes when one of their consequences is observed. The probability of such a
cause would thus be prone to revision as a function of new information and would lose
its objective character postulated by the above conceptions. Keynes, Ramsey and de
Finetti described probabilities as personal degrees of belief, based on personal judgment
and information about experiences related to a given outcome. They suggested that the
possibility of an event is always related to a certain system of knowledge and is thus
not necessarily the same for all people. The fact that repeated trials were no longer
needed was used to expand the eld of applications of probability theory. However, the
controversy about the scientic status of results that depends on personal judgments
still remains.
Mathematical meaning. Throughout the 20th century, dierent mathematicians con-
tributed to the development of the mathematical theory of probability. Borels view of
probability as a special type of measure was used by Kolmogorov, who applied sets and
measure theories to derive a satisfactory set of axioms, which was accepted by dier-
ent schools regardless of their philosophical interpretation of the nature of probability.
Probability theory proved its eciency in many dierent elds, but the particular mod-
els used are still subjected to heuristic and theoretical hypotheses, which need to be
evaluated empirically.
When teaching probability, these dierent meanings of probability should be progressively
taken into account, starting by the students intuitive ideas of chance and probability and
subjective view of probability as a degree of belief. In order to connect statistics and proba-
bility, some curricula (e.g., in France) suggest implementing an experimental approach to
probability, through experiments and simulations with the purpose of preparing students
to understand the law of large numbers and to grasp the connections between the notions
of relative frequency and probability. In simple chance games where equiprobability is a
reasonable assumption, the classical approach will be useful. Since understanding is a con-
tinuous constructive process, students should progressively acquire and relate the dierent
meanings of the concept and at the nal stage reach to the mathematical formalism and
axiomatic.
In summary, probability is dicult to teach, because we should not only present dierent
probabilistic concepts and show their applications, but we have to go deeper into wider
questions, consisting of how to interpret the meaning of probability, how to help students
develop correct intuitions in this eld and deal with controversial ideas, such as causality.
3. Training Teachers to Teach Probability
Although teachers do not need high levels of mathematical knowledge, they do require a
profound understanding of the basic mathematics we teach at school level, including a deep
grasp of the interconnections and relationships among dierent aspects of this knowledge
(for example, understanding the dierent meanings associated with probability); see Ma
(1999). In addition, teachers need good attitudes towards the correct beliefs about a topic,
and a good professional knowledge for teaching. Next, we analyse each of these components
for the case of probability.
3.1 Teachers attitudes and beliefs
Teachers attitudes and beliefs about how a topic should be taught are key factors to
assure success of curricular reforms, since that these may be transmitted to the students.
Steinbring (1990) suggested that it is important for mathematics teachers to understand
the particular nature of the stochastic knowledge. While traditional mathematics teaching
is based on a hierarchical and cumulative amount of concepts, which are learnt in a linear
sequence, stochastic knowledge is more complex so that systemic and stochastic problems
must include much more interpretative activities than other areas of mathematics. For
instance, to understand what a random variable is, a person needs to assume a model for
randomness, a concept linked to many philosophical interpretations. Moreover, teachers
also have personal views of what are important instructional contents and goals, so that the
teaching that a student receives for a similar level and curriculum might dier considerably
depending on whether a teacher has a static view versus a dynamic view of mathematics, or
whether the teacher prefers to teach formal mathematics versus mathematical applications;
see Eichler (2008). In summary, it is important to promote in the teacher the view that
the stochastic knowledge has its own specicity and should be linked to applications in
order to develop the teachers self-competence to teach probability and statistics. At the
same time, we have to produce conditions for integrating the stochastic knowledge in the
school, while widening the teachers views about mathematics.
3.2 Teachers probabilistic knowledge
Some of the activities that teachers regularly engage involve mathematical reasoning and
thinking. Such activities consider guring out what students know, choosing and manag-
ing representations of mathematical ideas, selecting and modifying textbooks and deciding
among alternative courses of action; see Ball et al. (2001, p. 243). Consequently, teachers
instructional decisions in the teaching of probability are dependent on the teachers prob-
abilistic knowledge. This is cause for concern when paired with evidence that mathematics
teachers, especially at the primary school level, tend to have a weak understanding of
probability. For example, the study by Begg and Edwards (1999) found that only about
two-thirds of in-service and pre-service primary school teachers understood equally prob-
able events and very few understood the concept of independence. Batanero et al. (2005)
analysed results from an initial assessment based on sample of 132 pre-service teachers
in Spain that showed they frequently have three probabilistic misconceptions: represen-
tativeness, equiprobability and the outcome approach. In a research conducted by Borim
and Coutinho (2008), the following results were obtained. First, secondary school teach-
ers predominant reasoning about variation in a random variable was verbal, which did
not allow these teachers to teach their students the meaning of measures such as standard
deviation, restricting them to the teaching of algorithms. Second, none of the teachers
integrated process reasoning, which would relate the understanding of mean, deviations
from the mean, the interval of k standard deviations from the mean and the density es-
timation of frequency in that interval. Canada (2008) examined how pre-service teachers
reasoned about distributions as they compared graphs of two data sets and found that
almost 35% of the sample found no real dierence when average was similar but spread
was quite dierent.
Few teachers have prior experience with conducting probability experiments or simula-
tions and may have diculty implementing an experimental approach to teaching proba-
bility; see Stohl (2005). In a research conducted by Lee and Hollebrands (2008), although
the participant teachers engaged students in investigations based on probability experi-
ments, they often missed opportunities for deepening students reasoning. Teachers ap-
proaches to using empirical estimates of probability did not foster a frequentist conception
of probability as a limit of a stabilized relative frequency after many trials. Teachers almost
exclusively chose small samples sizes and rarely pooled class data or used representations
supportive of examining distributions and variability across collections of samples so they
failed to address the heart of the issue.
3.3 Teachers professional knowledge
Wide probability knowledge, even when essential, is not enough for teachers to be able
to teach probability. As argued by Ponte and Chapman (2006), we should view teachers
as professionals, and ground teacher education in professional practice, making all ele-
ments of practice (preparing lessons, tasks and materials, carrying out lessons, observing
and reecting on lessons) a central element in the teacher education process. In fact, re-
search focused on teachers training is producing a great deal of information about this
professional knowledge, which includes the following complementary aspects:
Epistemological reection on the meaning of concepts to be taught (e.g., reection
on the dierent meaning of probability). For the particular case of the stochastic
knowledge, Biehler (1990) also suggested that teachers need a historical, philosophical,
cultural and epistemological perspective on this knowledge and its relationships to
other domains of science.
Experience in adapting this knowledge to dierent teaching levels and students various
levels of understanding. This includes, according to Steinbring (1990), organizing and
implementing teaching, experiencing students multiple forms of work and understand-
ing experiments, simulations and graphical representations not just as methodological
teaching aids, but rather as essential means of knowing and understanding.
Critical capacity to analyse textbooks and curricular documents.
Prediction of students learning diculties, errors, obstacles and strategies in problem
solving (e.g., students strategies in comparing two probabilities and students confusion
between the two terms in a conditional probability).
Capacity to develop and analyse assessment tests and instruments and interpret
students responses to the same.
Experience with good examples of teaching situations, didactic tools and materials (e.g.,
challenging and interesting problems, Galton board, simulation, calculators, etc.).
Some signicant issues related to the professional knowledge of teachers are whether teach-
ers are able or not to (i) recognize what concepts can be addressed through a particular
resource or task, and (ii) implement eective learning in the classroom with them. How-
ever, the current preparation of teachers do not always assure this professional knowledge.
For example, research conducted by Chick and Pierce (2008) found the teachers lack of
professional knowledge was evident in their approaches to the lesson-planning task, failing
to bring signicant concepts to the fore, despite all the opportunities that were inherent
in the teaching tasks and resources.
4. Possible Activities to Train Teachers in Probability
It is important to nd suitable and eective ways to teach this mathematical and profes-
sional knowledge to teachers. Since students build their knowledge in an active way, by
solving problems and interacting with their classmates, we should use this same approach
in training the teachers especially if we want that they use a constructivist and social
approach in their teaching; see Jaworski (2001). An important view is that we should give
teachers more responsibility in their own training and help them to develop creative and
critical thinking. That is why we should create suitable conditions for teachers to reect
on their previous beliefs about teaching and discuss these ideas with other colleagues.
One fundamental learning experience that teachers should have to develop their proba-
bility thinking is working with experiments and investigations. To teach inquiry, teachers
need skills often absent in mathematics classrooms: such as ability to cope with ambiguity
and uncertainty; re-balance between teacher guidance and student independence and deep
understanding of disciplinary content. Some other approaches in the training of teachers
include:
Teachers collective analysis and discussion of the students responses,
behavior, strategies, difficulties and misconceptions when solving proba-
bility problems. Groth (2008) suggested that teachers of stochastics must deal with
two layers of uncertainty in their daily work. The rst layer relates to disciplinary
knowledge. Uncertainty is also ubiquitous in teaching because of the unique and
dynamic interactions among teacher, students, and subject matter in any given
classroom. Hence, teachers must understand and navigate the uncertainty inherent
to both stochastics and the classroom simultaneously in order to function eectively.
Case discussion among a group of teachers where they oer and debate conjectures
about general pedagogy, mathematical content, and content-specic pedagogy can help
teachers challenge one anothers claims and interpretations; see Groth (2008).
Planning a lesson to teach students some content using a given instruc-
tional device to develop probability and professional knowledge of
teachers; see Chick and Pierce (2008). Since teachers are asked to teach probability
for understanding, it is essential that they experience the same process as their students.
One way to do this is to have the students play the role of the learner and the teacher
at the same time, going through an actual teacher as learner practice. If they had the
chance to go through such a lesson as a learner and at the same time look at it from
the point a view of a teacher, chances are that they will try it out in their own classrooms.
Project work. New curriculum and methodology guidelines suggest that, when
teachers are involved in research projects, it can change how mathematics is experienced
in the classroom, especially in connection to stochastics. Inquiry is a well accepted
(but not always implemented) process in other school subjects, like science and social
studies, but it is rarely used in a mathematics classroom (where statistics is usually
taught). Moreover, when time available for teaching is scarce a formative cycle where
teachers are rst given a project to work with and then carry out a didactical analysis
of the project can help to simultaneously increase the teachers mathematical and
pedagogical knowledge and at the same time provides the teacher educator with
information regarding the future teachers previous knowledge and learning; Godino et
al. (2008).
Working with technology. We can also capitalize on the ability of some software
to be used as a tool-builder to gain conceptual understanding of probability ideas. Lee
and Hollebrands (2008) introduced a design that used technology both as ampliers
and reorganisers to engage teachers in tasks that simultaneously developed their un-
derstanding of probability with technology and provided teachers with experience rst
hand about how technology tools can be useful in fostering stochastical thinking. In the
experiences of Batanero et al. (2005), simulation helped to train teachers simultaneously
in probability content and its pedagogy, since it helps to improve the teachers prob-
abilistic knowledge, while making them conscious of incorrect intuitions within their
students and themselves.
5. Further Reflections
Teachers need support and adequate training to succeed in achieving an adequate equi-
librium of intuition and rigour when teaching probability. Unfortunately, due to time
pressure, teachers do not always receive a good preparation to teach probability in their
initial training. It is important to convince teacher educators that stochastics is an essen-
tial ingredient in the training of teachers. Moreover, despite the acknowledged fact that
probability is distinct and dierent from other areas of mathematics and the implied need
to provide mathematics teachers with a special preparation to teach this topic, it is pos-
sible to connect the stochastic and mathematical preparation of teachers when time for
training teachers is scarce.
Finally, much more research is still needed to clarify the essential components in the
preparation of teachers to teach probability, identify adequate methods, and establish ap-
propriate levels at which each component should be taught. The signicant research eorts
focusing that have focused on mathematics teacher education and professional development
in the past decade (see Llinares and Krainer, 2006; Ponte and Chapman, 2006; Hill et al.,
2007; Wood, 2008) have not been reected in statistics education. This is an important
research area that can contribute to improve statistics education at school level.
Acknowledgement
This work was supported by the project EDU2010-14947, MICIIN, Madrid & FEDER.
References
Ball, D.L., Lubienski, S.T., Mewborn, D.S., 2001. Research on teaching mathematics: the
unsolved problem of teachers mathematical knowledge. In Richardson, V., (ed.). Hand-
book of Research on Teaching. American Educational Research Association, Washing-
ton, pp. 433456.
Batanero, C., Ca nizares, M.J., Godino, J., 2005. Simulation as a tool to train pre-service
school teachers. Proceedings of the First African Regional Conference of ICMI. ICMI,
Ciudad del Cabo.
Batanero, C., Godino, J.D., Roa, R., 2004. Training teachers to teach probability. Journal
of Statistics Education, 12. http://www.amstat.org/publications/jse.
Batanero, C., Henry, M., Parzysz, B., 2005. The nature of chance and probability. In Jones,
G.A., (ed.). Exploring Probability in School: Challenges for Teaching and Learning.
Springer, New York, pp. 1537.
Begg, A., Edwards, R., 1999. Teachers ideas about teaching statistics. Paper presented at
the Annual Meeting of the Australian Association for Research in Education and the
New Zealand Association for Research in Education. Melbourne, Australia.
Biehler, R., 1990. Changing conceptions of statistics: a problem area for teacher education.
In Hawkins, A., (ed.). Training Teachers to Teach Statistics. Proceedings of the Interna-
tional Statistical Institute Round Table Conference. International Statistical Institute,
Voorburg, pp. 2038.
Borim, C., Coutinho, C., 2008. Reasoning about variation of a univariate distribution:
a study with secondary mathematics teachers. In Batanero, C., Burrill, G., Read-
ing, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School
Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the
ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey.
http://www.ugr.es/
~
icmi/iase_study.
Borovcnik, M., Peard, R., 1996. Probability. In Bishop, A., Clements, K., Keitel, C., Kil-
patrick, J., Laborde, C., (eds.). International Handbook of Mathematics Education.
Kluwer, Dordrecht, pp. 239288.
Canada, D.L., 2008. Conceptions of distribution held by middle school students and pre-
service teachers. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint
ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching
and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table
Conference. ICMI and IASE, Monterrey. http://www.ugr.es/
~
icmi/iase_study.
Ca nizares, M.J., Ortiz, J., Batanero, C., Serrano, L., 2002. Probabilistic language in Span-
ish textbooks. In Phillips, B., (ed.). ICOTS-6 Papers for School Teachers. International
Association for Statistical Education, Cape Town, pp. 207211.
Chick, H.L., Pierce, R.U., 2008. Teaching statistics at the primary school level: beliefs,
aordances, and pedagogical content knowledge. In Batanero, C., Burrill, G., Read-
ing, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School
Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the
ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey.
http://www.ugr.es/
~
icmi/iase_study.
Eichler, A., 2008. Germany, teachers classroom practice and students learning. In
Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study:
Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Educa-
tion. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI
and IASE, Monterrey. http://www.ugr.es/
~
icmi/iase_study.
Fischbein, E., 1975. The Intuitive Source of Probability Thinking in Children. Reidel,
Dordrecht.
Franklin, C., Kader, G., Mewborn, D.S., Moreno, J., Peck, R., Perry, M., Scheaer, R.,
2005. A Curriculum Framework for K-12 Statistics Education. GAISE Report. American
Statistical Association. http://www.amstat.org/education/gaise.
Franklin, C., Mewborn, D., 2006. The statistical education of PreK-12 teachers: a shared
responsibility. In Burrill, G., (ed.). NCTM 2006 Yearbook: Thinking and Reasoning with
Data and Chance. NCTM, Reston, Virginia, pp. 335344.
Gal, I., 2005. Towards probability literacy for all citizens: building blocks and instruc-
tional dilemas. In Jones, G., (ed.). Exploring Probability in Schools: Challenges for
Teaching and Learning. Springer, New York, pp. 3963.
Gareld, J.B., Everson, M., 2009. Preparing teachers of statistics: a graduate course for
future teachers. Journal of Statistics Education, 17. www.amstat.org/publications/
jse.
Godino, J.D., Batanero, C., Roa, R., Wilhelmi, M.R., 2008. Assessing and developing
pedagogical content and statistical knowledge of primary school teachers through project
work. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE
Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher
Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference.
ICMI and IASE, Monterrey. http://www.ugr.es/
~
icmi/iase_study.
Groth, R.E., 2008. Navigating layers of uncertainty in teaching statistics through case dis-
cussion. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE
Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher
Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference.
ICMI and IASE, Monterrey. http://www.ugr.es/
~
icmi/iase_study.
Hacking, I., 1975. The Emergence of Probability. Cambridge University Press, Cambridge.
Hill, H.C., Sleep, L., Lewis, J.M., Ball, D., 2007. Assessing teachers mathematical knowl-
edge. In Lester, F., (ed.). Second Handbook of Research on Mathematics Teaching and
Learning. Information Age Publishing, Inc. y NCTM, Greenwich, pp. 111155.
Jaworski, B., 2001. Developing mathematics teaching: teachers, teacher educators and
researchers as co-learners. In Lin, L., Cooney, T.J., (eds.). Making Sense of Mathematics
Teacher Education. Kluwer, Dordrecht, pp. 295320.
Jones, G., 2005. Introduction. In Jones, G., (ed.). Exploring Probability in School: Chal-
lenges for Teaching and Learning. Springer, New York, pp. 112.
Jones, G., Langrall, C., Mooney, E., 2007. Research in probability: responding to classroom
realities. In Lester, F., (ed.). Second Handbook of Research on Mathematics Teaching
and Learning. Information Age Publishing, Inc. y NCTM, Greenwich.
Kvatinsky, T., Even, R., 2002. Framework for teacher knowledge and understanding of
probability. In Phillips, B., (ed.). Proceedings of the Sixth International Conference on
Teaching Statistics. [CD-ROM]. International Statistical Institute, Voorburg, Nether-
lands.
Laplace, P.S., 1985. Theorie Analytique des Probabilites [Analytical Theory of Probabili-
ties]. Jacques Gabay, Paris. Original work published 1814.
Lee, H.S., Hollebrands, K., 2008. Preparing to teach data analysis and probability with
technology. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint
ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching
and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table
Conference. ICMI and IASE, Monterrey. http://www.ugr.es/
~
icmi/iase_study.
Llinares, S., Krainer, K., 2006. Mathematics student teachers and teacher educators as
learners. In Gutierrez, A., Boero, P., (eds.). Handbook of Research on the Psychology
of Mathematics Education. Sense Publichers, Rotherdam/Taipei, pp. 429-459.
Ma, L.P., 1999. Knowing and Teaching Elementary Mathematics. Lawrence Erlbaum, Mah-
wah.
MEC, 2006a. Real Decreto 1513/2006, de 7 de Diciembre, por el que se establecen las
ense nanzas mnimas de la educacion primaria.
MEC, 2006b. Real Decreto 1631/2006, de 29 de Diciembre, por el que se establecen las
ense nanzas mnimas correspondientes a la educacion secundaria obligatoria.
NCTM, 2000. Principles and standards for school mathematics. NCTM, Reston, Virginia.
Retrieved August 31, 2006. http://standards.nctm.org.
Parzysz, B., 2003. LEnseignement de la statistique et des probabilites en France:

Evolution
au cours dune carriere denseignant periode 19652002. Teaching of statistics and proba-
bility in France. Evolution along a teachers professional work period 1965-2002. In B.
Chaput Coord, Probabilites au Lycee. Commission Inter-Irem Statistique et Proba-
bilites, Paris.
Ponte, J.P., Chapman, O., 2006. Mathematics teachers knowledge and practices. In
Gutierrez, A., Boero, P., (eds.). Handbook of Research on the Psychology of Mathemat-
ics Education: Past, Present and Future. Sense Publishers, Roterdham, pp. 461494.
Scheaer, R.L., 2006. Statistics and mathematics: on making a happy marriage. In Bur-
rill, G., (ed.). NCTM 2006 Yearbook: Thinking and Reasoning with Data and Chance.
NCTM, Reston, Virginia, pp. 309321.
Steinbring, H., 1990. The nature of stochastic knowledge and the traditional mathemat-
ics curriculum - Some experience with in-service training and developing materials. In
Hawkins, A., (ed.). Training Teachers to Teach Statistics. Proceedings of the Interna-
tional Statistical Institute Round Table Conference. International Statistical Institute,
Voorburg, pp. 219.
Stohl, H., 2005. Probability in teacher education and development. In Jones, G., (ed.).
Exploring Probability in Schools: Challenges for Teaching and Learning. Springer, New
York, pp. 345366.
Truran, K.M., 1995. Animism: a view of probability behaviour. In Atweh, B., Flavel, S.,
(eds.). Proceedings of the Eighteenth Annual Conference of the Mathematics Education
Group of Australasia MERGA. MERGA, Darwin, Northern Territory, Australia, pp.
537-541.
Wood, T., 2008. The International Handbook of Mathematics Teacher Education. Sense
Publishers, Rotterdam.
14 Chilean Journal of Statistics
Vol. 3, No. 1, April 2012, 1529
Bayesian Statistics
Research Paper
A Bayesian random eects model for survival
probabilities after acute myocardial infarction
Alessandra Guglielmi
1
, Francesca Ieva
1,
, Anna M. Paganoni
1
and Fabrizio Ruggeri
2
1
Department of Mathematics, Politecnico di Milano, Milano, Italy
2
CNR IMATI, Milano, Italy
(Received: 18 March 2011 Accepted in nal form: 14 July 2011)
Abstract
Studies of variations in health care utilization and outcome involve the analysis of multi-
level clustered data, considering in particular the estimation of a cluster-specic adjusted
response, covariates eect and components of variance. Besides reporting on the extent
of observed variations, those studies quantify the role of contributing factors including
patients and providers characteristics. In addition, they may assess the relationship
between health care process and outcomes. In this article we present a case-study, con-
sidering a Bayesian hierarchical generalized linear model, to analyze MOMI
2
(Month
Monitoring Myocardial Infarction in Milan) data on patients admitted with ST-elevation
myocardial infarction diagnosis; both clinical registries and administrative databanks
were used to predict survival probabilities. The major contributions of the paper consist
in the comparison of the performance of the health care providers, as well as in the
assessment of the role of patients and providers characteristics on survival outcome.
In particular, we obtain posterior estimates of the regression parameters, as well as of
the random eects parameters (the grouping factor is the hospital the patients were
admitted to), through an MCMC algorithm. The choice of covariates is achieved in a
Bayesian fashion as a preliminary step. Some issues about model tting are discussed
through the use of predictive tail probabilities and Bayesian residuals.
Keywords: Bayesian generalized linear mixed models Bayesian hierarchical models
Health services research Logistic regression Multilevel data analysis.
Mathematics Subject Classication: Primary 62F15 Secondary 62P10 62J12.
1. Introduction
Over recent years there has been a growing interest in the use of performance indicators
in health care research, since they may measure some aspects of the health care process,
clinical outcomes or disease incidence. Several examples, available in clinical literature; see,
e.g., Hasday et al. (2002) and Saia et al. (2009), make use of clinical registries to evaluate
performances of medical institutions, helping the health governance to plan activities on
real epidemiological evidence and needs and to evaluate the performances of structures
they manage, providing knowledge about the number of cases, incidence, prevalence and
Corresponding author. Francesca Ieva. Dipartimento di Matematica Politecnico di Milano, Piazza Leonardo da
Vinci 32 I-20133, Milano, Italy. Email: francesca.ieva@mail.polimi.it
16 A. Guglielmi, F. Ieva, A.M. Paganoni and F. Ruggeri
survival concerning a specic disease. As a worthy contribution of this work, both clinical
registry and administrative database were used to model in-hospital survival of acute
myocardial infarction patients, in order to point out benchmarks to be used in provider
proling process.
The disease we are interested in is the ST-segment elevation acute myocardial infarction
(STEMI): it consists of a stenotic plaque detachment, which causes a coronary thrombosis
and a sudden critical reduction of blood ow in coronary vessels. STEMI is characterized
by a great incidence (650 - 700 events per month have been estimated only in Lombardia
region, whose inhabitants are approximately ten millions) and serious mortality (about 8%
in Italy), and in fact it is one of the main causes of death all over the world. A case of STEMI
can be diagnosed through the electrocardiogram (ECG), observing the elevation of ST
segment, and treated by thrombolytic therapy and/or percutaneous transluminal coronary
angioplasty (PTCA), which up to now are the most common procedures. The patients in
our data set always undergo directly to a PTCA procedure avoiding the thrombolysis, even
if the two treatments are not mutually exclusive. Anyway, good results for any of the two
treatments can be evaluated by observing rst the in-hospital survival of inpatients, and
then quantifying the reduction of ST segment elevation one hour after the intervention.
Concerning heart attacks, both survival and quantity of myocardial tissues saved from
damage strongly depend on time saved during the process; in this work, we focus on the
survival outcome. Anyhow, time has indeed a fundamental role in the overall STEMI health
care process. By Symptom Onset to Door time we mean the time since symptoms onset up
to the arrival at Emergency Room (ER), and Door to Balloon time (DB time) is the time
since the arrival at ER up to the surgical practice of PTCA. Clinical literature strongly
stresses the connection between in-hospital survival and procedures time, as attested, e.g.,
in Cannon et al. (2000), Jneid et al. (2008) and MacNamara et al. (2006).
The presence of dierences in the outcomes of health care has been documented ex-
tensively in recent years. In order to design regulatory interventions by institutions for
instance, it is interesting to study the eects of variations in health care utilization on
patients outcomes, in particular examining the relationship between process indicators,
which dene regional or hospital practice patterns, and outcomes measures, such as pa-
tients survival or treatments ecacy. If the analysis of variations concerns in particular
the comparison of the performance of health care providers, it is commonly referred to as
provider proling; see Normand et al. (1997) and Racz and Sedransk (2010). The results of
proling analyses often have far-reaching implications. They are used to generate feedback
for health care providers, to design educational and regulatory interventions by institutions
and government agencies, to design marketing campaigns by hospitals and managed care
organizations, and, ultimately, to select health care providers by individuals and managed
care groups.
The major aim of this work is to measure the magnitude of the variations of health care
providers and to assess the role of contributing factors, including patients and providers
characteristics on survival outcome in STEMI patients. Data on health care utilization
have a natural multilevel structure, usually with patients at the lower level and hospi-
tals forming the upper-level clusters. Within this formulation, two main goals are taken
into account: one is to provide cluster-specic estimates of a particular response, adjusted
for patients characteristics, while the other one is to derive estimates of covariates eects,
such as dierences between patients of dierent gender or between hospitals. Hierarchical
regression modelling from a Bayesian perspective provides a framework that can accom-
plish both these goals. In particular, this article considers a Bayesian generalized linear
mixed model (see Zeger and Karim, 1991) to predict the binary survival outcome by means
of relevant covariates, taking into account overdispersion induced by the grouping factor.
We illustrate the analysis on a subset of data collected in the MOMI
2
survey on patients
admitted with STEMI diagnosis in one of the structures belonging to the Milano Cardio-
logical Network, using a logit model for the survival probability. For this analysis, patients
are grouped by the hospital they have been admitted to for their infarction. Assuming a
Bayesian hierarchical approach for the hospital factors yields modelling dependence among
the random eects parameters, but also using the data set to make inferences on hospitals
which do not have patients in the study, borrowing strength across patients, as well as
clustering the hospitals. A Markov chain Monte Carlo (MCMC) algorithm is necessary to
compute the posterior distributions of parameters and predictive distributions of outcomes,
as well as to use other diagnostic tools, such as Bayesian residuals, for goodness-of-t anal-
ysis. The choice of covariates and link functions was suggested rst in Ieva and Paganoni
(2011), according to frequentist selection procedures and clinical know-how; however, it
was conrmed here using Bayesian tools. We found out that killip rst, that is an index
of the severity of the infarction, and then age, have a sharp negative eect on the survival
probability, while the Symptom Onset to Balloon time has a lighter inuence on it. An
interesting, novel nding is that the resulting variability among hospitals seems not too
large, even if we underlined that four hospitals have a more extreme eect on the survival
(one has a positive eect, while the remaining three have a negative eect) then the others.
Such nding can be explained by the relative homogeneity among the hospitals, all located
in Milano, the region capital. Larger heterogeneity is expected in future when extending
the analysis to all the hospitals in the region. The advantages of a Bayesian approach to
this problem are more than one: providers proling or patients classication are allowed
to be guided not only by statistical but clinical knowledge also, hospitals with low expo-
sure can be automatically included in the analysis, and providers proling can be simply
achieved through the posterior distribution of the hospital-eects parameters.
To the best of our knowledge, this study is the rst example of the use of Bayesian
methods in provider proling using data which arise from the linkage between Italian
administrative databanks and clinical registries. This paper shares the same framework
of hierarchical generalized linear mixed models as in Daniels and Gatsonis (1999), who
examined dierences in the utilization of coronary artery bypass graft surgery for elderly
heart attack patients treated in hospitals.
The paper is organized as follows. Section 2 illustrates the data set about STEMI in
Milano Cardiological Network, while Section 3 describes the main features of the pro-
posed model, with a short discussion on covariates selection. Section 4 and 5 discuss prior
elicitation and Bayesian inferences, respectively. Finally, Section 6 presents results of the
inference on quantities of interest with a discussion. Some nal remarks are reported in
Section 7. All the analyses have been performed with WinBUGS; see Lunn et al. (2000)
and also http://www.mrc-bsu.cam.ac.uk/bugs and R (2009) (version 2.10.1) programs.
2. The STEMI Data Set
A net connecting the territory to hospitals, by a centralized coordination of the emergency
resources, has been activated in the Milano urban area since 2001. The aim of a moni-
toring project on it is the activation of a registry on STEMI to collect process indicators
(Symptom Onset to Door time, rst ECG time, Door to Balloon time and so on), in order
to identify and develop new diagnostic, therapeutic and organizational strategies to be
applied to patients aected by STEMI by Lombardia region, hospitals and 118 organi-
zation (the national toll-free number for medical emergencies). To reach this goal, it is
necessary to understand which organizational aspects can be considered as predictive of
time to treatment reduction. In fact, organizational policies in STEMI health care process
concern both 118 organization and hospitals, since a subject aected by an infarction can
reach the hospital by himself or can be taken to the hospital by 118 rescue units.
So, in order to monitor the Milano Cardiological Network activity, times to treatment
and clinical outcomes, the data collection MOMI
2
was planned and made on STEMI pa-
tients, during six periods corresponding to monthly/bimonthly collections. For these units,
information concerning mode of admission (on his/her own or by three dierent types of
118 rescue units), demographic features (sex, age), clinical appearance (presenting symp-
toms and Killip class at admittance), received therapy (thrombolysis, PTCA), Symptom
Onset to Door time, in-hospital times (rst ECG time, DB time), hospital organization
(for example, admission during on/o hours) and clinical outcome (in-hospital survival)
have been listed and studied. The Killip classication is a system used with acute myocar-
dial infarction patients, in order to stratify them in four risk severity classes. Individuals
with a low Killip class are less likely to die within the rst 30 days after their myocardial
infarction than individuals with a high Killip class. The whole MOMI
2
survey consists of
840 statistical units, but in this work we only focus on patients who underwent primary
PTCA and belonging to the third and fourth collections, since they are of better quality.
Among the resulting PTCA-patients, we selected those who had their own hospital admis-
sion registered also in the Public Health Database of Lombardia region, in order to conrm
the reliability of the information collected in the MOMI
2
registry. Finally, the considered
data set consists of 240 patients.
Previous frequentist analyses on MOMI
2
survey (see Grieco et al., 2008; Ieva, 2008;
Ieva and Paganoni, 2010) pointed out that age, total ischemic time (Symptom Onset to
Balloon time, denoted by OB) in the logarithmic scale and killip of the patient, are the
most signicant factors in order to explain survival probability from a statistical and
clinical point of view. Here killip is a binary variable, corresponding to 0 for less severe
(Killip class equal to 1 or 2) and 1 for more severe (Killip class equal to 3 or 4) infarction.
This choice of covariates was conrmed using Bayesian variable selection procedure; see
the next section for more details.
The main goal of our study is to explain and predict, by means of a Bayesian random
eects model, the in-hospital survival (i.e., the proportion of patients discharged alive
from the hospital). The data set consists of n = 240 patients who were admitted to J = 17
hospitals after a STEMI event. The number of STEMI patients per hospital ranges from 1
to 32, with a mean of 14.12. Each observation y
i
records if a patient survived after STEMI,
i.e., y
i
= 1 if the ith patient survived, y
i
= 0 otherwise. In the rest of the paper, y denotes
the vector of all responses (y
1
, . . . , y
n
). The data set is strongly unbalanced, since 95%
of the patients have been discharged alive. The observed hospital-survival rates ranges
from 75% to 100%. These high values are explained because they are in-hospital survival
probabilities, a follow-up data being not available yet. The data set contained some missing
covariates, with proportions of 7%, 24% and 2% for age, OB and killip respectively. The
missing data for age and OB were imputed as the empirical means (64 years for age,
553 minutes for OB), while we sampled the missing 0-1 killip class covariates from the
Bernoulli distribution with probability of success estimated from the non-missing data.
After having imputed all the covariates, the mean value of age and OB did not change,
while the proportion of patients with less severe infarction (killip = 0) was 94%. Finally,
we had no missing data concerning hospital of admission and outcome.
3. A Bayesian Generalized Mixed-Effects Model
We considered a generalized mixed-eects model for binary data from a Bayesian view-
point. For a recent review on this topic, see Chapters 13 in Dey et al. (2000). For each
patient (i = 1, . . . , n), let Y
i
be a Bernoulli random variable with mean p
i
, which represents
the probability that the ith patient survived after STEMI. The p
i
s are modelled through
a logit regression with covariates x := {x
i
}, x
i
:= (1, x
i1
, x
i2
, x
i3
) which represent the age,
the Symptom Onset to Balloon time in the log scale (log-OB) and the killip, respectively,
of the ith patient in the data set. Moreover, age and log-OB have been centered. Since the
patients come from J dierent hospitals, we assume the following multilevel model, with
the hospital as a random eect:
Y
i
|p
i
ind
Be(p
i
), i = 1, . . . , n, (1)
and
logit(p
i
) = log
p
i
1 p
i
=
0
+
1
x
i1
+
2
x
i2
+
3
x
i3
+ b
k[i]
, (2)
where b
k[i]
represents the hospital eect of the ith patient in hospital k[i]. We denote by
the vector of regression parameters (
0
,
1
,
2
,
3
). It is well-known that Equations (1)
and (2) have a latent variable representation (see Albert and Chib, 1993), which can be
very useful in performing Bayesian inference, as well as in providing medical signicance:
conditioning on the latent variables Z
1
, . . . , Z
n
, the Y
1
, . . . , Y
n
are independent, and, for
i = 1, . . . , n,
Y
i
=
1, if Z
i
0;
0, if Z
i
< 0;
(3)
where
Z
i
= x
i
+ b
k[i]
+
i
,
i
i.i.d.
f
, (4)
being f
(t) = e
t
(1 + e
t
)
2
the standard logistic density function. The same class of
models, however, without considering random eects, was applied in Souza and Migon
(2004) to a similar data set of patients after acute myocardial infarction.
As mentioned in Section 2, the choice of covariates was rst suggested in Ieva and
Paganoni (2011), using frequentist model choice tools. However, we have considered it also
in a Bayesian framework, using the Gibbs variable selection method by Dellaportas et al.
(2002). But rst, as a default analysis, we considered covariates selection via the R package
BMA; see Raftery et al. (2009). A subgroup of 197 patients with 11 non-missing covariates
was processed by the function bic.glm, and 7 covariates were selected (age, OB time, killip,
sex, admission during on/o hours, ECG time, number of previous hospitalizations). For
this choice of covariates, the non-missing data extracted from the 240-patients data set
consists of 217 units, which were again analyzed via bic.glm . The posterior probability
that each variable is non-zero was very high (about 40%) for age and killip, while they
were smaller than 7% for the others. Moreover, the smallest BICs denoting the best
models resulted for those including age, killip and sex. Since sex is strongly correlated
with age in our data set (only elderly women are in), at the end, we agreed with the choice
of covariates in Ieva and Paganoni (2011), considering only age and killip, while the OB
time was strongly recommended by clinical and health care utilization know-how, since it
was the main process indicator of the MOMI
2
clinical survey.
As a second analysis, we consider only covariates which have non-missing values for all
patients (age, OB time, killip, sex, admission during on/o hours, number of previous
hospitalizations), to be analyzed using the Gibbs variable selection method. The linear
predictor assumed in the right hand-side of Equation (2) to select covariates can be rep-
resented as
i
=
0
+
6
j=1
j
x
ij
, i = 1, . . . , n, (5)
where (
1
, . . . ,
6
) is a vector of parameters in {0, 1}. Of course, a prior for both the
regression parameter and the model index parameter must be elicited, so that the
marginal posterior probability of suggest which variables must be included in the model.
We assumed dierent noninformative priors for the logit model with the linear predictor
given in Equation (5), as suggested in Ntzoufras (2002), implementing a simple BUGS
code to compute the marginal posterior distributions for each
j
, for j = 1, . . . , 6, and the
posterior inclusion probabilities. However the analysis conrmed the previously selected
model.
The selection of such a few number of covariates (with respect to 13, the total number)
is not surprising since previous analyses; see Ieva (2008) and Ieva and Paganoni (2010)
pointed out that the covariates are highly correlated. For instance, there is dependency
between age on one hand and sex, or symptoms, or mode of admission, on the other,
between symptoms and killip, or symptoms and mode of admission, and between sex
and symptoms. These relationships can be explained because acute coronary syndromes,
as STEMI, aect mainly male patients instead of females, and are more frequent as the
patient age increases. Moreover, it is well-known that the STEMI symptoms depend on the
severity of the infarction itself, and elderly patients have usually more atypical symptoms.
Furthermore, the symptoms may inuence the choice of the type of ambulance sent to
rescue the patient; ambulances which allow the ECG teletransmission are usually sent to
patients presenting more typical infarction symptoms, in order to allow them to skip the
waiting time due to ER procedures, and to reduce accordingly the door to balloon time.
4. The Prior Distribution
As mentioned in the previous sections, one of the aim of this paper is to make a compar-
ison among the patients survival probabilities treated in dierent hospitals of the Milano
Cardiological Network. Such an aim can be accomplished if, for instance, we assume the
hospital each patient was admitted to as a random factor. We make the usual (from a
Bayesian viewpoint) random eects assumption for the hospitals, that is, the hospital ef-
fect parameters b
j
s are drawn from a common distribution; moreover, since no information
is available at the moment to distinguish among the hospitals, we assume symmetry among
the hospital parameters themselves, i.e., b
1
, . . . , b
J
can be considered as (the rst part of
an innite sequence of) exchangeable random variables. Via Bayesian hierarchical models,
not only we model dependence among the random eects parameters b := (b
1
, . . . , b
J
), but
it be also possible to use the data set to make inferences on hospitals which have few or
no patients in the study, borrowing strength across hospitals. As usual in the hierarchical
Bayesian approach, the regression parameter and the hospital parameter b are assumed
a priori independent, is given a (multivariate) Gaussian distribution and b is given a
scale-mixture of (multivariate) Gaussian distributions; more specically:
b, MN(
, V
),
b
1
, . . . , b
J
|
i.i.d.
N(
b
,
2
), and U(0,
0
).
(6)
Observe that the prior assumption on b is that, conditionally on the parameter , each
hospital eect parameter has a Gaussian distribution with variance
2
; here the uniform
prior on is set as an assumption of ignorance/symmetry on the standard deviation of
each hospital eect. The Gaussian prior for is standard, but its hyperparameters, as
well as the hyperparameter of the prior distribution for , it is given informatively, using
available information from other MOMI
2
collections; for more details, see Section 6.2. On
the other hand, a more standard prior for b
j
would be a scale-mixture of normals, mixed
by an inverse-gamma distribution for
2
, with parameter (, ) for small . However, this
prior has been often criticized (see Gelman, 2006), mainly because the inferences do not
result robust with respect to the choice of , and the prior density (for all small ), as well
as the resulting posterior, are too peculiar. In what follows, the parameter vector (, b, )
is denoted by .
5. Bayesian Inference
Based on given priors and likelihood, the posterior distribution of is expressed by
(|y, x) () L(y|, z, x)f(z)
= ()(b|)()
n
i=1
(I
(0,+)
(z
i
))
yi
(I
(,0]
(z
i
))
1yi
n
i=1
f
(z
i
x
i
b
k[i]
).
(7)
We are interested in predictions too. This implies (i) considering the posterior predictive
survival probability of a new patient coming from an hospital already included in the
study, or (ii) the posterior predictive survival probability of a new patient coming from a
new (J + 1)th hospital. We have
P(Y
n+1
= 1|y, x, b
j
) =
R
4
P(Y
n+1
= 1|, b
j
, x)(|b
j
, y)d, j = 1, . . . , J, (8)
for a new patient with covariate vector x coming from the jth hospital in the study, and
P(Y
n+1
= 1|y, x, b
J+1
) =
R
4
P(Y
n+1
= 1|, b
J+1
, x)(|b
J+1
, y) d, (9)
where (|b
J+1
, y) is computed from
(, b
J+1
|y) =
R
+
(b
J+1
|)(, |y)d ,
being (b
J+1
|) the prior population conditional distribution given in Equation (6).
As far as model checking is concerned, we consider predictive distributions for patients
already enrolled in the study in the spirit of replicated data in Gelman et al. (2004). More
specically, we compute
P(Y
new
i
= 1|y, x
i
, b
k[i]
), for all i = 1, . . . , n. (10)
Here, Y
new
i
denotes the ith replicated data that could have been observed, or, to think
predictively, as the data that we would see tomorrow if the experiment that produced
y
i
today were replicated with the same model and the same value of parameters that
produced the observed data; see Gelman et al. (2004, Section 6.3). Since we have a very
unbalanced data set, the following Bayesian rule is adopted: a patient is classied as alive
if P(Y
new
i
= 1|y, x
i
, b
k[i]
) = E[Y
new
i
|y, x
i
, b
k[i]
] is greater than the empirical mean y
n
. This
rule is equivalent to minimize the expected value of the following loss function
L(P(Y
i
= 1|y, x
i
, b
k[i]
), a
1
) = Max{0, y
n
P(Y
i
= 1|y, x
i
, b
k[i]
)},
L(P(Y
i
= 1|y, x
i
, b
k[i]
), a
0
) = Max{0, P(Y
i
= 1|y, x
i
, b
k[i]
) y
n
},
where the action a
1
is to classify the patient as alive and the action a
0
corresponds to
classify the patient as dead. Then the coherence between the Bayesian rule and the data
set is checked.
Finally we computed the latent Bayesian residuals for binary data as suggested in Albert
and Chib (1995). Thanks to the latent variable representation in Equations (3) and (4) of
the model, we can consider the realized errors
e
i
= Z
i
(x
i
+ b
k[i]
), i = 1, . . . , n, (11)
obtained solving Equation (4) w.r.t.
i
. Each e
i
is a function of the unknown parameters,
so that its posterior distribution can be computed through the MCMC simulated values,
and later examined for indications of possible departures from the assumed model and the
presence of outliers; see also Chaloner and Brant (1998). Therefore, it is sensible to plot
credibility intervals for the marginal posterior of each e
i
, comparing them to the marginal
prior credibility intervals (of the same level).
6. Data Analysis
In this section we illustrate the Bayesian analysis of the data set described in Section 2,
giving some details on computations and prior elicitation.
6.1 Bayesian computations
As we mentioned in Section 1, all estimates were derived using WinBUGS. The compu-
tation of the full conditionals to directly implement a Gibbs sampler algorithm can be
computed starting from Equation (7); however they are not standard distributions, i.e.,
closed form expressions do not exist for all of them, given the priors in Equation (6). Some
details on the full conditionals for general design GLMMs required by WinBUGS are in
Zhao et al. (2006).
The rst 100,000 iterations of the chain were discarded, retaining parameter values each
80 iterations to decrease autocorrelations, with a nal sample size equal to 5,000; we
run the chains much longer (for a nal sample size of 10,000 iterations), but the gain
in the MC errors was relatively small. Some convergence diagnostics (Gewekes and the
two Heidelberger-Welch ones) were checked; see, e.g., the reference manual of the CODA
package (Plummer et al., 2006) for more details. Moreover, we monitored traceplots, au-
tocorrelations and MC error/posterior standard deviation ratios for all the parameters,
indicating the MCMC algorithm converged. Code is available from the authors upon re-
quest.
6.2 Informative prior hyperparameters
Concerning information about hyperprior parameters, we xed
b
= 0 regardless of any in-
formation, since, by the exchangeability assumption, the dierent hospitals have the same
prior mean (xed equal to 0 to avoid confounding with
0
). As far as is concerned, we
have enough past data to be relatively informative in eliciting prior hyperparameters; they
were xed after having tted model given in Equations (1) and (2), under non-informative
priors for , to similar data, i.e., 359 patients undergone primary PTCA whose data were
collected during the other four MOMI
2
collections. Therefore, for the present analysis, we
xed
= (3, 0, 0.1, 0.7)
, which are the posterior means of the regression parameters

under the preliminary analysis. The matrix V
was assumed diagonal, V
= diag(2, 0.04,
0.5882, 3.3333), which, except for the second value, are about 10 times the posterior vari-
ances of the regression parameters under the preliminary analysis (0.04 is 100 times the
posterior variance, in order to consider a vaguer prior for
1
). The prior hyperparameter
0
was xed equal to 10, a value compatible with the support of the posterior distribu-
tion for in the preliminary analysis. Posterior estimates of , b and proved to be
robust with respect to
and V , even when we xed a non-diagonal matrix for V , as-

suming prior dependence through the regression parameters (the non-diagonal V elicited
via the preliminary analysis as well). As far as the variances of the s parameters are
concerned, the robustness analysis pointed out that assuming smaller values than those
reported here yielded a too informative prior, that is the data did not swamp the prior;
on the other hand, larger variances produced typical computational diculties of a too
vague prior. This choice of the variances values represents an optimal trade-o between
these two behaviors.
6.3 Results
Summary inferences about regression parameters and can be found in Table 1, while the
marginal posterior distributions are depicted in Figures 1 and 2.
Table 1. Posterior means, standard deviations, and 95% credibility intervals of the regression pa-
rameters and .
Informative prior Credibility intervals
mean sd lwr upr
intercept
0
3.8160 0.5704 2.8310 5.1100
age
1
-0.0792 0.0324 -0.1464 -0.0183
log(OB)
2
-0.1527 0.3326 -0.7902 0.5154
killip
3
-1.5090 0.8159 -3.0470 0.1340
random eect std. dev. 1.1770 0.7417 0.0766 2.8960
From Table 1 and Figure 1 it is clear that the marginal posteriors of
1
and
3
are
concentrated on the negative numbers, conrming the nave interpretation that an increase
in age or killip class decreases the survival probability. The negative eect of the log(OB) is
questionable, given its high variability, even if the posterior median of
2
is 0.16. Anyway,
it was indeed included because of its clinical relevance; moreover, it is the main process
indicator in health care monitoring of STEMI procedures. Observe that the posterior
mean of
0
+b
j
, which is the logit of the survival probability for a patient with average
covariates from any hospital, is between 2.90 and 4.78, yielding a high posterior estimates
of the survival probability from any hospital, as expected.
By inspecting Figure 2 a shrinkage of the posterior density of with respect to the
uniform prior can be observed; this fact supports the conjecture of a low variability within
medical institutions, which can be partly explained by the relative homogeneity among
the hospitals, all located in Milano. As far as the marginal posterior distribution of the
random eect parameters are concerned, Figure 3 displays the posterior median and mean
(with 95% credibility intervals) of each hospital parameter b
j
, for j = 1, . . . , J.
2 3 4 5 6 7
0
.
0
0
.
2
0
.
4
0
.
6
0
d
e
n
s
i
t
y
0.20 0.15 0.10 0.05 0.00 0.05
0
2
4
6
8
1
0
1
2
1
d
e
n
s
i
t
y
1.5 1.0 0.5 0.0 0.5 1.0
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
1
.
2
2
d
e
n
s
i
t
y
4 2 0 2
0
.
0
0
.
1
0
.
2
0
.
3
0
.
4
0
.
5
3
d
e
n
s
i
t
y
Figure 1. Marginal posterior density of the regression coecients.
0 2 4 6 8 10
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
1
.
2
1
.
4
d
e
n
s
i
t
y
Figure 2. Marginal posterior density of .
In Table 2, we report
p
j
= min{P(b
j
> 0|y), P(b
j
< 0|y)}, j = 1, . . . , J,
together with the signum of the posterior median of the b
j
s. Low values of p
j
denote the
posterior distribution of b
j
is far from 0, so that the jth hospital signicantly contributes to
the (estimated) regression intercept
0
+b
j
. In Figure 3, the credible intervals corresponding
to p
j
less than 0.18 are depicted in yellow; it is clear that hospital 9 has a positive eect,
while hospital 10, 11 and 15 have a negative eect on the survival probability.
0 5 10 15
2
0
2
4
hospitals
b
j
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Figure 3. Posterior median (bullet), mean (square) and 95% credibility intervals of all random eect
parameters b
j
. The credible intervals for hospitals such that min(P(b
j
> 0|y), P(b
j
< 0|y)) < 0.18
are dashed.
Table 2. Values of p
j
and the signum of the posterior median of each hospital parameters.
b
1
b
2
b
3
b
4
b
5
b
6
b
7
b
8
b
9
0.27 0.40 0.32 0.25 0.44 0.41 0.49 0.49 0.18
+ + + + - + + + +
b
10
b
11
b
12
b
13
b
14
b
15
b
16
b
17
0.17 0.12 0.28 0.28 0.44 0.17 0.26 0.29
- + - - + - - +
Observe that all the credible intervals of the random eect parameters in Figure 3
include 0, so that we might wonder if the random intercept should be discarded from the
model. However, Mauri (2011) presents a Bayesian selection analysis of the same data set
considered here, concluding that the posterior inclusion probability of the random eect is
signicantly larger than 0 (between 0.2 and 0.6 under dierent reasonable priors). Similar
ndings were drawn in Ieva and Paganoni (2010) from a frequentist perspective.
Figure 4 displays medians and 95% credibility intervals for the posterior predictive sur-
vival probabilities give in Equation (8) of four benchmark patients:
(a) x
1
= 0, x
2
= 0, x
3
= 0, i.e., a patient with average age (64 years), average OB (553
min.) and less severe infarction (Killip class 1 or 2);
(b) x
1
= 0, x
2
= 0, x
3
= 1, i.e., a patient with same age and OB as (a), but with severe
infarction (Killip class 3 or 4);
(c) x
1
= 16, x
2
= 0, x
3
= 0, i.e., an elder patient (80 years), with average OB (553 min.)
and less severe infarction;
(d) x
1
= 16, x
2
= 0, x
3
= 1, i.e., an elder patient with average OB and severe infarction,
coming from an hospital already in the study. The last credibility interval (in red in each
panel) corresponds to the posterior predictive survival probability give in Equation (9) of
a benchmark patient coming from a new random (J + 1)th hospital. Moreover, from the
gure it is clear that killip has a stronger (on average) inuence on survival than age since,
moving from left to right panels (same age, killip increased) the credibility intervals get
much wider than moving from the top to the bottom panels (same killip, age increased).
Finally, as far as predictive model checking is concerned, we computed the predictive
probabilities in Equation (10); the classication rule described in Section 5 gives an error
rate equal to 27% (64 patients were erroneously classied as dead and only 1 patient was
0 5 10 15
0
.
0
0
.
4
0
.
8
Patient (a)
hospitals
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0 5 10 15
0
.
0
0
.
4
0
.
8
Patient (b)
hospitals
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0 5 10 15
0
.
0
0
.
4
0
.
8
Patient (c)
hospitals
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0 5 10 15
0
.
0
0
.
4
0
.
8
Patient (d)
hospitals
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Figure 4. Posterior median (bullet), mean (square) and 95% credible intervals of the posterior
predictive survival probabilities for 4 benchmark patients from each hospital in the study and from
a new random hospital (the 18th dashed credible interval).
erroneously classied as alive). As a measure of goodness of t we also computed the Brier
score, the average squared deviation between predicted probabilities and outcomes, which
is equal to 0.04, showing a fairly good predictive t of our model.
The left panel of Figure 5 displays the posterior distributions of the Bayesian residuals,
as in Equation (11), for each observations, where the red line in the plot denotes the
prior marginal distribution (logistic). On the other hand, the right panel shows the same
posterior distributions in a 3-dimensional perspective, each residual posterior referring to
the posterior survival probability of the corresponding patient.
The picture shows that there are no outliers among the patients who survived, since
their posterior residual densities and the prior residual one share the same cluster. More
variability appears among the dead patients as far as posterior location and dispersion
are concerned. This feature could be brought about by the disparity in the number of
cases among the dead and the alive in our data set. Moreover, most deaths occur in the
class of more severe infarction, and concern elder people. This rationale explains the larger
credibility intervals in Figure 4(d) (right bottom panel) as well, which in fact refers to
elderly patients with severe infarction.
10 5 0 5
0
.
0
0
.
1
0
.
2
0
.
3
residuals
d
e
n
s
it
y
Figure 5. Left panel: posterior distributions of the latent Bayesian residuals. The dashed and solid
lines correspond to observations y
i
= 0 (dead) and y
i
= 1 (alive), respectively. The solid gray
line is the marginal prior distribution (logistic). Right panel: posterior distributions of the latent
Bayesian residuals against the expected posterior survival probabilities.
7. Conclusions
In this work we have considered a Bayesian hierarchical generalized linear model with ran-
dom eects for the analysis of clinical and administrative data with a multilevel structure.
These data arise from MOMI
2
clinical registry, based on a survey on patients admitted with
ST-elevation myocardial infarction diagnosis, integrated with administrative databanks.
The analysis carried out on them could provide a decisional support to the cardiovascular
health care governance. We adopted a Bayesian point of view to tackle the problem of
modelling survival outcomes by means of relevant covariates, taking into account overdis-
persion induced by the grouping factor, i.e., the hospital where each patient has been
admitted to. To the best of our knowledge, this study is the rst example of a Bayesian
analysis of data arising from the linkage between Italian administrative databanks and
clinical registries. The main aim of this paper was to study the eects of variations in
health care utilization on patient outcomes, since the adopted model points out relation-
ships between process and outcome measures. We also provided cluster-specic estimates
of survival probabilities, adjusted for patients characteristics, and derived estimates of
covariates eects, using MCMC simulation of posterior distributions of the parameters;
moreover we discussed model selection and goodness of t. We found out that Killip rst,
and age, have a sharp negative eect on the survival probability, while the OB (onset to
balloon) time has a lighter inuence on it. The resulting variability among hospitals seems
not too large, even if we underlined that 4 hospitals have a more extreme eect on the
survival: in particular hospital 9 had a positive eect, while hospitals 10, 11 and 15 had a
negative eect. As far as negative features of the MCMC outputs are concerned, we found
that the marginal posterior distributions of (
0
, b
j
), for each j, are concentrated on lines
of the whole parameter space, due to the confounding between the intercept parame-
ter and the random eects parameters. However the mixing and the convergence of the
chain, under a suitable thinning, were completely satisfactory. Finally, as a further step in
the analysis, we are considering Bayesian nonparametrics to model the hospital eects, in
order to take advantage of the in-built clustering they provide.
References
Albert, J.H., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data.
Journal of the American Statistical Association, 88, 669679.
Albert, J.H., Chib S., 1995. Bayesian residual analysis for binary response regression mod-
els. Biometrika, 82, 747759.
Cannon, C.P, Gibson, C.M., Lambrew, C.T., Shoultz, D.A., Levy, D., French, W.J., Gore,
J.M., Weaver, W.D., Rogers, W.J., Tiefenbrunn, A.J., 2000. Relationship of symptom-
onset-to-balloon time and door-to-balloon time with mortality in patients undergoing
angioplasty for acute myocardial infarction. Journal of American Medical Association,
283, 29412947.
Chaloner, K., Brant, R., 1988. A Bayesian approach to outlier detection and residual
analysis. Biometrika, 31, 651659.
Daniels, M.J., Gatsonis, C., 1999. Hierarchical generalized linear models in the analysis of
variation in health care utilization. Journal of the American Statistical Association, 94,
2942.
Dellaportas, P., Forster, J.J., Ntzoufras, I., 2002. On Bayesian model and variable selection
using MCMC. Statistics and Computing, 12, 2736.
Dey, D.K., Ghosh, S.K., Mallick, B.M., (eds.) 2000. Generalized Linear Models: A Bayesian
Perspective. Chapman & Hall/CRC, Biostatistics Series, New York.
Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models (Com-
ment on article by Browne and Draper). Bayesian Analysis, 3, 515534.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis. Second
edition. Chapman & Hall/CRC, Boca Raton, Florida.
Grieco, N., Corrada, E., Sesana, G., Fontana, G., Lombardi, F., Ieva, F., Marzegalli, M.,
Paganoni, A.M., 2008. Predictors of reduction of treatment time for ST-segment el-
evation myocardial infarction in a complex urban reality. The MoMi2 survey. MOX
Report n. 10/2008. Dipartimento di Matematica, Politecnico di Milano. Available at
http://mox.polimi.it/it/progetti/pubblicazioni/quaderni/10-2008.pdf.
Hasday, D., Behar, S., Wallentin, L., Danchin, N., Gitt, A.K., Boersma, E., Fioretti, P.M.,
Simoons, M.L., Battler, A., 2002. A prospective survey of the characteristics, treatments
and outcomes of patients with acute coronary syndromes in Europe and the mediter-
ranean basin. The euro heart survey of acute coronary syndromes. European Heart
Journal, 23, 11901210.
Ieva, F., 2008. Modelli statistici per lo studio dei tempi di intervento nellinfarto miocardico
acuto. Master Thesis. Dipartimento di Matematica, Politecnico di Milano. Available at
http://mox.polimi.it/it/progetti/pubblicazioni/tesi/ieva.pdf.
Ieva, F., Paganoni, A.M., 2010. Multilevel models for clinical registers concerning STEMI
patients in a complex urban reality: a statistical analysis of MOMI
2
survey. Communi-
cations in Applied and Industrial Mathematics, 1, 128147.
Ieva, F., Paganoni, A.M., 2011. Process indicators for assessing quality of hospitals care:
a case study on STEMI patients. JP Journal of Biostatistics, 6, 5375.
Jneid, H., Fonarow, G., Cannon, C., Palacios, I., Kilic, T., Moukarbel, G.V., Maree, A.O.,
Liang, L., Newby, L.K., Fletcher, G., Wexler, L., Peterson, E., 2008. Impact of time of
presentation on the care and outcomes of acute myocardial infarction. Circulation, 117,
25022509.
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D., 2000. WinBUGS - a Bayesian mod-
elling framework: concepts, structure, and extensibility. Statistics and Computing, 10,
325337.
MacNamara, R.L., Wang, Y., Herrin, J., Curtis, J.P., Bradley, E.H., Magid, D.J., Peterson,
E.D., Blaney, M., Frederick, P.D., Krumholz, H.M., 2006. Eect of door-to-balloon time
on mortality in patients with ST-segment elevation myocardial infarction. Journal of
American College of Cardiology, 47, 21802186.
Mauri, F., 2011. Bayesian variable selection for logit models with random intercept: appli-
cation to STEMI data set. Master Thesis. Dipartimento di Matematica, Politecnico di
Milano.
Normand, S.T., Glickman, M.E., Gatsonis, C.A., 1997. Statistical methods for proling
providers of medical care: issues and applications. Journal of the American Statistical
Association, 92, 803814.
Ntzoufras, I., 2002. Gibbs variable selection using BUGS. Journal of Statistical Software,
7. Available at http://www.jstatsoft.org/.
Plummer, M., Best, N., Cowles, K., Vines, K., 2006. CODA: Convergence diagnosis and
output analysis for MCMC. R News, 6, 711.
R Development Core Team, 2009. R: A Language and Environment for Statistical Com-
puting. R Foundation for Statistical Computing, Vienna, Austria. Available at http:
//www.R-project.org.
Racz, J., Sedransk, J., 2010. Bayesian and frequentist methods for provider proling us-
ing risk-adjusted assessments of medical outcomes. Journal of the American Statistical
Raftery, A., Hoeting, J., Volinsky, C., Painter, I., Yeung, K.Y., 2009. BMA: Bayesian model
averaging. Available at http://CRAN.R-project.org/package=BMA.
Saia, F., Piovaccari, G., Manari, G., Guastaroba, P., Vignali, L., Varani, E., Santarelli, A.,
Benassi, A., Liso, A., Campo, G., Tondi, S., Tarantino, F., De Palma, R., Marzocchi,
A., 2009. Patient selection to enhance the long-term benet of rst generation drug-
eluting stents for coronary revascularization procedures: insights from a large multicenter
registry. Eurointervention, 5, 5766.
Souza, A.D.P., Migon, H.S., 2004. Bayesian binary regression model: an application to
in-hospital death after AMI prediction. Pesquisa Operacional, 24, 253267.
Zeger, S.L., Karim, M.R., 1991. Generalized linear models with random eects: a Gibbs
sampling approach. Journal of the American Statistical Association, 86, 7986.
Zhao, Y., Staudenmayer, J., Coull, B.A., Wand, M.P., 2006. General design Bayesian
generalized linear mixed models. Statistical Science, 21, 3551.
Vol. 3, No. 1, April 2012, 3142
Nonparametric Statistics
Research Paper
On the wavelet estimation of a function in a
density model with non-identically distributed
observations
Christophe Chesneau
1,
and Nargess Hosseinioun
2
1
LMNO, Universite de Caen Basse-Normandie, Caen, France
2
Department of Statistics, Payame Noor University, Iran
(Received: 10 March 2011 Accepted in nal form: 30 May 2011)
Abstract
A density model with possible non-identically distributed random variables is consid-
ered. We aim to estimate a common function appearing in the densities. We construct a
new linear wavelet estimator and study its performance for independent and dependent
data (the -mixing case is explored). Then, in the independent case, we develop a new
adaptive hard thresholding wavelet estimator and prove that it attains a sharp rate of
convergence.
Keywords: Biased observations Dependent data Rates of convergence Wavelet
basis.
Mathematics Subject Classication: Primary 62G07 Secondary 62G20.
1. Introduction
We consider the following density model. Let (X
i
)
iZ
be a random process such that, for
any i Z, the density of X
i
is
g
i
(x) = w
i
(x)f(x), x R, (1)
where (w
i
(x))
iZ
is a known sequence of positive functions and f is an unknown positive
function. Let L > 0 and X
i
() = {x R; g
i
(x) = 0}. We suppose that X
i
() does not
depend on i, X
1
() [L, L], there exists a constant C
> 0 such that

sup
xR
f(x) C
, (2)
Corresponding author. Christophe Chesneau. Department of Mathematics, LMNO, University of Caen, UFR de
Sciences, F-14032, Caen, France. Email: christophe.chesneau@gmail.com
32 C. Chesneau and N. Hosseinioun
and there exists a sequence of real positive numbers (v
i
)
iZ
(which can depend on n) such
that
inf
xX1()
w
i
(x) v
i
. (3)
The goal is to estimate f globally when only n random variables X
1
, . . . , X
n
of (X
i
)
iZ
are observed. Such an estimation problem has been recently investigated by Aubin and
Leoni-Aubin (2008a,b). It can be viewed as a generalization of the standard biased density
model; see e.g., Patil and Rao (1977), El Barmi and Simono (2000), Brunel et al. (2009)
and Ramirez and Vidakovic (2010).
In this article, we investigate the estimation of f via the powerful tool of the wavelet
analysis. Wavelets are attractive for nonparametric density estimation because of their
spatial adaptivity, computational eciency and asymptotic optimality properties. They
enjoy excellent mean integrated squared error (MISE) properties and can achieve fast rates
of convergence over a wide range of function classes (including spatially inhomogeneous
function). Details on wavelet analysis in nonparametric function estimation can be found
in Antoniadis (1997) and Hardle et al. (1998).
In the rst part of this study, we develop a new linear wavelet estimator. We determine a
sharp upper bound for the associated MISE for independent (X
i
)
iZ
. Then, we extend this
result for possible dependent (X
i
)
iZ
following the -mixing case. In particular, we prove
the upper bound obtained in the independent case is not deteriorated by our dependence
condition as soon as the -mixing coecients (
m
)
mN
of (X
i
)
iZ
(dened in Section 3)
satisfy

n
m=1

m
C, where C > 0 denotes a constant independent of n. The second
part of the study is devoted to the adaptive estimation of f for independent (X
i
)
iZ
. We
construct a new hard thresholding wavelet estimator and prove that it attains a sharp
upper bound, close to the one attained by the corresponding linear wavelet estimator. Let
us mention that our results are proved under very mild assumptions on w
1
(x), . . . , w
n
(x).
Section 2 presents wavelets and the Besov balls. The linear wavelet estimation is devel-
oped in Section 3. Section 4 is devoted to our hard thresholding wavelet estimator. The
proofs are postponed to Section 5.
2. Wavelets and Besov Balls
Let L > 0, N be a positive integer, and and be the Daubechies wavelets db2N (which
satisfy supp() = supp() = [1 N, N]). Set
j,k
(x) = 2
j/2
(2
j
x k),
j,k
(x) = 2
j/2
(2
j
x k),
and
j
= {k Z; 1N 2
j
xk N, x [L, L]} = {k Z; L2
j
+N1 k L2
j
N}.
Then, there exists an integer such that, for any integer , the collection
B = {
,k
(.), k
;
j,k
(.); j N {0, . . . , 1}, k
j
}
is an orthonormal basis of L
2
([L, L]) = {h : [L, L] R;
_
L
L
h
2
(x)dx < }. For more
details about wavelet basis, see Meyer (1992) and Cohen et al. (1993).
For any integer , any h L
2
([L, L]) can be expanded on B as
h(x) =
,k
,k
(x) +
j=
k
j
j,k
j,k
(x),
where
j,k
and
j,k
are the wavelet coecients of h dened by
j,k
=
_
L
L
h(x)
j,k
(x)dx,
j,k
=
_
L
L
h(x)
j,k
(x)dx. (4)
Let M > 0, s > 0, p 1, r 1 and L
p
([L, L]) = {h : [L, L] R;
_
L
L
|h(x)|
p
dx < }.
Set, for every measurable function h on [L, L] and 0,
(h)(x) = h(x + ) h(x),
(h)(x) =
h)(x) and identically, for N N
,
N
(h)(x). Let
N
(t, h, p) = sup
[t,t]
__
L
L
|
N
(h)(u)|
p
du
_
1/p
.
Then, for s (0, N), we dene the Besov ball B
s
p,r
(M) by
B
s
p,r
(M) =
_
h L
p
([L, L]);
__
L
L
_
N
(t, h, p)
t
s
_
r
dt
t
_
1/r
M
_
.
We have the following equivalence: h B
s
p,r
(M) if and only if there exists a constant
M
> 0 (depending on M) such that the associated wavelet coecients give in Equation
(4) satisfy
2
(1/21/p)
_
_
|
,k
|
p
_
_
1/p
+
_
j=
_
_2
j(s+1/21/p)
_
_
k
j
|
j,k
|
p
_
_
1/p
_
_
r
_
_
1/r
M
. (5)
In Equation (5), s is a smoothness parameter and p and r are norm parameters. The Besov
balls capture a wide variety of smoothness features in a function; see e.g., Meyer (1992).
3. Linear Wavelet Estimation
For any integer j and k
j
, we can estimate the unknown wavelet coecient
j,k
=
_
L
L
f(x)
j,k
(x)dx by a standard empirical one given by

j,k
=
1
n
n
i=1
j,k
(X
i
)
w
i
(X
i
)
. (6)
However, in this study, we consider

j,k
=
1
z
n
n
i=1
v
i
j,k
(X
i
)
w
i
(X
i
)
, z
n
=
n
i=1
v
i
. (7)
Our choice is motivated by the following upper bound results.
Proposition 3.1 Suppose that (X
i
)
iZ
are independent. For any integer j and
k
j
, let
j,k
=
_
L
L
f(x)
j,k
(x)dx,
j,k
be as in Equation (6) and
j,k
be as in Equation
(7). Then,
j,k
and
j,k
are unbiased estimators of
j,k
and there exists a constant C > 0
such that
E
_
(
j,k

j,k
)
2
C
1
n
2
n
i=1
1
v
i
, E
_
(
j,k

j,k
)
2
C
1
z
n
.
These bounds are as sharp as possible and we have 1/z
n
(1/n
2
)
n
i=1
1/v
i
.
We dene the linear wavelet estimator

f
lin
by
f
lin
(x) =
k
j
0

j
0
,k
j
0
,k
(x), (8)
where
j
0
,k
is dened by Equation (7) and j
0
is an integer which is chosen later.
Naturally, taking w
1
(x) = = w
n
(x) = 1, Equation (1) becomes the standard density
model and

f
lin
the standard linear wavelet estimator for this problem; see Hardle et al.
(1998, Subsection 10.2). For a survey on wavelet linear estimators for various density
models, we refer to Chaubey et al. (2010).
Theorem 3.2 Suppose that (X
i
)
iZ
are independent and lim
n
z
n
= . Suppose that
f B
s
p,r
(M), with s (0, N), p 2 and r 1. Let

f
lin
be as in Equation (8) with the
integer j
0
satisfying (1/2)z
1/(2s+1)
n
2
j
0
z
1/(2s+1)
n
. Then, there exists a constant C > 0
such that
E
__
L
L
_
f
lin
(x) f(x)
_
2
dx
_
Cz
2s/(2s+1)
n
.
Note that, when w
1
(x) = = w
n
(x) = 1, we have z
n
= n
2s/(2s+1)
and this is the
optimal rate of convergence (in the minimax sense) for the standard density estimation
problem; see Hardle et al. (1998, Theorem 10.1).
Let us now explore the performance of

f
lin
for a class of dependent (X
i
)
iZ
.
Definition 3.3 Let (X
i
)
iZ
be a random process. For any u Z, let F
X
,u
be the -
algebra generated by . . . , X
u1
, X
u
and F
X
u,
is the -algebra generated by X
u
, X
u+1
, . . .
For any m Z, we dene the mth maximal correlation coecient of (X
i
)
iZ
by
m
= sup
Z
sup
(U,V )L
2
(F
X
,
)L
2
(F
X
m+,
)
|C(U, V )|
_
V[U]V[V ]
,
where, for any A {F
X
,
, F
X
m+,
}, L
2
(A) =
_
U A; E[U
2
] <
_
and C( , ) denotes
the covariance function. Then, we say that (X
i
)
iZ
is -mixing if and only if
lim
m
m
= 0.
Further details on -mixing dependence can be found in, e.g., Kolmogorov and Rozanov
(1960), Shao (1995) and Zhengyan and Lu (1996).
Results on wavelet estimation of a density in the -mixing case can be found in Leblanc
(1996) and Hosseinioun et al. (2010).
i
)
iZ
is -mixing and there exist three constants > 0,
[0, 1) and 0 such that
lim
n
1
n
[log(n)]
m=1
m
= , lim
n
z
n
n
[log(n)]
= . (9)
Suppose that f B
s
p,r
(M), with s (0, N), p 2 and r 1. Let

f
lin
be as in Equation
(8), with the integer j
0
satisfying
1
2
_
z
n
n
[log(n)]
_
1/(2s+1)
2
j
0
_
z
n
n
[log(n)]
_
1/(2s+1)
.
Then, there exists a constant C > 0 such that
E
__
L
L
_
f
lin
(x) f(x)
_
2
dx
_
C
_
z
n
n
[log(n)]
_
2s/(2s+1)
.
The main role of the parameters and in Equation (9) is to measure the inuence of the
-mixing dependence of (X
i
)
iZ
when lim
n
n
m=1

m
= on the performance of

f
lin
.
The rst assumption in Equation (9) can be viewed as a generalization of the standard one,
i.e.,

m=1

m
C, which corresponds to = = 0; see e.g., Leblanc (1996, Assumption
M1). Observe that, if = = 0, Theorem 3.4 extends the result of Theorem 3.2; the
-mixing dependence on (X
i
)
iZ
does not deteriorate the rate of convergence z
2s/(2s+1)
n
.
The main drawback of

f
lin
is that it is not adaptive. It depends on the smoothness
parameter s in its construction. The adaptive estimation of f for independent (X
i
)
iZ
is
explored in the next section.
4. On the Adaptive Estimation of f in the Independent Case
Suppose that (X
i
)
iZ
are independent. We dene the hard thresholding estimator

f
hard
by
f
hard
(x) =
k

,k
,k
(x) +
j
1
j=
kj
j,k
1I
j,k
|
log(z
n
)
z
n
j,k
(x), (10)
where
,k
is dened by Equation (7),
j,k
=
1
z
n
n
i=1
v
i
j,k
(X
i
)
w
i
(X
i
)
1I
vi
j,k
(X
i
)
w
i
(X
i
)
z
n
log(z
n
)
, (11)
for any random event A, 1I
A
is the indicator function on A, j
1
is the integer satisfying
(1/2)z
n
< 2
j
1
z
n
, =
and 8/3 + 2 + 2
_
16/9 + 4.
The originality of

f
hard
is in the denition of Equation (11). We do not estimate the
unknown mother wavelet coecient by the standard empirical estimator; we consider
a thresholding version of it. This thresholding combined with a suitable calibration of
the parameters allows us to have power MISE properties under very mild assumptions
on w
1
(x), . . . , w
n
(x). Such a technique has been rstly introduced in a hard thresholding
wavelet procedure in Delyon and Juditsky (1996) for nonparametric regression. Another
application of this technique can be found in Chesneau (2011).
i
)
iZ
are independent and lim
n
z
n
= . Let

f
hard
be
as in Equation (10). Suppose that f B
s
p,r
(M) with r 1, {p 2 and s (0, N)} or
{p [1, 2) and s (1/p, N)}. Then, for a large enough n, there exists a constant C > 0
such that
E
__
L
L
_
f
hard
(x) f(x)
_
2
dx
_
C
_
log(z
n
)
z
n
_
2s/(2s+1)
.
Theorem 4.1 shows that

f
hard
attains a rate of convergence close to one attains by

f
lin
.
The only dierence is the negligible logarithmic term [log(n)]
2s/(2s+1)
. Mention that the
proof of Theorem 4.1 is based on Chesneau (2011, Theorem 2).
5. Proofs
In this section, C denotes any constant that does not depend on j, k and n. Its value may
change from one term to another and may depends on or .
Proof [Proposition 3.1] We have
E
_

j,k
=
1
n
n
i=1
_
L
L
j,k
(x)
w
i
(x)
g
i
(x)dx =
_
L
L
j,k
(x)f(x)dx =
j,k
. (12)
Using Equation (12), the independence of X
1
, . . . , X
n
, Equations (2) and (3) and
_
L
L
(
j,k
(x))
2
dx = 1, we obtain
E
_
(
j,k
j,k
)
2
= V
_

j,k
=
1
n
2
n
i=1
V
_
j,k
(X
i
)
w
i
(X
i
)
_
1
n
2
n
i=1
E
_
_
j,k
(X
i
)
w
i
(X
i
)
_
2
_
=
1
n
2
n
i=1
_
L
L
_
1
w
i
(x)
j,k
(x)
_
2
g
i
(x)dx
=
1
n
2
n
i=1
_
L
L
(
j,k
(x))
2
f(x)
w
i
(x)
dx C
1
n
2
n
i=1
1
v
i
.
We have
E[
j,k
] =
1
z
n
n
i=1
v
i
_
L
L
j,k
(x)
w
i
(x)
g
i
(x)dx =
1
z
n
z
n
_
L
L
j,k
(x)f(x)dx =
j,k
. (13)
Using Equation (13), again the independence of X
1
, . . . , X
n
, Equations (2), (3) and
_
L
L
(
j,k
(x))
2
dx = 1, we obtain
E
_
(
j,k

j,k
)
2
= V[
j,k
] =
1
z
2
n
n
i=1
v
2
i
V
_
j,k
(X
i
)
w
i
(X
i
)
_
1
z
2
n
n
i=1
v
2
i
E
_
_
j,k
(X
i
)
w
i
(X
i
)
_
2
_
=
1
z
2
n
n
i=1
v
2
i
_
L
L
_
j,k
(x)
w
i
(x)
_
2
g
i
(x)dx
=
1
z
2
n
n
i=1
v
2
i
_
L
L
(
j,k
(x))
2
f(x)
w
i
(x)
dx C
1
z
2
n
z
n
= C
1
z
n
. (14)
The Holder inequality yields
n =
n
i=1
v
i
1
v
i
z
1/2
n
_
n
i=1
1
v
i
_
1/2
.
Therefore
1
z
n
1
n
2
n
i=1
1
v
i
.
The proof of Proposition 3.1 is complete.
Proof [Theorem 3.2] We expand the function f on B as
f(x) =
k
j
0
j0,k
j0,k
(x) +
j=j
0
k
j
j,k
j,k
(x),
where
j
0
,k
=
_
L
L
f(x)
j
0
,k
(x)dx and
j,k
=
_
L
L
f(x)
j,k
(x)dx.
Using the fact that B is an orthonormal basis of L
2
([L, L]), Proposition 3.1 and, since
p 2, B
s
p,r
(M) B
s
2,
(M), we have
E
__
L
L
_
f
lin
(x) f(x)
_
2
dx
_
=
k
j
0
E
_
(
j0,k
j0,k
)
2
j=j
0
k
j
2
j,k
C
_
2
j
0
1
z
n
+ 2
2j
0
s
_
Cz
2s/(2s+1)
n
.
Theorem 3.2 is proved.
Proof [Theorem 3.4] First of all, let us prove the existence of a constant C > 0 such that
E
_
(
j
0
,k
j
0
,k
)
2
Cn
[log(n)]
1
z
n
.
Since
j
0
,k
is an unbiased estimator of
j
0
,k
, we have
E
_
(
j0,k

j0,k
)
2
_
= V[
j0,k
]
=
1
z
2
n
n
i=1
n
=1
v
i
v
C
_
j
0
,k
(X
i
)
w
i
(X
i
)
,

j
0
,k
(X
)
w
(X
)
_
1
z
2
n
n
i=1
v
2
i
V
_
j0,k
(X
i
)
w
i
(X
i
)
_
+
1
z
2
n
n
i=1
n
=1
=i
v
i
v
C
_
j0,k
(X
i
)
w
i
(X
i
)
,

j0,k
(X
)
w
(X
)
_
.
(15)
It follows from Equation (14) that
1
z
2
n
n
i=1
v
2
i
V
_
j
0
,k
(X
i
)
w
i
(X
i
)
_
C
1
z
n
.
In order to bound the second term in Equation (15), we use the following result on -
mixing. The proof can be found in Doukhan (1994, Section 1.2.2.).
Lemma 5.1 Let (X
i
)
iZ
be a -mixing sequence. Then, for any (i, j) Z
2
such that i =
and any functions g and h, we have
|C(h(X
i
), g(X
)) |
|i|
_
E[(h(X
i
))
2
] E[(g(X
))
2
],
whenever these quantities exist.
Using Lemma 5.1, we obtain
n
i=1
n
=1
=i
v
i
v
C
_
j
0
,k
(X
i
)
w
i
(X
i
)
,

j
0
,k
(X
)
w
(X
)
_
i=1
n
=1
=i
v
i
v
|i|
_
E
_
_
j
0
,k
(X
i
)
w
i
(X
i
)
_
2
_
E
_
_
j
0
,k
(X
)
w
(X
)
_
2
_
.
(16)
By Equations (2), (3) and
_
L
L
(
j
0
,k
(x))
2
dx = 1, we have
E
_
_
j
0
,k
(X
i
)
w
i
(X
i
)
_
2
_
=
_
L
L
_
j
0
,k
(x)
w
i
(x)
_
2
g
i
(x)dx =
_
L
L
(
j
0
,k
(x))
2
f(x)
w
i
(x)
dx C
1
v
i
.
Therefore
n
i=1
n
=1
=i
v
i
v
C
_
j
0
,k
(X
i
)
w
i
(X
i
)
,

j
0
,k
(X
)
w
(X
)
_
C
n
i=1
n
=1
=i
v
i
|i|
C
n
i=2
i1
=1
v
i
i
C
n
i=2
i1
=1
(v
i
+ v
)
i
= C
n
i=2
i1
u=1
(v
i
+ v
iu
)
u
= C
_
n
i=2
v
i
i1
u=1
u
+
n
i=2
i1
u=1
v
iu
u
_
.
Using Equation (9), we obtain
n
i=2
v
i
i1
u=1
u
z
n
n
u=1
u
Cn
[log(n)]
z
n
,
and
n
i=2
i1
u=1
v
iu
u
=
n1
u=1
u
n
i=u+1
v
iu
z
n
n
u=1
u
Cn
[log(n)]
z
n
.
Hence
n
i=1
n
=1
=i
v
i
v
C
_
j0,k
(X
i
)
w
i
(X
i
)
,

j0,k
(X
)
w
(X
)
_
Cn
[log(n)]
z
n
. (17)
Putting Equations (15), (16) and (17) together, we obtain
E
_
(
j
0
,k

j
0
,k
)
2
_
C
_
1
z
n
+
n
[log(n)]
z
n
_
C
n
[log(n)]
z
n
.
Then we proceed as in Theorem 3.2. We expand the function f on B as
f(x) =
kj
0
j0,k
j0,k
(x) +
j=j
0
kj
j,k
j,k
(x),
where
j0,k
=
_
L
L
f(x)
j0,k
(x)dx and
j,k
=
_
L
L
f(x)
j,k
(x)dx. Using the fact that B is
an orthonormal basis of L
2
([L, L]) and, since p 2, B
s
p,r
(M) B
s
2,
(M), we obtain
E
__
L
L
_
f
lin
(x) f(x)
_
2
dx
_
=
k
j
0
E
_
(
j
0
,k

j
0
,k
)
2
j=j0
k
j
2
j,k
C
_
2
j0
n
[log(n)]
z
n
+ 2
2j0s
_
C
_
z
n
n
[log(n)]
_
2s/(2s+1)
.
The proof of Theorem 3.4 is complete.
Proof [Theorem 4.1.] The result is proven using the following general result. It is a
reformulation of the result given in Chesneau (2011, Theorem 2).
Theorem 5.2 (Chesneau, 2011). Let L > 0. We want to estimate an unknown function
f with support in [L, L] from n independent random variables U
1
, . . . , U
n
. We consider
the wavelet basis B and the notations of Section 3. Suppose that there exist n functions
h
1
, . . . , h
n
such that, for any {, },
(A1) Any integer j and any k
j
,
E
_
1
n
n
i=1
h
i
(
j,k
, U
i
)
_
=
_
L
L
f(x)
j,k
(x)dx.
(A2) There exist a sequence of real numbers (
i
)
iN
satisfying lim
i
i
= and two
constants,
> 0 and > 0, such that, for any integer j and any k
j
,
1
n
2
n
i=1
E
_
(h
i
(
j,k
, U
i
))
2
_

2
2
2j
1
n
.
We dene the hard thresholding estimator

f
H
by
f
H
(x) =

,k
,k
(x) +
j
1
j=
k
j
j,k
1I
{|
j,k|j,n}
j,k
(x),
where

j,k
=
1
n
n
i=1
h
i
(
j,k
, U
i
),

j,k
=
1
n
n
i=1
h
i
(
j,k
, U
i
)1I
{|h
i
(
j,k
,U
i
)|
j,n
}
,
for any random event A, 1I
A
is the indicator function on A,
j,n
=
2
j
_

n
log(
n
)
,
j,n
=
2
j
log(
n
)
n
,
= 8/3 +2 +2
_
16/9 + 4 and j
1
is the integer satisfying (1/2)
1/(2+1)
n
< 2
j
1

1/(2+1)
n
.
Let r 1, {p 2 and s (0, N)} or {p [1, 2) and s ((2 + 1)/p, N)}. Suppose that
f B
s
p,r
(M). Then, there exists a constant C > 0 such that
E
__
L
L
_
f
H
(x) f(x)
_
2
dx
_
C
_
log(
n
)
n
_
2s/(2s+2+1)
.
Let us now investigate the assumptions (A1) and (A2) of Theorem 5.2 with, for any
i {1, . . . , n}, U
i
= X
i
,
, = 0,
n
= z
n
and
h
i
(
j,k
, y) =
n
z
n
v
i
j,k
(y)
w
i
(y)
.
On (A1). By Proposition 3.1, for any {, }, we have
E
_
1
n
n
i=1
h
i
(
j,k
, X
i
)
_
=
_
L
L
f(x)
j,k
(x)dx.
On (A2). Using Equation (14), we have
1
n
2
n
i=1
E
_
(h
i
(
j,k
, X
i
))
2
_
C
1
z
n
.
Let f B
s
p,r
(M). It follows from Theorem 5.2 that the hard thresholding estimator give
in Equation (10) satises, for any r 1, {p 2 and s (0, N)} or {p [1, 2) and
s (1/p, N)},
E
__
L
L
_
f
hard
(x) f(x)
_
2
dx
_
C
_
log(z
n
)
z
n
_
2s/(2s+1)
.
The proof of Theorem 4.1 is complete.
Acknowledgments
The authors would like to thank three anonymous referees whose constructive comments
and remarks have considerably improved the paper.
References
Antoniadis, A., 1997. Wavelets in statistics: a review (with discussion). Journal of the
Italian Statistical Society Series B, 6, 97144.
Aubin, J.-B., Leoni-Aubin, S., 2008. Adaptive Projection Density Estimation Under a
m-Sample Semiparametric Model. Annales de lI.S.U.P. Volume 5 - Numero especial,
Volume 52, Fasc. 1-2, pp. 139156.
Aubin, J.-B., Leoni-Aubin, S., 2008. Projection density estimation under a m-sample semi-
parametric model. Computational Statistics and Data Analysis, 5, 24512468.
El Barmi, H., Simono, J.S., 2000. Transformation-based estimation for weighted distri-
butions. Journal of Nonparametric Statistics, 12, 861878.
Brunel, E., Comte, F., Guilloux, A., 2009. Nonparametric density estimation in presence
of bias and censoring. TEST, 1, 166194.
Chaubey, Y.P., Chesneau, C., Doosti, H., 2011. Wavelet linear density estimation: a review.
Journal of the Indian Society of Agricultural Statistics, 65, 169-179.
Chesneau, C., 2011. Adaptive wavelet estimator for a function and its derivatives in an
indirect convolution model. Journal of Statistical Theory and Practice, 2, 303326.
Cohen, A., Daubechies, I., Jawerth, B., Vial, P., 1993. Wavelets on the interval and fast
wavelet transforms. Applied and Computational Harmonic Analysis, 1, 5481.
Delyon, B., Juditsky, A., 1996. On minimax wavelet estimators. Applied and Computa-
tional Harmonic Analysis, 3, 215228.
Doukhan, P., 1994. Mixing: Properties and Examples, Lecture Notes in Statistics Volume
85. Springer, New York.
Hardle, W., Kerkyacharian, G., Picard, D., Tsybakov, A., 1998. Wavelet, Approximation
and Statistical Applications, Lectures Notes in Statistics. Springer Verlag, New York.
Hosseinioun, N., Doosti, H., Nirumand, H.A., 2012. Nonparametric estimation of the
derivatives of a density by the method of wavelet for mixing sequences. Statistical Pa-
pers, 53, 195-203.
Kolmogorov, A.N., Rozanov, Yu.A., 1960. On strong mixing conditions for stationary
Gaussian processes. Theory of Probability and Applications, 5, 204208.
Leblanc, F., 1996. Wavelet linear density estimator for a discrete time stochastic process:
L
p
-losses. Statistics and Probability Letters, 27, 7184.
Meyer, Y., 1992. Wavelets and Operators. Cambridge University Press, Cambridge.
Patil, G.P., Rao, sC.R., 1977. The weighted distributions: a survey of their applications.
In Krishnaiah, P.R., (ed.). Applications of Statistics. North-Holland, Amsterdam, pp.
383405.
Ramirez, P., Vidakovic, B., 2010. Wavelet density estimation for stratied size-biased sam-
ple. Journal of Statistical Planning and Inference, 2, 419432.
Shao, Q.-M., 1995. Maximal inequality for partial sums of -mixing sequences. Annals of
Probability, 23, 948965.
Zhengyan, L., Lu, C., 1996. Limit Theory for Mixing Dependent Random Variables.
Kluwer, Dordrecht.
Vol. 3, No. 1, April 2012, 4356
Time Series
Research Paper
On the singular values of the Hankel matrix with
application in singular spectrum analysis
Rahim Mahmoudvand
1,
and Mohammad Zokaei
1
1
Department of Statistics, Shahid Beheshti University, Tehran, Iran
(Received: 21 June 2011 Accepted in nal form: 24 August 2011)
Abstract
Hankel matrices are an important family of matrices that play a fundamental role in
diverse elds of study, such as computer science, engineering, mathematics and statistics.
In this paper, we study the behavior of the singular values of the Hankel matrix by
changing its dimension. In addition, as an application, we use the obtained results for
choosing the optimal values of the parameters of singular spectrum analysis, which is a
powerful technique in time series based on the Hankel matrix.
Keywords: Eigenvalues Singular spectrum analysis.
Mathematics Subject Classication: Primary 15A18 Secondary 37M10.
1. Introduction
A Hankel matrix can be nite or innite and its (i, j) entry is a function of i+j; see Widom
(1966). In other words, a matrix whose entries are the same along the anti-diagonals is
called the Hankel matrix. Specically, an LK Hankel matrix H is a rectangular matrix
of the form
H =
_
_
_
_
_
h
1
h
2
. . . h
K
h
2
h
3
. . . h
K+1
.
.
.
.
.
.
.
.
.
.
.
.
h
L
h
L+1
. . . h
N
_
_
_
_
_
, (1)
where K = N L + 1.
Hankel matrices play many roles in diverse areas of mathematics, such as approximation
and interpolation theory, stability theory, system theory, theory of moments and theory
Corresponding author: Rahim Mahmoudvand. Department of Statistics, Shahid Beheshti University, PO. Box
1983963113, Evin, Tehran, Iran. Email: r.mahmodvand@gmail.com
44 R. Mahmoudvand and M. Zokaei
of orthogonal polynomials, as well as in communication and control engineering, including
lter design, identication, model reduction and broadband matching; for more details, see
Peller (2003). Thus, this type of matrices has been subjected to intensive study with re-
spect to its spectrum (collection of eigenvalues) and many interesting results were derived.
However, closed form computation of eigenvalues is not known and, consequently, the ef-
fect of changing the dimension of the matrix on its eigenvalues have not been investigated
in detail.
In recent years, singular spectrum analysis (SSA), a relatively novel, but powerful tech-
nique in time series analysis, has been developed and applied to many practical problems;
see, e.g., Golyandina et al. (2001), Hassani et al. (2009), Hassani and Thomakos (2010)
and references therein. The SSA decomposes the original time series into a sum of small
numbers of interpretable components, such as slowly varying trend, oscillatory component
and noise. The basic SSA method consists of two complementary stages: decomposition
and reconstruction; each stage includes two separate steps. At the rst stage, we decom-
pose the series and, at the second stage, we reconstruct the noise free series by using the
reconstructed series for forecasting new data points.
A short description of the SSA technique is given in the next section. For more ex-
planations and comparison with other time series analysis techniques, refer to Hassani
(2007).
The whole procedure of the SSA technique depends upon two parameters:
(i) The window length, which is usually denoted by L.
(ii) The number of needed singular values, denoted by r, for reconstruction.
Improper choice values of parameters L or r may yield incomplete reconstruction and
misleading results in forecasting.
Considering a series of length N, Elsner and Tsonis (1996) provided some discussion
and remarked that choosing L = N/4 is a common practice. Golyandina et al. (2001)
recommended that L should be large enough, but not larger than N/2. Large values of
L allow longer period oscillations to be resolved, but choosing L too large leaves too few
observations from which to estimate the covariance matrix of the L variables. It should
be noted that variations in L may inuence separability feature of the SSA technique: the
orthogonality and closeness of the singular values. There are some methods for selecting
L. For example, the weighted correlation between the signal and noise component has
been proposed in Golyandina et al. (2001) to determine the suitable value of L in terms
of separability.
Although considerable attempt and various techniques have been taken into account for
selecting the proper value of L, but there is not enough algebraic and theoretical materials
for choosing L and r. The aim of his paper is to obtain some theoretical properties of the
singular values of the Hankel matrix that can be used directly for choosing proper values
of the two parameters of the SSA.
The outline of this paper is as follows. Section 2 describes the SSA technique and also
shows the importance of a Hankel matrix for this technique. Section 3 provides the main
results of the paper. Section 4 discusses some examples and an application of the obtained
results. Section 5 sketches some conclusion of this work.
2. Singular Spectrum Analysis
In this section, we briey introduce stages of the SSA method and discuss the importance
of using a Hankel matrix in the development of this technique.
2.1 Stage I: decomposition
1st step: embedding. Embedding is as a mapping that transfers a one-dimensional time
series Y
N
= (y
1
, . . . , y
N
) into the multi-dimensional series X
1
, . . . , X
K
with vectors X
i
=
(y
i
, . . . , y
i+L1
)
R
L
, where L (2 L N1) is the window length and K = NL+1.
The result of this step is the trajectory matrix
X = (X
1
, . . . , X
K
) = (x
ij
)
L,K
i,j=1
. (2)
Note that the matrix given in Equation (2) is a Hankel matrix as dened in Equation (1).
2nd step: singular value decomposition (SVD). In this step, we perform the SVD
of X. Denote by
1
, . . . ,
L
the eigenvalues of XX
arranged in the decreasing order

(
1

L
0) and by U
1
, . . . , U
L
the corresponding eigenvectors. The SVD of X can
be written as X = X
1
+ + X
L
, where X
i
=

i
U
i
V
i
and V
i
= X
U
i
/
i
(if
i
= 0
we set X
i
= 0).
2.2 Stage II: reconstruction
1st step: grouping. The grouping step corresponds to splitting the elementary matrices
into several groups and summing the matrices within each group. Let I = {i
1
, . . . , i
p
},
for p < L, be a group of indices i
1
, . . . , i
p
. Then, the matrix X
I
corresponding to the
group I is dened as X
I
= X
i1
+ + X
ip
. The split of the set of indices {1, . . . , L}
into disjoint subsets I
1
, . . . , I
m
corresponds to the representation X = X
I1
+ + X
Im
.
The procedure of choosing the sets I
1
, . . . , I
m
is called the grouping. For a given group
I, the contribution of the component X
I
is measured by the share of the corresponding
eigenvalues
iI

i
/
d
i=1
i
, where d is the rank of X.
2nd step: diagonal averaging. The purpose of diagonal averaging is to transform a
matrix Z to the form of a Hankel matrix HZ, which can be subsequently converted to a
time series. If z
ij
stands for an element of a matrix Z, then the kth term of the resulting
series is obtained by averaging z
ij
for all i, j such that i +j = k +1. Hankelization HZ is
an optimal procedure, which is nearest to Z with respect to the matrix norm.
3. Theoretical Results
Along the paper, the matrices to be considered are over the eld of the real numbers. In
addition, we consider dierent values of L, whereas N is supposed to be xed. Recall that,
for any operator A, the operator AA
is always positive, and its unique positive square

root is denoted by |A|. The eigenvalues of |A| counted with multiplicities are called the
singular values of A. In this section, we provide the main results of the paper.
3.1 On sum of square of the singular values of a Hankel matrix
Let
L,N
j
denote the jth ordered eigenvalue of HH
. Then, for a xed value of L, the trace

of HH
is given by
T
L,N
H
= tr(HH
) =
L
j=1
L,N
j
. (3)
The behavior of T
L,N
H
given in Equation (3), with respect to dierent values of L, is
considered in the following theorem.
Theorem 3.1 Consider the Hankel matrix H as dened in Equation (1). Then,
T
L,N
H
=
N
j=1
w
L,N
j
h
2
j
,
where w
L,N
j
= min{min{L, K}, j, L +K j} = w
K,N
j
.
Proof Applying denition of H as given in Equation (1), we have
T
L,N
H
=
L
i=1
NL+i
j=i
h
2
j
. (4)
Changing the order of the summations in Equation (4), we get
T
L,N
H
=
N
j=1
C
j,L,N
h
2
j
,
where C
j,L,N
= min{j, L} max{1, j N +L} +1. Therefore, we only need to show that
C
j,L,N
= w
L,N
j
, for all j and L. We consider two cases: L K and L > K. For the rst
case, we have
C
j,L,N
=
_
_
_
j, 1 j L;
L, L + 1 j K;
N j + 1, K + 1 j N,
which is exactly equals to w
L,N
j
. Similarly for the second case, we get
C
j,L,N
=
_
_
_
j, 1 j K;
K, K + 1 j L;
N j + 1, L + 1 j N;
and again is equal to w
L,N
j
, for L > K.
The weight w
L,N
j
dened in Theorem 3.1 can be written in the functional form
w
L,N
j
=
N + 1
2

N+1
2
L
N+1
2
j
N+1
2
L
N+1
2
j
2
. (5)
Equation (5) shows that
w
L,N
j
is a concave function of L for all j, where j {1, . . . , N};
w
L,N
j
is a concave function of j for all L, where L {2, . . . , N 1};
w
L,N
j
is a symmetric function around line (N + 1)/2 with respect to j and L.
The above mentioned results imply that the behavior of the quantity T
L,N
H
is similar on
two intervals 2 L [(N + 1)/2] and [(N + 1)/2] + 1 L N 1, where, as usual, [x]
denotes the integer part of the number x. Therefore, we only need to consider one of these
intervals.
Theorem 3.2 Let T
L,N
H
be dened as in Equation (3). Then, T
L,N
H
is an increasing func-
tion of L on {2, . . . , [(N + 1)/2]}, a decreasing function on {[(N + 1)/2] + 1, . . . , N 1},
and
max T
L,N
H
= T
[
N+1
2
],N
H
.
Proof First, we show that w
L,N
j
is an increasing function of L on {2, . . . , [(N + 1)/2]}.
Let L
1
and L
2
be two arbitrary values, where L
1
< L
2
[(N + 1)/2]. From the denition
of w
L,N
j
, we have
w
L2,N
j
w
L1,N
j
=
_
_
0, 1 j L
1
;
j L
1
, L
1
+ 1 j L
2
;
L
2
L
1
, L
2
+ 1 j N L
2
+ 1;
N j + 1 L
1
, N L
2
+ 2 j N L
1
+ 1;
0, N L
1
+ 2 j N.
Therefore, w
L2,N
j
w
L1,N
j
0, for all j, and inequality is strict for some j. Thus,
T
L2,N
H
T
L1,N
H
=
N
j=1
_
w
L2,N
j
w
L1,N
j
_
h
2
j
> 0. (6)
This conrms that T
L,N
H
is an increasing function of L on {2, . . . , [(N + 1)/2]}. Similar
approach for the set {[(N + 1)/2] +1, . . . , N1} implies that T
L,N
H
is a decreasing function
of L on this interval. Note also that T
L
2
,N
H
T
L
1
,N
H
in Equation (6) increases as the value
of L
2
increases too proving that T
L,N
H
is an increasing function on {2, . . . , [(N + 1)/2]}.
Therefore, the maximum value of T
L
2
,N
H
is attained at the maximum value of L, which is
[(N + 1)/2].
Corollary 3.3 Let L
max
denote the value of L such that T
L,N
H
T
L
max
,N
H
, for all L, and
the inequality to be strict for some values of L. Then,
L
max
=
_
_
N + 1
2
, if N is odd;
N
2
and
N
2
+ 1, if N is even.
Corollary 3.3 shows that L = median{1, . . . , N} maximizes the sum of squares of the
Hankel matrix singular values with xed values of N. Applying Corollary 3.3 and Equation
(5), we can show that
w
L
max
,N
j
=
N + 1
2

N + 1
2
j
. (7)
Equation (7) shows that h
[(N+1)/2]
has maximum weight at T
L,N
H
.
3.2 Eigenvalues of HH
and rank of H
Here, some inequalities between the ordered eigenvalues for dierent values of L are derived.
According to Cauchys interlacing theorem, it can be given the following theorem; see
Bhatia (1997).
Theorem 3.4 Let H be an L K Hankel matrix as dened in Equation (1). Then, we
have
L,N
j

Lm,Nm
j

L,N
j+m
, j = 1, . . . , L m,
where m is a number belonging to the set {1, . . . , L 1}.
Proof Consider the partition
HH
=
_
_
HH
(1)
HH
(3)
HH
(2)
HH
(4)
_
_
,
where
HH
(1)
=
_
_
_
_
_
_
_
_
_
_
_
_
K
j=1
h
2
j
K
j=1
h
j
h
j+1
. . .
K
j=1
h
j
h
j+Lm1
K
j=1
h
j+1
h
j
K
j=1
h
2
j+1
. . .
K
j=1
h
j+1
h
j+Lm1
.
.
.
.
.
.
.
.
.
.
.
.
K
j=1
h
j+Lm1
h
j
K
j=1
h
j+Lm1
h
j+1
. . .
K
j=1
h
2
j+Lm1
_
_
_
_
_
_
_
_
_
_
_
_
.
Using this partitioning form, we can say that the sub-matrix HH
(1)
is obtained from a
Hankel matrix corresponding to the sub-series H
Nm
= (h
1
, . . . , h
Nm
) and its eigenvalues
are
Lm,Nm
1

Lm,Nm
2

Lm,Nm
Lm
0. Therefore, the proof is completed
using Cauchys interlacing theorem.
Now, we would like to nd a relationship between
Lm,N
j
and
L,N
j
. Therefore, Theorem
3.4 should not use directly. Next, we consider four cases and show that we can nd general
relationships for some classes of the Hankel matrix.
3.2.1 Case 1: L 1, rank of H = 1
In this case, it is obvious that we have one positive eigenvalue. Therefore, we can write
L,N
1
=
L
j=1
L,N
j
= tr(HH
) =
L
l=1
K+l1
j=l
h
2
j
.
According to Theorem 3.2, eigenvalue
L,N
1
increases with L till [(N + 1)/2] and then
decreases for L [(N + 1)/2] +1. Therefore, we have
Lm,N
1

L,N
1
and L [(N + 1)/2]
providing that the conditions of Case 1 are satised.
3.2.2 Case 2: L = 2, rank of H = 2
In this case, HH
has at most two eigenvalues which are the solution of the quadratic
equation
_
_
N1
j=1
h
2
j
+
N
j=2
h
2
j
_
_
+
N1
j=1
h
2
j
N
j=2
h
2
j

_
_
N1
j=1
h
j
h
j+1
_
_
2
= 0. (8)
Equation (8) has two real solutions so that we have two real eigenvalues. The rst eigen-
value (larger one) is given by
2,N
1
=
N1
j=1
h
2
j
+
N
j=2
h
2
j
+
_
_
h
2
1
h
2
N
_
2
+ 4
_
N1
j=1
h
j
h
j+1
_
2
2
. (9)
Equation (9) shows that
2,N
1
=
_
_
_

1,N
1
,
_
N1
j=1
h
j
h
j+1
_
2
h
2
1
h
2
N
;

1,N
1
,
_
N1
j=1
h
j
h
j+1
_
2
h
2
1
h
2
N
;
(10)
where
1,N
1
=

N
j=1
h
2
j
, when L = 1. Practically, it seems that the rst condition of
Equation (10) is usually satised for a wide classes of models. For example, it can be
seen that the condition is equivalent to monotonicity of the sequence {h
j
, j = 1, . . . , N}.
For a non-negative (or non-positive) monotone sequence, we have
N1
j=1
h
j
h
j+1
h
1
h
N
.
Applying Equation (9), it follows
2,N
1

N1
j=1
h
2
j
=
1,N1
1
. A greater class is obtained
if we consider positive data, where all observations are bigger that the rst one and h
1

h
N
/(N 1). Under this condition, it is easy to show that

N1
j=1
h
j
h
j+1
h
1
h
N
and
therefore
2,N
1

1,N
1
. In the next section, we see some examples of models that have not
these conditions but
2,N
1

1,N
1
.
It is worth mention that we can state a geometrical display of Equation (8) as
_
||h
1:N1
||
2
+||h
2:N
||
2
_
+||h
1:N1
||
2
||h
2:N
||
2
(sin (
1,2
))
2
= 0, (11)
where h
1:N1
and h
2:N
denote the rst and second rows of H, ||.|| the Euclidean norm and
1,2
the angle between two rows of H. Notice that last expression in Equation (11) is the
magnitude of the cross product between two rst rows of H. Since (sin (
1,2
))
2
1, it is
easy to obtain the inequality
2,N
1

1,N1
1
, which is a direct result of Theorem 3.4 from
characteristics given in Equation (11).
3.2.3 Case 3: L > 2, rank of H = 2
In this case, HH
has two positive eigenvalues. To obtain the eigenvalues, rst of all note
that
det
_
I HH
_
=
L
+c
1
L1
+ +c
L1
+c
L
, (12)
where the coecients of c
j
can be obtained from following lemma.
Lemma 3.5 (Horn and Johnson, 1985, Theorem 1.2.12) Let A be an nn real or complex
matrix with eigenvalues
1
, . . . ,
n
. Then, for 1 k n,
(i) s
k
() = (1)
k
c
k
, and
(ii) s
k
() is the sum of all the k k principal minors of A.
Equation (12) shows that the eigenvalues of HH
in this case are the solution of
l=1
K+l1
j=l
h
2
j
+
L1
l=1
Ll
i=1
_
_
_
_
_
K+l1
j=l
h
2
j
_
_
_
_
K+l1
j=l
h
2
j+i
_
_
_
_
k+l1
j=l
h
j
h
j+i
_
_
2
_
_
_
= 0.
(13)
The rst eigenvalue (larger one) is given by
L,N
1
=
1
2
_
_
_
L
l=1
K+l1
j=l
h
2
j
+
_
L
_
_
_
, (14)
where
L
is the discriminant of the quadratic expression given in Equation (13). According
to Equation (14), it is easy to see that, for L [(N + 1)/2],
L,N
j

L1,N
j

L1

NL+1
l=L
h
2
l
, j = 1, 2.
Similar to the previous case, Equation (13) may be reformulated in the language of mul-
tivariate geometry for the L-lagged vectors given by
j=1
||h
j:K+j1
||
2
+
L1
i=1
L
j=i+1
||h
i:K+i1
||
2
||h
j:K+j1
||
2
(sin (
i,j
))
2
= 0,
where notations are dened similarly as mentioned in Case 3.
3.2.4 Case 4: L > 2, rank of H > 2
Applying Equation (12), it can be obtained the characteristic equation whose solution
gives the eigenvalues of HH
. However, their functional forms are very sophisticated in

this case and, therefore, we consider several series to check the interesting relationship
between eigenvalues.
3.3 Contribution of eigenvalues in T
L,N
H
The ratio

r
j=1
L,N
j
/
L
j=1
L,N
j
is the characteristic of the best r-dimensional approxi-
mation of the lagged vectors in the SSA technique. Furthermore, this ratio is an obvious
criterion for choosing the proper values of the parameters r and L in the SSA. Therefore,
the study for changing this ratio with respect to L and r is important for the SSA tech-
nique. First of all, note that, if let C
L,N
j
=
L,N
j
/
L
j=1
L,N
j
, then it is easy to see that,
for some j {1, . . . , L m} and all values of m belonging to {1, . . . , L 1}, we have
C
L,N
j
C
Lm,N
j
. (15)
Since inequality given in Equation (15) is satised for all values of m belonging to
{1, . . . , L 1}, it appears that C
L,N
1
is decreasing on L {2, . . . , [(N + 1)/2]}. In the
next section, we see examples that show whether such a behavior is true for polynomial
models or not.
4. Examples and Application
In this section, we discuss some examples related to the theoretical results obtained in
Section 3. Also, we provide an application of these results.
4.1 Examples
Example 4.1 Let h
t
= exp(
0
+
1
t), for t = 1 . . . , N. It is easy to see that the corre-
sponding Hankel matrix H has rank one. Figure 1 shows rst singular value of H for this
model with
0
= 0.1,
1
= 0.2 and N = 20, which is convex with respect to L and attains
maximum value at L = 10, 11, i.e., the median of {1, . . . , 20}.
5 10 15
1
4
0
1
5
0
1
6
0
1
7
0
1
8
0
L
S
i
n
g
u
l
a
r

v
a
l
u
e
Figure 1. Plot of the rst singular value of H for dierent values of L: example 4.1.
Now, we consider two dierent examples where their corresponding Hankel matrices have
rank two. The rst one is a simple linear model and the second is a cosine model. As we
see for both models, roughly speaking, we can say that the results are somewhat similar
to Example 4.1.
Example 4.2 Let h
t
=
0
+
1
t, for t = 1, . . . , N. It is easy to show that rank of the
corresponding Hankel matrix H is two. Figure 2 shows the rst and second singular values
of H for
0
= 1,
1
= 2 and N = 20. From this gure, we can say that both rst and
second singular values of H increase for L [(N + 1)/2] and then decrease.
5 10 15
1
6
0
1
8
0
2
0
0
2
2
0
2
4
0
L
S
i
n
g
u
l
a
r

v
a
l
u
e
5 10 15
6
8
1
0
1
2
1
4
1
6
L
S
i
n
g
u
l
a
r

v
a
l
u
e
Figure 2. Plots of the rst (left) and second (right) singular values of H for dierent values of L: example 4.2.
Example 4.3 Let h
t
= cos (t/12), for t = 1, . . . , N. First and second singular values of
H are depicted in Figure 3 for series length 100. If we connive some small uctuations in
the plots, we can say that behavior of singular values of H is similar to Example 4.2.
0 20 40 60 80 100
1
0
1
5
2
0
2
5
L
S
i
n
g
u
l
a
r

v
a
l
u
e
0 20 40 60 80 100
5
1
0
1
5
2
0
2
5
L
S
i
n
g
u
l
a
r

v
a
l
u
e
Figure 3. Plots of the rst (left) and second (right) singular values of H for dierent values of L: example 4.3.
Example 4.4 Let h
t
=
0
+
1
t +
2
t
2
, for t = 1, . . . , N. It is easy to show that rank
of the corresponding Hankel matrix H is 3. Figure 4 shows the singular values of H for
0
= 1,
1
= 2,
2
= 3 and N = 20. From this gure, we note that all the singular values
of H increase for L [(N + 1)/2] and then decrease, which coincides with Theorem 3.2.
5 10 15
3
5
0
0
4
0
0
0
4
5
0
0
5
0
0
0
L
S
i
n
g
u
l
a
r

v
a
l
u
e
5 10 15
5
0
1
0
0
1
5
0
2
0
0
2
5
0
3
0
0
3
5
0
L
S
i
n
g
u
l
a
r
v
a
l
u
e
5 10 15
5
1
0
1
5
L
S
i
n
g
u
l
a
r

v
a
l
u
e
Figure 4. Plots of the three largest singular values of H for dierent values of L: example 4.4.
Example 4.5 Let h
t
= log(t), for t = 1, . . . , N. Then, it can be seen that rank of the
corresponding Hankel matrix H is four. Singular values of H are shown in Figure 5 for
N = 20. The results of this example are in concordance with Example 4.4.
Figure 6 shows two singular values for models h
t
= cos(t/12) (left) and h
t
= log(t)
(right), for N = 5, . . . , 100. Solid and dashed lines in Figure 6 denote the singular values
for L = 2 and L = 1, respectively. Both of these values conrm our expectation for
discrepancy between two singular values. Notice that the cosine model is not monotone,
but
2,N
1

1,N
1
.
5 10 15
1
4
1
6
1
8
2
0
2
2
2
4
L
S
i
n
g
u
l
a
r

v
a
l
u
e
5 10 15
1
.
0
1
.
2
1
.
4
1
.
6
1
.
8
2
.
0
L
S
i
n
g
u
l
a
r

v
a
l
u
e
4 6 8 10 12 14 16
0
.
0
6
0
.
0
8
0
.
1
0
0
.
1
2
L
S
i
n
g
u
l
a
r

v
a
l
u
e
6 8 10 12 14 16
0
.
0
0
3
0
.
0
0
4
0
.
0
0
5
0
.
0
0
6
0
.
0
0
7
0
.
0
0
8
L
S
i
n
g
u
l
a
r

v
a
l
u
e
Figure 5. Plots of the four largest singular values of H with respect to dierent values of L: example 4.5.
20 40 60 80
2
4
6
8
Length of Series
S
i
n
g
u
l
a
r

v
a
l
u
e
L=1
L=2
20 40 60 80
1
0
2
0
3
0
4
0
5
0
Length of Series
S
i
n
g
u
l
a
r

v
a
l
u
e
L=1
L=2
Figure 6. Plots of the rst singular value for values of L and N in cosine (left) and logarithm (right) models.
Example 4.6 Let h
t
=
0
+
1
t +
2
t
2
, for t = 1, . . . , N (a polynomial model). Figure 7
shows the ratio C
L,N
j
for
0
= 1,
1
= 2,
2
= 3, N = 20 and j = 1, 2, 3. From this gure,
we note that C
L,N
1
decreases for the values of L less than [(N + 1)/2] and then increases
on the set L {[(N + 1)/2] + 1, . . . , N 1}. Whereas C
L,N
2
and C
L,N
3
increase on the set
{1, . . . , [(N + 1)/2]} and decrease on {[(N + 1)/2] + 1, . . . , N}.
5 10 15
0
.
9
9
5
0
.
9
9
6
0
.
9
9
7
0
.
9
9
8
0
.
9
9
9
L
R
a
t
i
o
5 10 15
0
.
0
0
1
0
.
0
0
2
0
.
0
0
3
0
.
0
0
4
0
.
0
0
5
L
R
a
t
i
o
5 10 15
0
.
0
e
+
0
0
4
.
0
e
0
6
8
.
0
e
0
6
1
.
2
e
0
5
L
R
a
t
i
o
Figure 7. Plots of C
L,N
j
with respect to L for N = 20 and j = 1 (left), j = 2 (center), j = 3 (right): example 4.6.
Next, we examine the cases where the degree of the polynomial is greater than two.
Furthermore, dierent coecients are considered. The results are similar to Example 4.6
and thus we do not report them here. As a general result, we can say that inequality given
in Equation (15) is satised for j = 1 in the polynomial models. Now, we consider the
ratio C
L,N
1:r
=
r
j=1
C
L,N
j
. Since C
L,N
1
is bigger than C
L,N
j
, for j > 1, and the discrepancy
between them usually is so much (see the polynomial model of Example 4.6), we expect
that the ratio C
L,N
1:r
has a behavior such as C
L,N
1
. In the following example, the behavior
of this ratio is depicted for the polynomial model with degree four.
Example 4.7 Let h
t
=
0
+
1
t +
2
t
2
+
3
t
3
+
4
t
4
, for t = 1, . . . , N. Figure 8 shows
the ratio C
L,N
1:r
for
0
= 1,
1
= 2,
2
= 3,
3
= 4,
4
= 5 and N = 20. From this gure,
we note that C
L,N
1:r
decreases on L {2, . . . , [(N + 1)/2]}, for r 1, and then increases on
L > [(N + 1)/2], as expected.
5 10 15
0
.
9
9
7
5
0
.
9
9
8
0
0
.
9
9
8
5
0
.
9
9
9
0
0
.
9
9
9
5
L
R
a
t
i
o
5 10 15 0
.
9
9
9
9
8
0
0
.
9
9
9
9
8
5
0
.
9
9
9
9
9
0
0
.
9
9
9
9
9
5
1
.
0
0
0
0
0
0
L
R
a
t
i
o
5 10 15
L
R
a
t
i
o
1
Figure 8. Plot of C
L,N
1:r
with respect to L for N = 20 and r = 1 (left), r = 2 (center), r = 3 (right): example 4.7.
4.2 Choosing the SSA parameters
Several rules have been proposed in the literature for choosing the SSA parameters; see,
e.g., Golyandina et al. (2001) and Hassani et al. (2011). However, the list is by no means
exhaustive. Certainly, the choice of parameters depends on the data collected and on the
analysis we have performed. Anyway one important note is that singular values give most
eective information for choosing parameters in the SSA. In previous subsections, several
criteria and theorems were considered to investigate the behavior of singular values of the
Hankel matrix. Considering theoretical results about the structure of the Hankel matrix,
trajectory matrix and relationship with their dimensions, enable us to state that the choice
of L close to one-half of the time series length is a suitable choice for decomposition stage in
most cases. The previous empirical and theoretical results also conrm the results obtained
by us here. However, by using denition of the criteria T
L,N
H
, it can be seen that
T
L,N
H
T
L1,N
H
=
K
j=L
h
2
j
. (16)
Equation (16) is the rate of change in tr(HH
) for each unit change in the window length.

This rate is large for small values of the window length and decreases to attain minimum
value at L = K, where it is equivalent to L = L
max
= median{1, . . . , N}. This motivate
us to choose smaller values than L
max
when the rate given in Equation (16) is small. To
support this motivation, Golyandina et al. (2001) said that series with a complex structure
and too large window length L can produce an undesirable decomposition of the series
components of interest, which may lead, in particular, to their mixing with other series
components. Sometimes, in these circumstances, even a small variation in the value of L
can reduce mixing and lead to a better separation of the components, i.e., it provides a
transition from weak to strong separability.
Another important parameter to be chosen is the number of needed singular values r for
grouping in the reconstruction step. Election of this parameter is similar to the procedure
for obtaining the cuto value in principal component analysis. It is known that there is
not a general way to choose an optimal value of the cuto number and it depends on the
data; for a complete description and review of this topic, see Jollife (2002). Perhaps the
most obvious criterion for choosing the cuto value is to select a (cumulative) percentage
of the total variation, which one desires that the selected singular values contribute, say
a 80% or 90%. The required number of singular values is then the smallest value of r for
which this chosen percentage is exceeded. This criterion is equivalent to the ratio C
L,N
1:r
previously dened.
5. Conclusions
We have considered one of the main and most important issues in the singular spectrum
analysis, that is, the selection of parameters. As stated, singular values of the trajectory
matrix in the singular spectrum analysis play an important role. Specically, election of the
parameters values of the window length (L) and the number of needed singular values for
reconstruction of series (r) depend on the behavior of the singular values of the trajectory
matrix. In this paper, we have studied the behavior of the singular values of a Hankel
matrix (H) with respect to its dimension. We have shown that, for a wide classes of time
series, the singular value of HH
(
L,N
j
) increases with L in L {1, . . . , [(N + 1)/2]} and
decreases in L {[(N + 1)/2] + 1, . . . , N}. In addition, we have investigated the behavior
of the sum of square and the contribution of each singular value. The results based on
these criteria have shown that the choice of L close to one-half of the time series length is
a suitable choice for decomposition stage in most cases for the singular spectrum analysis.
Acknowledgements
Authors would like to thank anonymous referees for their valuable comments that improved
the exposition of the paper.
References
Bhatia, R., 1997. Matrix Analysis. Springer, New York.
Elsner, J.B., Tsonis, A.A., 1996. Singular Spectrum Analysis: A New Tool in Time Series
Analysis. Plenum Press, New York.
Golyandina, N., Nekrutkin, V., Zhigljavsky, A., 2001. Analysis of Time Series Structure:
SSA and Related Techniques. Chapman & Hall/CRC, New York.
Hassani, H., 2007. Singular spectrum analysis: methodology and comparision. Journal of
Data Science, 5, 239257.
Hassani, H., Heravi, S., Zhigljavsky, A., 2009. Forecasting European industrial production
with singular spectrum analysis. International Journal of Forecasting, 25, 103118.
Hassani, H., Mahmoudvand, R., Zokaei, M., 2011. Separability and window length in sin-
gular spectrum analysis. Comptes Rendus Mathematique (in press).
Hassani, H., Thomakos, D., 2010. A Review on singular spectrum analysis for economic
and nancial time series. Statistics and its Interface, 3, 377397.
Horn, R.A., Johnson, C.R., 1985. Matrix Analysis. Cambridge University Press, Cam-
bridge.
Jollie, I.T., 2002. Principal Component Analysis. Springer, New York.
Peller, V., 2003. Hankel Operators and Their Applications. Springer, New York.
Widom, H., 1966. Hankel matrix. Transactions of the American Mathematical Society,
121, 135.
Vol. 3, No. 1, April 2012, 5773
Statistical Modeling
Research Paper
On linear mixed models and their inuence
diagnostics applied to an actuarial problem
Luis Gustavo Bastos Pinho, Juv encio Santos Nobre
and Slvia Maria de Freitas

Department of Applied Mathematics and Statistics, Federal University of Ceara, Fortaleza, Brazil
(Received: 10 June 2011 Accepted in nal form: 23 September 2011)
Abstract
In this paper, we motivate the use of linear mixed models and diagnostic analysis in
practical actuarial problems. Linear mixed models are an alternative to traditional cred-
ibility models. Frees et al. (1999) showed that some mixed models are equivalent to some
widely used credibility models. The main advantage of linear mixed models is the use of
diagnostic methods. These methods may help to improve the model choice and to iden-
tify outliers or inuential subjects which deserve better attention by the insurer. As an
application example, the data set in Hachemeister (1975) is modeled by a linear mixed
model. We can conclude that this approach is superior to the traditional credibility one
since the former is more exible and allows the use of diagnostic methods.
Keywords: Credibility models Hachemeister model Linear mixed models
Diagnostics Local inuence Residual analysis.
Mathematics Subject Classication: Primary 62J05 Secondary 62J20.
1. Introduction
One of the main concerns in actuarial science is to predict the future behavior for the
aggregate amount of claims of a certain contract based on its past experience. By accurately
predicting the severity of the claims, the insurer is able to provide a fairer and thus more
competitive premium.
Statistical analysis in actuarial science generally belongs to the class of repeated measures
studies, where each subject may be observed more than once. By subject we mean each
element of the observed set which we want to investigate. Workers of a company, class of
employees, and dierent states are possible examples of subjects in actuarial science. To
model actuarial data, a large variety of statistical models can be used, but it is usually
dicult to choose a model due to the data structure, in which within-subject correlation
is often seen. Correlation misspecication may lead to erroneous analysis. In some cases
this error is very severe. A clear example may be seen in Demidenko (2004, pp. 2-3) and
a similar articial situation is reproduced in Figure 1, that shows the relation between
the number of claims and the number of policy holders of an insurer for nine dierent
regions within a country. In each region the two variables are measured once a year on the
Corresponding author. Juvencio Santos Nobre. Departamento de Estatstica e Matematica Aplicada. Universidade
Federal do Ceara. Fortaleza/CE-Brazil. CEP:60.440-900. Email: juvencio@ufc.br
58 L.G. Bastos Pinho, J.S. Nobre and S.M. Freitas
same day for three consecutive years. In Figure 1(a) we do not consider the within-region
(within-subject) correlation. The dashed line is a simple linear regression and suggests that
the more the policy holders, the less claims occur. In Figure 1(b) we joined the observations
for each region by a solid line. It is clear now that the number of claims increases with the
number of policy holders.
12 14 16 18 20
5
0
0
5
5
0
6
0
0
Number of policy holders (thousands)
N
u
m
b
e
r

o
f

c
l
a
i
m
s
12 14 16 18 20
5
0
0
5
5
0
6
0
0
Number of policy holders (thousands)
N
u
m
b
e
r

o
f

c
l
a
i
m
s
Figure 1. (a) Not considering the within-subject correlation, (b) considering the within-subject correlation.
It is necessary to take into consideration that each region may have a particular behavior
which should be modeled, but only this is usually not enough. Techniques summarized
under the name of diagnostic procedures may help to identify issues of concern, such as high
inuential observations, which may distort the analysis. For linear homoskedastic models,
a well known diagnostic procedure is the residual plot. For linear mixed models better
types of residuals are dened. Besides residual techniques, which are useful, there is a less
used class of diagnostic procedures, which includes case deletion and measuring changes in
the likelihood of the adjusted model under minor perturbations. Several important issues
may not be noticed without the aid of these last diagnostics methods.
For introductory information regarding regression models and respective diagnostic anal-
ysis; see Cook and Weisberg (1982) or Drapper and Smith (1998). For a comprehensive
introduction to linear mixed models, see Verbeke and Molenberghs (2000), McCulloch
and Searle (2001) and Demidenko (2004). Diagnostic analysis of linear mixed models were
presented and discussed in Beckman et al. (1987), Christensen and Pearson (1992), Hilden-
Minton (1995), Lesare and Verbeke (1998), Banerjee and Frees (1997), Tan et al. (2001),
Fung et al. (2002), Demidenko (2004), Demidenko and Stukel (2005), Zewotir and Galpin
(2005), Gumedze et al. (2010) and Nobre and Singer (2007, 2011).
The seminal work of Frees et al. (1999) showed some similarities and equivalences be-
tween mixed models and some well known credibility models. Applications to data sets in
actuarial context may be seen in Antonio and Beirlant (2006). Our contribution is to show
how to use diagnostic methods for linear mixed models applied to actuarial science. We
illustrate how to identify outliers and inuential observations and subjects. We also show
how to use diagnostics as a tool for model selection. These methods are very important
and usually overlooked by most of the actuaries.
This paper is divided as follows. In Section 2 we present a motivational example using
a well known data set. In Section 3 we briey present the linear mixed models. Section 4
contains a short introduction to the diagnostic methods used in the example. In Section
5 we present an application based on the motivational example. Section 6 shows some
conclusions. Finally, in an Appendix, we present mathematical details of some formulas
and expressions used in the text.
2. Motivational Example
For a practical example, consider the Hachemeister (1975) data on private passenger bodily
injury insurance. The data were collected from ve states (subjects) in the US, through
twelve trimesters between July 1970 and June 1973, and show the mean claim amount and
the total number of claims in each trimester. The data may be found in the actuar package
(see Dutang et al., 2008) from R (R Development Core Team, 2009) and are partially shown
in Table 1.
Table 1. Hachemeisters data.
Trimester State Mean claim amount Number of claims
1 1 1738 7861
1 2 1364 1622
1 3 1759 1147
1 4 1223 407
1 5 1456 2902
2 1 1642 9251
.
.
.
.
.
.
.
.
.
.
.
.
12 1 2517 9077
12 2 1471 1861
12 3 2059 1121
12 4 1306 342
12 5 1690 3425
In Figure 2 we plot the individual proles for each state and the mean prole. It suggests
that the claims have a dierent behavior along the trimesters for each state. One may
notice that the claims from state 1 are greater than those from other states for almost
every observation, and the claims from states 2 and 3 seem to grow more slowly than
those from state 1. If the insurer wants to accurately predict the severity, the subjects
individual behavior must also be modeled. Traditionally this is possible with the aid of
credibility models; see, e.g., B uhlmann (1967), Hachemeister (1975) and Dannenburg et
al. (1996). These models assign weights, known as credibility factors, to a pair of dierent
estimates of severity.
2 4 6 8 10 12
1
0
0
0
1
5
0
0
2
0
0
0
2
5
0
0
Trimester
A
v
e
r
a
g
e

c
l
a
i
m

a
m
o
u
n
t
State 1
State 2
State 3
State 4
State 5
Average
Figure 2. Individual proles and mean prole for Hachemeister (1975) data.
Credibility models may be functionally dened as
ZB + (1 Z)C,
where A represents the severity in a given state, Z is a credibility factor restricted to [0, 1],
B is a priori estimate of the expected severity for the same estate and C is a posteriori
estimate also of the expected severity. Considering a particular state, B may be equal to
the sample mean of the severity of its observations and C equal to the overall sample mean
of the data in the same period.
Frees et al. (1999) showed that it is possible to nd linear mixed models equivalent to
some known credibility models, such as B uhlmann (1967) and Hachemeister (1975) models.
Information about linear mixed models is provided in the next section.
3. Linear Mixed Models
Linear mixed models are a popular alternative to analyze repeated measures. Such models
may be functionally expressed as
y
i
= X
i
+Z
i
b
i
+e
i
, i = 1, . . . , k, (1)
where y
i
= (y
1
, y
2
, . . . , y
n
i
)
is a n
i
1 vector of the observed values of the response variable
for the ith subject, X
i
is a n
i
p known full rank matrix, is a p 1 vector of unknown
parameters, also known as xed eects, which are used to model E[y
i
], Z
i
is a n
i
q known
full rank matrix, b
i
is a q 1 vector of latent variables, also known as random eects,
used to model the within-subject correlation structure, and e
i
= (e
i1
, e
i2
, . . . , e
in
i
)
is the
n
i
1 random vector of (within-subject) measurement errors. It is usually also assumed
that e
i
ind
N
ni
(0,
2
I
ni
), where I
ni
denotes the identity matrix of order n
i
for i = 1, . . . , k,
b
i
iid
N
q
(0,
2
G) for i = 1, . . . , k in which G is a q q positive denite matrix, and
e
i
and b
j
are independent i, j. Under these assumption, this is called a homoskedastic
conditional independence model. It is possible to rewrite model given in Equation (1) in a
more concise way as
y = X +Zb +e, (2)
where y = (y
1
, . . . , y
k
)
, X = (X
1
, . . . , X
k
)
, Z =

k
i=1
Z
i
, b = (b
1
, . . . , b
k
)
and
e = (e
1
, . . . , e
k
)
, with

representing the direct sum.
It can be shown that, conditional on known covariance parameters of the model, that is
conditional to the elements of G and
2
, the best linear unbiased estimator (BLUE) for
and the best linear unbiased predictor (BLUP) for b are given by
= (X
V
1
X)
1
X
V
1
y, (3)
and
b = DZ
V
1
(y X
),
respectively, where D =
2
G, V =
2
(I
n
+ ZGZ
), with n =

k
i=i
n
i
; see Hachemeister
(1975).
Maximum likelihood (ML) and restricted maximum likelihood (RML) methods can be
used to estimate the variance components of the model. The latter, proposed in Patterson
and Thompson (1971), is usually chosen since it often generates less biased estimators
related to the variance structure. When estimates for V are used in Equation (3) to
obtain

and

b, they are called empirical BLUE (EBLUE) and empirical BLUP (EBLUP),
respectively. Usually the estimation of the parameters involves the use of iterative methods
for maximizing the likelihood function.
Linear mixed models are not the only way to deal with repeated measures studies. Other
popular alternatives are the generalized estimation equations (see Liang and Zeger, 1986;
Diggle et al., 2002) and multivariate models as seen in Johnson and Whichern (1982)
and Vonesh and Chinchilli (1997). But usually these alternatives are more restrictive than
linear mixed models, and they only model the marginal expected value of the response
variable.
4. Diagnostic Methods
Diagnostic methods comprehend techniques whose purpose is to investigate the plausi-
bility and robustness of the assumptions made when choosing a model. It is possible to
divide the techniques shown here in two classes: residual analysis, which investigates the
assumptions on the distribution of errors and presence of outliers; and sensitivity analysis,
which analyzes the sensitivity of a statistical model when subject to minor perturbations.
Usually, it would be far more dicult, or even impossible, to observe these aspects in a
traditional credibility model.
In the context of traditional linear models (homoskedastic and independent), examples
of diagnostic methods may be seen in Hoaglin and Welsch (1978), Belsley et al. (1980)
andCook and Weisberg (1982). Linear mixed models, extensions and generalizations are
briey discussed here and may be seen in Beckman et al. (1987), Christensen and Pearson
(1992), Hilden-Minton (1995), Lesare and Verbeke (1998), Banerjee and Frees (1997),
Tan et al. (2001), Fung et al. (2002), Demidenko (2004), Demidenko and Stukel (2005),
Zewotir and Galpin (2005), Nobre and Singer (2007, 2011) and Gumedze et al. (2010).
4.1 Residual analysis
In the linear mixed models class three dierent kinds of residuals may be considered. The
conditional residuals: e = y X
b, the EBLUP: Z
b, and the marginal residuals:
= y X
. These predict respectively conditional error e = y E[y|b] = y X Zb,

random eects Zb = E[y|b] E[y] and the marginal error = y E[y] = y X. Each
of the mentioned residuals is useful to verify some assumption of the model, as seen in
Nobre and Singer (2007) and briey presented next.
4.1.1 Conditional residuals
To identify cases with a possible high inuence on
2
in linear mixed models, Nobre and
Singer (2007) suggested the standardization for the conditional residual given by
e
i
=
e
i
q
ii
,
where q
ii
represents the ith element in the main diagonal of Q dened as
Q =
2
(V
1
V
1
X(X
V
1
X)
1
X
V
1
).
Under normality assumptions on e, this standardization identies outlier observations and
subjects; see Nobre and Singer (2007). To do so, the same authors consider the quadratic
form M
I
= y
QU
I
(U
I
QU
I
)
1
U
I
Qy, where U
I
= (u
ij
)
(nk)
= (U
i1
, . . . , U
ik
), with U
i
representing the ith column of the identity matrix of order n. To identify an outlier subject
let I be the index set of the subject observations and evaluate M
I
for this subset.
Table 2. Diagnostic techniques involving residuals.
Diagnostic Graph
Linearity of xed eects

vs. explanatory variables (tted values)
Presence of outliers e vs. observation index
Homoskedasticity of the conditional errors e vs. tted values
Normality of the conditional errors QQ plot for the least confounded residuals
Presence of outlier subjects Mahalanobis distance vs. observation index
Normality of the xed eects weighted QQ plot for

b
i
4.1.2 Confounded residuals
It can be shown that, under the assumptions made by model given in Equation (1), we
have
e = RQe +RQZb and Z
b = ZGZ
QZb +ZGZ
Qe,
where R =
2
I
n
. These identities tell us that e and Z
b depend on b and e and thus

are called confounded residuals; see Hilden-Minton (1995). To verify the normality of the
conditional errors using only e may be misleading because of the presence of b in the
above formulas. Hilden-Minton (1995) dened the confounding fraction as the proportion
of variability in e due to the presence of b. The same work suggested the use of a linear
transformation L such that L
e has the least confounding fraction possible. The suggested

transformation also generates uncorrelated homoskedastic residuals. It is more appropri-
ated to analyze the assumption of normality in the conditional errors using L
e instead
of e as suggested by Hilden-Minton (1995) and veried by simulation in Nobre and Singer
(2007).
4.1.3 EBLUP
The EBLUP is useful to identify outlier subjects given that it represents the distance
between the population mean value and the value predicted for the ith subject. A way of
using the EBLUP to search for outliers subjects is to use the Mahalanobis distance (see
Waternaux et al., 1989),
i
=

b
i
(
Var[
b
i
b
i
])
1
b
i
. It is also possible to use the EBLUP
to verify the random eects normality assumption. For more information; see Nobre and
Singer (2007). In Table 2 we summarize diagnostic techniques involving residuals discussed
in Nobre and Singer (2007).
4.2 Sensitivity analysis
Inuence diagnostic techniques are used to detect observations that may produce excessive
inuence in the parameters estimates. There are two main approaches of such techniques:
global inuence, which is usually based on case deletion; and local inuence, which intro-
duces small perturbations in dierent components of the model.
In normal homoskedastic linear regression, examples of sensitivity measures are the
Cook distance, DFFITS and the COVRATIO; see Cook (1977), Belsley et al. (1980) and
Chatterjee and Hadi (1986, 1988).
4.2.1 Global influence
A simple way to verify the inuence of a group of observations in the parameters es-
timates is to remove the group and observe the changes in the estimation. The group
of observations are inuential if the changes are considerably large. However, in LMM,
it may not be practical to reestimate the parameters every time a set of observations is
removed. To avoid doing so, Hilden-Minton (1995) presented an update formula for the
BLUE and BLUP. Let I = {i
1
, . . . , i
k
} be the index set of the removed observations and
U
I
= (U
i1
, . . . , U
ik
). Hilden-Minton (1995) showed that
(I)
= (X
MX)
1
X
MU
I
(I)
and

b
b
(I)
= DZ
QU
I
(I)
,
where the subscript (I) indicates that the estimates were obtained without the observations
indexed by I and

(I)
= (U
I
QU
I
)
1
U
I
Qy.
A suggestion to measure the inuence on the parameters estimates in linear mixed
models is to use the Cook distance (see Cook, 1977) given by
D
I
=
(

(I)
)
(X
V
1
X)
1
(

(I)
)
c
=
(y y
(I)
)
V
1
(y y
(I)
)
c
,
such as seen in Christensen and Pearson (1992) and Banerjee and Frees (1997), where c
is a scale factor. However, it was pointed out by Tan et al. (2001) that D
I
is not always
able to measure the inuence on the estimation properly in the mixed models class. The
same authors suggest the use of a measure similar to the Cook distance, but conditional
to BLUP (
b). The conditional Cook distance is dened for the ith observation as
D
cond
i
=
k
j=1
P
j(i)
Var[y|b]
1
P
j(i)
(n 1)k + p
, i = 1, . . . , k,
where P
j(i)
= y
j
y
j(i)
= (X
j
+Z
j
b
j
)(X
j
(i)
+Z
j
b
j(i)
). The same authors decomposed
D
cond
i
= D
cond
i1
+ D
cond
i2
+ D
cond
i3
and commented the interpretation of each part of the
decomposition. D
cond
i1
is related to the inuence in the xed eects, D
cond
i2
is related to the
inuence on the predicted values and D
cond
i3
to the covariance of the BLUE and the BLUP,
which should be close to zero if the model is valid.
When all the observations from a subject are deleted, it is not possible to obtain the
BLUP for the random eects of that subject, making it impossible to obtain D
cond
I
as
stated above. For this purpose, Nobre (2004) suggested using D
cond
I
= (n
i
)
1
jI
D
cond
j
,
where I indexes the observation from a subject, as a way to measure the inuence of a
subject on the parameters estimates when its observations are deleted.
There are natural extensions of leverage measures for linear mixed models. These can
be seen in Banerjee and Frees (1997), Fung et al. (2002), Demidenko (2004) and Nobre
(2004). However, they only provide information about leverage regarding tted marginal
values. This has two main limitations as commented in Nobre and Singer (2011). First
we may be interested in detecting high-leverage within-subject observations. Second, in
some cases the presence of high-leverage within-subject observations does not imply that
the subject itself is detected as a high-leverage subject. Suggestions of how to evaluate
the within-subject leverage may be seen in Demidenko and Stukel (2005) and Nobre and
Singer (2011).
4.2.2 Local influence
The concept of local inuence was proposed by Cook (1986) and consists in analyzing the
sensitivity of a statistical model when subjected to small perturbations. It is suggested
to use an inuence measure called the likelihood displacement. Considering the model
described in Equation (2), up to a constant, the log-likelihood function may be written as
L() =
k
i=1
L
i
() =
1
2
k
i=1
_
ln |V
i
| + (y
i
X
i
)
V
1
(y
i
X
i
)
_
.
The likelihood displacement is dened as LD() = 2{L(
) L(
)}, where is a l 1
perturbations vector in an open set R
l
; is the parameters vector of the model,
including covariance parameters;

is the ML estimate of and

is the ML estimate when

the model is perturbed. It is necessary to assume that
0
exists such that L(
) = L(
0
)
and such that LD has its rst and second derivatives in a neighborhood of (
0
)
. Cook
(1986) considered a R
l+1
surface formed by the inuence function () = (
, LD(
))
and the normal curvature in the vicinity of

0
in the direction of a vector d, denoted by
C
d
. In this case, the normal curvature is given by
C
d
= 2|d
L
1
Hd|,
where

L =
2
L()/
and H =
2
L(
)/
both evaluated at =

; see Cook
(1986). It can be shown that C
d
always lies between the minimum and maximum eigen-
value of the matrix

F = H
L
1
H, so d
max
, the eigenvector associated to the highest
eigenvalue, gives information about the direction that exhibits more sensitivity of LD()
in a
0
neighborhood. Beckman et al. (1987) made some comments on the eectiveness of
the local inuence approach. Lesare and Verbeke (1998) and Nobre (2004) showed some
examples of perturbation schemes in the linear mixed models context.
Perturbation scheme for the covariance matrix of the conditional errors. To ver-
ify the sensitivity of the model to the conditional homoskedasticity assumption, pertur-
bations are inserted in the covariance matrix of the conditional errors. This can be done
by considering Var[] =
2
(), where () = diag(), with = (
1
, . . . ,
N
)
, the
perturbation vector. For this case we have
0
= 1
N
. The log-likelihood function in this
case is given by
L = L(
) =
1
2
_
ln |V ()| + (y X)
V ()
1
(y X)
_
,
where V
= ZDZ
+
2
().
Perturbation scheme for the response. For the local inuence approach, Beckman et
al. (1987) proposed the perturbation scheme
y() = y + s,
where s represents a scale factor and is a n 1 perturbation vector. For this scheme
we have
0
= 0, with 0 representing the n 1 null vector. In this case, the perturbed
log-likelihood function is proportional to
L(
) =
1
2
(y + s X)
V
1
(y + s X).
Perturbation scheme for the random effects covariance matrix. It is possible to
assess the sensitivity of the model in relation to the random eects homoskedasticity
assumption by perturbing the matrix G. Nobre (2004) suggested the use of Var[b
i
] =
i
G
as a perturbation scheme. In this case is a q 1 vector and
0
= 1
q
. The perturbed
log-likelihood function is proportional to
L() =
1
2
k
i=1
_
ln |V
i
()| + (y
i
X
i
)
1
V ()
1
(y
i
X
i
)
_
.
Perturbation scheme for the weighted case. Verbeke (1995) and Lesare and Verbeke
(1998) suggested perturbing the log-likelihood function as
L(|) =
k
i=1
i
L
i
().
Such a perturbation scheme is appropriate for measuring the inuence of the ith subject
using the normal curvature in its direction and is given by
C
i
= 2|d
i
H
L
1
Hd
i
|,
where d
i
is a vector whose entries are 1 in the ith coordinate and zero everywhere else.
Verbeke (1995) showed that if C
i
has a high value, then the ith subject has great inuence
in the value of

. A threshold of twice the mean value of all C
j
s helps to decide whether
or not the observation is inuential.
Lesare and Verbeke (1998) extracted from C
i
some interpretable measures. They es-
pecially propose using X
i
X
i

2
, R
i
2
Z
i
Z
i

2
, I
n
i
R
i
R
i

2
and
V
1
i

2
, where
X
i
=

V
1/2
X
i
, Z
i
=

V
1/2
i
Z
i
, R
i
=

V
1/2
i
e
i
, to evaluate the inuence of the ith subject
in the model parameter estimates. The actual interpretation of each of these terms can be
seen in the original paper.
4.2.3 Conform local influence
The C
d
measure proposed by Cook (1986) is not invariant to scale re-parametrization.
To obtain a similar standardized measure and make it more comparable, Poon and Poon
(1999) used the conform normal curvature instead of the normal curvature given by
B
d
() =
2|d
L
1
Hd|
2H
L
1
H
.
It can be shown that 0 B
d
() 1 to d direction and that B
d
is invariant to conform scale
re-parametrization. A re-parametrization is said to be conform if its jacobian J is such that
J
J = tI
s
, to some real t and integer s. They showed that if
1
, . . . ,
l
are the eigenval-
ues of

F matrix with v
1
, . . . , v
l
representing the respective normalized eigenvectors, then
the value of the conform normal curvature in v
i
direction is equal to
i
/
_
l
i=1

2
i
and
l
i=1
B
2
vi
() = 1. If every eigenvector has the same conform normal curvature, its value is
equal to 1/
l. Poon and Poon (1999) proposed to use this measure as a referential to mea-
sure the intensity of the local inuence of an eigenvector. It can also be shown that when
d has the direction of d
max
the conform normal curvature also attains its maximum. In
this way, the normal curvature and the conform normal curvature are equivalent methods.
5. Application
According to Frees et al. (1999), the random coecient models are equivalent to the
Hachemeister linear regression model which is used for the example data in Hachemeister
(1975). The random coecient model to the data in Table 1 may be described as
y
ij
=
i
+ j
i
+ e
ij
, i = 1, . . . , 5, j = 1, . . . , 12,
where y
ij
represents the average claim amount for state i in the jth trimester,
i
= +a
i
and
i
= + b
i
, with xed and , and (a
i
, b
i
)
N
2
(0, D), in which D is a 2 2
covariance matrix. In order to nd a possible simpler model, we used R to apply the
asymptotic likelihood ratio test described in Giampaoli and Singer (2009) to compare the
suggested random coecients model and a random intercept model. The p-value obtained
from the test was 0.0514. It indicates that it may be enough to consider the random
eect for the intercept only. This decision is also supported by the Bayesian information
criterion (BIC), which is equal to 808.3 for the single random eect model and 811.6 for the
model with two random eects. We may also use another set of tests, involving bootstrap,
monte-carlo and permutational methods, to investigate whether or not should we prefer
the random intercept model. These tests may be seen in Crainiceanu and Ruppert (2004),
Greven et al. (2008) and Fitzmaurice et al. (2007). However, this is very distant from our
goals and is not discussed here. For the sake of simplicity and based on the presented
reasons we shall use the random intercept model, which diers a little from the model
proposed by Frees et al. (1999). Thus, the model to be adjusted for the data in this
example is
y
ij
=
i
+ j +
ij
, i = 1, . . . , 5, j = 1, . . . , 12, (4)
where
i
= +a
i
, and are the same as dened before. Assume also that Var[
ij
] =
2
and Var[a
i
] =
2
a
.
The model parameter estimates were obtained by the RML method using the lmer()
function from the lme4 package in R. The standard errors were obtained from SAS c (SAS
Institute Inc., 2004) using the proc MIXED. The estimates are shown in Table 3.
Table 3. Model parameter estimates.
Parameter
2

2
a
Estimate 1460.32 32.41 32981.53 73398.25
SE 131.07 6.79 6347.17 24088.00
Figure 3 shows the ve conditional regression lines obtained from the linear mixed model
given in Equation (4). The adjusted model clearly suggests that the claim amount is higher
in state 1. Also it suggests a similarity in the claim amounts from states 2 and 4. Besides
that, we can expect a smaller risk from policies in state 5, since they are much closer to
the respective adjusted conditional line. Further information is explored by the diagnostic
analysis commented next.
0 10 20 30 40 50 60
1
0
0
0
1
5
0
0
2
0
0
0
2
5
0
0
Conditional regression lines
Observation
A
g
g
r
e
g
a
t
e

c
l
a
i
m

a
m
o
u
n
t
State 1
State 2
State 3
State 4
State 5
Figure 3. Conditional regression lines.
5.1 Diagnostic analysis
The standardized residuals proposed by Nobre and Singer (2007) suggest that observation
4.7 (obtained from state 4 in the seventh trimester) may be considered an outlier as shown
in Figure 4(a). According to the QQ plot in Figure 4(b) it is reasonable to assume that the
conditional errors are normally distributed. The Mahalanobis distance in Figure 4(c) was
normalized to t the interval [0, 1] and suggests that the rst state may be an outlier. The
measure M
I
proposed by Nobre and Singer (2007) in Figure 4(d), also normalized, suggests
that none of the states have outlier observations. The Mahalonobis distance should not
be confounded with M
I
. The rst is based on the EBLUP and the last is based on the
conditional errors, and thus they have dierent meanings. For both analyses, an observation
is highlighted if it is greater than twice the mean of the measures.
1 2 3 4 5
1
0
1
2
3
(a)
State
S
t
a
n
d
a
r
d
i
z
e
d

c
o
n
d
i
t
i
o
n
a
l

r
e
s
i
d
u
a
l
4.7
2 1 0 1 2
1
0
1
2
3
(b)
Quantiles of N(0,1)
L
e
a
s
t

c
o
n
f
o
u
n
d
e
d

r
e
s
i
d
u
a
l

s
t
a
n
d
a
r
d
i
z
e
d
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(c)
State
M
a
h
a
l
a
n
o
b
i
s

d
i
s
t
a
n
c
e
1
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(d)
State
M
I
Figure 4. Residual analysis: (a) standardized residuals, (b) least confounding residuals, (c) EBLUP, (d) values for
M
I
.
The conditional Cook distance is shown in Figure 5. The distances were normalized for
comparison. Figure 5(a) suggests that observation 4.7 is inuential in the model estimates.
The rst term of the distance decomposition suggests that no observations were inuential
in the estimate of as shown in Figure 5(b). The second term of the decomposition
suggests that observation 4.7 is potentially inuential in the prediction of b as seen on
Figure 5(c). The last term, D
i3
, is as close to zero as expected and is omitted.
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(a)
State
C
o
o
k
s

c
o
n
d
i
t
i
o
n
a
l

d
i
s
t
a
n
c
e
4.7
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(b)
State
D
1
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(c)
State
D
2
4.7
Figure 5. (a) Conditional Cook distance, (b) D
i1
, (c) D
i2
.
Figure 6 shows the local inuence analysis using three dierent perturbation schemes.
The rst, in Figure 6(a), is related to the conditional errors covariance matrix, as suggested
in Beckman et al. (1987), and indicates that the observations from the fourth state, espe-
cially 4.7, are possibly inuential in the homoskedasticity and independence assumption
for the conditional errors. Notice that it is possible to explain the inuence of observation
4.7 analyzing Figure 2. This observation has a value considerably higher than the others
from the same state. Figure 6(b) demonstrates the perturbation scheme for the covariance
matrix associated to the random eects as shown in Nobre (2004). Alternative perturba-
tion schemes for this case can be seen at Beckman et al. (1987). These schemes suggest that
all states are equally inuential in the random eects covariance matrix estimate. Finally,
there are evidences that the observations in the fourth state may not be well predicted by
the model; see Figure 6(c).
After the diagnostic we proceed to a conrmatory analysis by removing the observations
from states 1 and 4, one at a time and then both at the same time. The new estimates are
shown in Table 4. For each parameter, we calculate the relative change in the estimated
values, dened for parameter , as
RC() =
(i)
100%.
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(a)
State
A
b
s
o
l
u
t
e

v
a
l
u
e
s

o
f

d
m
a
x

c
o
m
p
o
n
e
n
t
s
4.7
4.1
4.11
4.12
1.2
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(b)
State
A
b
s
o
l
u
t
e

v
a
l
u
e

o
f

d
m
a
x

c
o
m
p
o
n
e
n
t
s
1 2 3 4 5
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
(c)
State
|
|
I
R
R
T
|
|
2
4
Figure 6. Perturbation schemes: (a) conditional covariance matrix, (b) random eects covariance matrix, (c) values
for I
n
i
R
i
R
i

2
.
Table 4. Estimates and relative changes for the model given in Equation (4) parameter estimates with and without
states 1 and 4.
Situation
2

2
a
Complete data 1460.32 32.41 32981.53 73398.25
Without State 1 1408.63 (3.67%) 25.26 (22.06%) 34666.31 ( 5.11%) 34335.64 (53.22%)
Without State 4 1530.94 (4.61%) 33.50 ( 3.36%) 24940.12 (24.38%) 59214.50 ( 19.32%)
Without States 1 and 4 1485.56 (1.70%) 24.32 (24.96%) 24497.48 (25.72%) 23707.07 (67.70%)
If all ve states were equally inuential, we would expect the value for RC to lie around
1/5 = 20% after removing a state. If RC() exceeds two times this value, that is 40%,
for some parameter we consider the state was potentially inuential. It is possible to
conclude that three observations from state 1 were inuential in the within-subject variance
estimate. From Figure 2, one can explain this inuence noticing that all the observations
from state 1 had higher values compared to the other states. Notice that such inuence was
not detected in Figure 5(b), but was pointed out by the Mahalanobis distance in Figure
4(c). Removing state 1 from our analysis and running every diagnostic procedure again
we detect no excessive inuence and the only issue is the observation 4.7, which is still an
outlier. From this result the model is validated and it is assumed to be robust and ready
for use.
6. Conclusions
The use of linear mixed models in actuarial science should be encouraged given their
capability to model the within-subject correlation, their exibility and the presence of
diagnostic tools. Insurers should not use a model without validating it rst. For the specic
example seen here, the decision makers may consider a dierent approach for state 1.
After removing observations from state 1 there was a relative change of more than 50% in
the random eect variance estimate, which reects signicantly in the premium estimate.
Such analysis would not be possible in the traditional credibility models approach. This
illustrates how the model can be used to identify dierent sources of risk and can be used
in portfolio management. Linear mixed models are also usually easier to understand and
to present, when compared to standard actuarial methods, such as the credibility models
and Bayesian approach for determining the fair premium. The natural extension of this
work is to repeat the estimation and diagnostic procedures, adapting what is necessary,
to the generalized linear mixed models, which are also useful to actuarial science. Some
works have already been made in this area; see, e.g., Antonio and Beirlant (2006). It is also
interesting to continue a further analysis of the example in Hachemeister (1975), using the
diagnostic procedures again when weights are introduced to the covariance matrix of the
conditional residuals in the random coecient models, and to evaluate the robustness of
the linear mixed models equivalent to the other classic credibility models. Again, this care
is justied because the fairest premium is more competitive in the market.
Appendix
We present here expressions for matrix H and the derivatives seen in the dierent pertur-
bation schemes presented in Section 4.2.2. These calculations are taken from Nobre (2004)
and are presented here to make this text more self-content.
Appendix A. Perturbation Scheme for the Covariance Matrix of the
Conditional Errors
Let H
(k)
be the kth column of H and f be the number of distinct components of matrix
D, then
H
(k)
=
_
_
2
L()
,

2
L()
2
,

2
L()
1
, . . . ,

2
L()
f
_
,
where
2
L()
;=
0
= X
D
k
e,
2
L()
;=
0
=
1
2
_

2
tr
_
D
k
Z
DZ
_
2 e
D
k
V
1
e +
2
e
D
k
e
_
,
2
L()
;=
0
=
1
2
_
tr
_
D
k
Z

D
i
Z
_
2 e
D
k
Z

D
i
Z
e
_
,
with
D
k
=
V
1
()
;=0
=
2
V
(k)
(
V
(k)
)
, e

D
i
=
D
;=0
,
and V
(k)
representing the kth column of V
1
.
Appendix B. Perturbation Scheme for the Response
It can be shown that
2
L()
;=0
= s
V
1
X,
2
L()
;=
0
= s
V
1
V
1
e,
2
L()
;=
0
= sV
1
Z

D
i
Z
V
1
e,
implying
H
= sV
1
_
X,

V
1
e, Z

D
1
Z
V
1
e, . . . , Z

D
f
Z
V
1
e
_
.
Appendix C. Perturbation Scheme for the Random Effects Covariance
Matrix
For this scheme we have
H
(k)
=
_
_
2
L()
,

2
L()
2
,

2
L()
1
, . . . ,

2
L()
f
_
.
It can be shown that
2
L(
k
)
;=
0
= X
k

V
1
k
Z
k
GZ
k

V
1
k
e
k
,
2
L(
k
)
;=0
= tr
_
V
1
k
Z
k
G
k
Z
k
_
2 e
k

V
1
k
Z
k
GZ
k

V
1
k

V
1
k
e
k
,
2
L(
k
)
;=0
= tr
_
V
1
k
Z
k
G
k
Z
k

V
1
k
Z
k

G
i
Z
k
_
e
k

V
1
k
Z
k
GZ
k

V
1
k
Z
k

G
i
Z
k

V
1
k
e
k
.
Acknowledgements
We are grateful to Conselho Nacional de Desenvolvimento Cientco e Tecnologico (CNPq
project # 564334/2008-1) and Fundacao Cearense de Apoio ao Desenvolvimento Cientco
e Tecnologico (FUNCAP), Brazil, for partial nancial support. We also thank an anonym
referee and the executive editor for their careful and constructive review.
References
Antonio, K., Beirlant, J., 2006. Actuarial statistics with generalized linear mixed models.
Insurance: Mathematics and Economics, 75, 643676.
Banerjee, M., Frees, E.W., 1997. Inuence diagnostics for linear longitudinal models. Jour-
nal of the American Statistical Association, 92, 9991005.
Beckman, R.J., Nachtsheim, C.J., Cook, R.D., 1987. Diagnostics for mixed-model analysis
of variance. Technometrics, 29, 413426.
Belsley, D.A., Kuh, E., Welsch, R.E., 1980. Regression Diagnostics: Identifying Inuential
Data and Sources of Collinearity. John Wiley & Sons, New York.
B uhlmann, H., 1967. Experience rating and credibility. ASTIN Bulletin, 4, 199207.
Chatterjee, S., Hadi, A.S., 1986. Inuential observations, high leverage points, and outliers
in linear regression (with discussion). Statistical Science, 1, 379393.
Chatterjee, S., Hadi, A.S., 1988. Sensitivity Analysis in Linear Regression. John Wiley &
Sons, New York.
Christensen, R., Pearson, L.M., 1992. Case-deletion diagnostics for mixed models. Tech-
nometrics, 34, 3845.
Cook, R.D., 1977. Detection of inuential observation in linear regression. Technometrics,
19, 1528.
Cook, R.D., 1986. Assessment of local inuence (with discussion). Journal of The Royal
Statistical Society Series B - Statistical Methodology, 48, 117131.
Cook, R.D., Weisberg, S., 1982. Residuals and Inuence in Regression. Chapman and Hall,
London.
Crainiceanu, C.M., Ruppert, D., 2004. Likelihood ratio tests in linear mixed models with
one variance component. Journal of The Royal Statistical Society Series B - Statistical
Methodology, 66, 165185.
Dannenburg, D.R., Kaas, R., Goovaerts, M.J., 1996. Practical actuarial credibility models.
Institute of Actuarial Science and Economics, University of Amsterdam, Amsterdam.
Demidenko, E., 2004. Mixed Models - Theory and Applications. Wiley, New York.
Demidenko, E., Stukel, T.A., 2005. Inuence analysis for linear mixed-eects models,
Statistics in Medicine, 24, 893909.
Diggle, P.J., Heagerty, P., Liang, K.Y., Zeger, S.L., 2002. Analysis of Longitudinal Data.
Oxford Statistical Science Series.
Drapper, N.R., Smith, N., 1998. Applied Regression Analysis. Wiley, New York.
Dutang, C., Goulet, V., Pigeon, M., 2008. Actuar: an R package for actuarial science.
Journal of Statistical Software, 25, 137.
Fitzmaurice, G.M., Lipsitz, S.R., Ibrahim, J.G., 2007. A note on permutation tests for
variance components in multilevel generalized linear mixed models. Biometrics, 63, 942
946.
Frees, E.W., Young, V.R., Luo, Y., 1999. A longitudinal data analysis interpretation of
credibility models. Insurance: Mathematics and Economics, 24, 229247.
Fung, W.K., Zhu, Z.Y., Wei, B.C., He, X., 2002. Inuence diagnostics and outliers tests
for semiparametric mixed models. Journal of The Royal Statistical Society Series B -
Statistical Methodology, 64, 565579.
Giampaoli, V., Singer, J., 2009. Restricted likelihood ratio testing for zero variance com-
ponents in linear mixed models. Journal of Statistical Planning and Inference, 139,
14351448.
Greven, S., Crainiceanu, C.M., Kuchenho, H., Peters, A., 2008. Likelihood ratio tests for
variance components in linear mixed models. Journal of Computational and Graphical
Statistics, 17, 870891.
Gumedze, F.N., Welham, S.J., Gogel, B.J., Thompson, R., 2010. A variance shift model
for detection of outliers in the linear mixed model. Computational Statistics and Data
Analysis, 54, 21282144.
Hachemeister, C.A., 1975. Credibility for regression models with application to trend.
Proceedings of the Berkeley Actuarial Research Conference on Credibility, pp. 129163.
Hilden-Minton, J.A., 1995. Multilevel diagnostics for mixed and hierarchical linear models.
Ph.D. Thesis. University of California, Los Angeles.
Hoaglin, D.C., Welsch, R.E., 1978. The hat matrix in regression and ANOVA. American
Statistical Association, 32, 1722.
Johnson, R.A., Whichern, D.W., 1982. Applied Multivariate Stastical Analysis. Sixth edi-
tion. Prentice Hall. pp. 273332.
Lesare, E., Verbeke, G., 1998. Local inuence in linear mixed models. Biometrics, 54,
570582.
Liang, K.Y., Zeger, S.L., 1986. Longitudinal analysis using generalized linear models.
Biometrika, 73, 1322.
McCulloch, C.E, Searle, S.R., 2001. Generalized, Linear, and Mixed Models. Wiley, New
York.
Nobre, J.S., 2004. Mtodos de diagnstico para modelos lineares mistos. Unpublished Master
Thesis (in portuguese). IME/USP, Sao Paulo.
Nobre, J.S., Singer, J.M., 2007. Residuals analysis for linear mixed models. Biometrical
Journal, 49, 863875.
Nobre, J.S., Singer, J.M., 2011. Leverage analysis for linear mixed models. Journal of
Applied Statistics, 38, 10631072.
Patterson, H.D., Thompson, R., 1971. Recovery of interblock information when block sizes
are unequal. Biometrika, 58, 545554.
Poon, W.Y., Poon, Y.S., 1999. Conformal normal curvature and assessment of local in-
uence. Journal of The Royal Statistical Society Series B - Statistical Methodology, 61,
5161.
R Development Core Team, 2009. R: A Language and Environment for Statistical Com-
puting. R Foundation for Statistical Computing, Vienna, Austria.
SAS Institute Inc., 2004. SAS 9.1.3 Help and Documentation. SAS Institute Inc., Cary,
North Carolina.
Tan, F.E.S., Ouwens, M.J.N., Berger, M.P.F., 2001. Detection of inuential observation in
longitudinal mixed eects regression models. The Statistician, 50, 271284.
Verbeke, G., 1995. The linear mixed model. A critical investigation in the context of longi-
tudinal data analysis. Ph.D. Thesis. Catholic University of Leuven, Faculty of Science,
Department of Mathematics, Leuven, Belgium.
Verbeke, G., Molenberghs, G., 2000. Linear Mixed Models for Longitudinal Data. Springer.
Vonesh, E.F., Chinchilli, V.M., 1997. Linear and Nonlinear Models for the Analysis of
Repeated Measurements. Marcel Dekker, New York.
Waternaux, C., Laird, N.M., Ware, J.H., 1989. Methods for analysis of longitudinal data:
blood-lead concentrations and cognitive development. Journal of the American Statisti-
cal Association, 84, 3341.
Zewotir, T., Galpin, J.S., 2005. Inuence diagnostics for linear mixed models. Journal of
Data Science, 3, 53177.
Vol. 3, No. 1, April 2012, 7591
Statistical Modeling
Research Paper
Real estate appraisal of land lots using
GAMLSS models
Lutemberg Florencio
1
, Francisco Cribari-Neto
2
and Raydonal Ospina
2,
1
Banco do Nordeste do Brasil S.A, Boa Vista, Recife/PE, 50060004, Brazil
2
Departamento de Estatstica, Universidade Federal de Pernambuco, Recife/PE, Brazil
(Received: 17 May 2011 Accepted in nal form: 24 September 2011)
Abstract
The valuation of real estates is of extreme importance for decision making. Their sin-
gular characteristics make valuation through hedonic pricing methods dicult since the
theory does not specify the correct regression functional form nor which explanatory
variables should be included in the hedonic equation. In this article, we perform real
estate appraisal using a class of regression models proposed by Rigby and Stasinopoulos
(2005) called generalized additive models for location, scale and shape (GAMLSS). Our
empirical analysis shows that these models seem to be more appropriate for estimation
of the hedonic prices function than the regression models currently used to that end.
Keywords: Cubic splines GAMLSS regression models Hedonic prices function
Nonparametric smoothing Semiparametric models.
Mathematics Subject Classication: Primary 62P25 Secondary 62P20 62P05.
1. Introduction
The real estate, apart from being a consumer good that provides comfort and social status,
is one of the economic pillars of all modern societies. It has become a form of stock capital,
given the expectations of increasing prices, and a means of obtaining nancial gains through
rental revenues and sale prots. As a consequence, the real estate market value has become
a parameter of extreme importance.
The estimation of a real estate value is usually done using a hedonic pricing equation
according to the methodology proposed by Rosen (1974). This is seen as a heterogeneous
good comprised of a set of characteristics and it is then important to estimate an explicit
function, called hedonic price function, that determines which are the most inuential
attributes, or attribute package, when it comes to determining its price. However, the
estimation of a hedonic equation is not a trivial task since the theory does not determine
the exact functional form nor the relevant conditioning variables.
Corresponding author. Raydonal Ospina. Departamento de Estatstica, Universidade Federal de Pernambuco,

Cidade Universitria, Recife/PE, 50740540, Brazil. Email: raydonal@de.ufpe.br
76 L. Florencio, F. Cribari-Neto and R. Ospina
The use of classical regression methodologies, such as the classical normal linear regres-
sion model (CNLRM), for real estate appraisal can lead to biased, inecient and/or incon-
sistent estimates given the inherent characteristics of the data (for example, nonnormality,
heteroskedasticity and spatial correlation). The use of generalized linear models (GLM) is
also subject to shortcomings, since the data may come from a distribution outside the expo-
nential family and the functional relationship between the response and some conditioning
variables may not be the same for all observations. There are semiparametric and nonpara-
metric hedonic price estimations available in the literature, such as Pace (1993), Anglin
and Gencay (1996), Gencay and Yang (1996), Thorsnes and McMillen (1998), Iwata et al.
(2000) and Clapp et al. (2002). We also highlight the work of Martins-Filho and Bin (2005),
who modeled data from the real estate market in Multnomah County (Oregon-USA) non-
parametrically. We note that the use of nonparametric estimation strategies require very
large datasets in order to avoid the curse of dimensionality. Overall, however, most he-
donic price estimations are based on traditional methodologies such as the classical linear
regression model and the class of generalized linear models.
This article proposes a methodology for real estate mass appraisal
1
based on the class of
GAMLSS models. The superiority GAMLSS modeling relative to traditional methodologies
is evidenced by an empirical analysis that employs data on urban land lots located in
the city of Aracaju, Brazil. We perform a real estate evaluation in which the response
variable is the unit price of land lots and the independent variables are reect the land lots
structural, location and economic characteristics. We estimate the location and scale eects
semiparametrically in such a way that some covariates (the geographical coordinates of the
land lot, for instance) enter the predictor nonparametrically and their eects are estimated
using smoothing splines (see Silverman, 1984; Eubank, 1999) whereas other regressors are
included in the predictor in the usual parametric fashion. The model delivers a t that is
clearly superior to those obtained using the usual approaches. In particular, we note that
our t yields a very high pseudo-R
2
.
The paper unfolds as follows. In Section 2, we briey present the class of GAMLSS models
and highlight its main advantages. In Section 3, we describe the data used in the empirical
analysis. In Section 4, we present and discuss the empirical results. Finally, Section 5 closes
the paper with some concluding remarks.
2. GAMLSS Modeling
Rigby and Stasinopoulos (2005) introduced a general class of statistical models called addi-
tive models for location, scale and shape (GAMLSS). It encompasses both parametric and
semiparametric models, and includes a wide range of continuous and discrete distributions
for the response variable. It also allows the simultaneous modeling of several parameters
that index the response distribution using parametric and/or nonparametric functions.
With GAMLSS models, the distribution of the response variable is not restricted to the
exponential family and dierent additive terms can be included in the regression predictors
for the parameters that index the distribution, like smoothing splines and random eects,
which yields extra exibility to the model. The model is parametric in the sense that the
specication of a distribution for the response variable is required and at the same time it
is semiparametric because one can model some conditioning eects through nonparametric
functions.
The probability density function of the response variable y is denoted as f (y|), where
= (
1
, . . . ,
p
)
is a p-dimensional parameter vector. It is assumed to belong to a wide

1
Evaluation of a set of real properties through methodology and procedures common to all of them.
class of distributions that we denote by D. This class of distributions includes continuous
and discrete distributions as well as truncated, censored and nite mixtures of distributions.
In the GAMLSS regression framework the p parameters that index f(y|) are modeled using
additive terms.
Let y = (y
1
, . . . , y
n
)
be the vector of independent observations on the response

variable, each y
i
having probability density function f (y
i
|
i
), for i = 1, . . . , n. Here,
i
= (
i1
, . . . ,
ip
)
is a vector of p parameters associated with the explanatory variables

and with random eects. When the covariates are stochastic, f(y
i
|
i
) is taken to be con-
ditional on their values. Additionally, for k = 1, . . . , p, g
k
() is a strictly monotonic link
function that relates the kth parameter
k
to explanatory variables and random eects
through the additive predictor
g
k
(
k
) =
k
= X
k
k
+
Jk
j=1
Z
jk
jk
, (1)
where
k
and
k
are n1 vectors,
k
= (
1k
, . . . ,
J
k
k
)
is a vector of parameters of length

J
k
and X
k
and Z
jk
are xed (covariate) design matrixes of orders n J
k
and n q
jk
,
respectively. Finally,
jk
is a q
jk
-dimensional random variable. Model given in Equation
(1) is called GAMLSS; see Rigby and Stasinopoulos (2005).
In many practical situations it suces to model four parameters (p = 4), usually location
(
1
= ), scale (
2
= ), skewness (
3
= ) and kurtosis (
4
= ); the latter two are said
to be shape parameters. Thus, we have the models
Location and scale
parameters
g
1
() =
1
= X
1
1
+
J1
j=1
Z
j1
j1
;
g
2
() =
2
= X
2
2
+
J2
j=1
Z
j2
j2
;
Shape parameters
g
3
() =
3
= X
3
3
+
J3
j=1
Z
j3
j3
;
g
4
() =
4
= X
4
4
+
J4
j=1
Z
j4
j4
.
It is also possible to add to the predictor functions h
jk
that involve smoothers like cubic
splines, penalized splines, fractional polynomials, loess curves, terms of variable coecients,
and others. Any combination of these functions can be included in the submodels for , ,
and . As Akantziliotou et al. (2002) pointed out, the GAMLSS framework can be applied
to the parameters of any population distribution and generalized to allow the modeling of
more than four parameters.
GAMLSS models can be estimated using the gamlss package for R (see Ihaka and
Gentleman, 1996; Cribari-Neto and Zarkos, 1999), which is a free software; see http:
//www.R-project.org. Practitioners can then choose from more than 50 response distri-
butions.
2.1 Estimation
Two aspects are central to the GAMLSS additive components tting, namely: the backt-
ting algorithm and the fact that quadratic penalties in the likelihood function follow from
the assumption that all random eects in the linear predictor are normally distributed.
Suppose that the random eects
jk
in model given in Equation (1) are independent and
normally distributed with
jk
N
qjk
(0, G
1
jk
), where G
1
jk
is the q
jk
q
jk
(generalized)
inverse of the symmetric matrix G
jk
= G
jk
(
jk
). Rigby and Stasinopoulos (2005) noted
that for xed values of
jk
, one can estimate
k
and
jk
by maximizing the penalized
log-likelihood function
p
=
1
2
p
k=1
Jk
j=1
jk
G
jk
jk
,
where =

n
i=1
log{f(y
i
|
i
)} is the log-likelihood function of the data given
i
, for i =
1, . . . , n. This can be accomplished by using a backtting algorithm; for details, see Rigby
and Stasinopoulos (2005, 2007), Hastie and Tibshirani (1990) and Hrdle et al. (2004).
2.2 Model selection and diagnostic
GAMLSS model selection is performed by comparing various competing models in which
dierent combinations of the components M = {D, G, T , } are used, where D species
the distribution of the response variable, G is the set of link functions (g
1
, . . . , g
p
) for the
parameters (
1
, . . . ,
p
), T denes the set of predictor terms (t
1
, . . . , t
p
) for the predictors
(
1
, . . . ,
p
) and species the set of hiperparameters.
In the parametric GAMLSS regression setting, each nested model M can be assessed
from its tted global deviance (GD), given by GD = 2(
), where (
) =

n
i=1
(

i
).
Two nested competing GAMLSS models M
0
and M
1
, with tted global deviances GD
0
and GD
1
and error degrees of freedom (df), namely df
e0
and df
e1
, respectively, can be
compared using the generalized likelihood ratio test statistic = GD
0
GD
1
, which is
asymptotically distributed as
2
with d = df
e0
df
e1
df under M
0
. For each model M the
number of error df, namely df
e
is df
e
= n
p
k=1
df
k
, where df
k
are the df used in the
predictor of the model for the parameter
k
, for k = 1, . . . , p.
When comparing nonnested GAMLSS models (including models with smoothing terms),
the generalized Akaike information criterion (GAIC) (see Akaike, 1983) can be used to
penalize overttings. This is achieved by adding to the tted global deviances a xed
penalty # for each eective df that are used in the model, that is, GAIC(#) = GD+#df,
where GD is the tted global deviance. One then selects the model with the smallest
GAIC(#) value.
To assess the overall adequacy of the tted model, we propose the randomized quantile
residual; see Dunn (1996). This is a randomized version of the residual proposed by Cox
and Snell (1989) dened as
r
q
i
=
1
(u
i
), i = 1, . . . , n,
where () denotes the standard normal distribution function, u
i
is a uniform random
variable on the interval (a
i
, b
i
], with a
i
= lim
yyi
F(y
i
|
i
) and b
i
= F(y
i
|
i
). A plot of these
residuals against the index of the observations (i) should show no detectable pattern. A
detectable trend in the plot of some residual against the predictors may be suggestive of
link function misspecication.
Also, normal probability plots with simulated envelopes (see Atkinson, 1985) or worm
plots (see Buuren and Fredriks, 2001) are a helpful diagnostic tool. The worm plots are
useful for analyzing the residuals in dierent regions (intervals) of the explanatory variable.
If no explanatory variable is specied, the worm plot becomes a detrended normal QQ plot
of the (normalized quantile) residuals. When all points lie inside the (dotted) condence
bands (the two elliptical curves) there is no evidence of model misspecication.
In the context of a fully parametric model GAMLSS we can use pseudo R
2
measures.
For example, R
2
p
= 1 log
L/ log
L
0
(see McFadden, 1974) and R
2
LR
= 1 (
L
0
/
L)
2/n
(see
Cox and Snell, 1989, pp. 208209), where

L
0
and

L are the maximized likelihood functions
of the null (intercept only) and tted (unrestricted) models, respectively. The ratio of the
likelihoods or log-likelihoods may be regarded as a measure of the improvement, over the
model with
i
parameters achieved by the model under investigation.
Our proposal, however, is to compare the dierent models using the pseudo-R
2
given
by the square of the sample correlation coecient between the response and the tted
values. Notice that by doing so we can consider both fully parametric models and models
that include nonparametric components. We can also compare the explanatory power of a
GAMLSS model to those of GLM and CNLRM models. This is the pseudo-R
2
we use, which
was introduced by Ferrari and Cribari-Neto (2004) in the context of beta regressions and
it is a straightforward generalization of the R
2
measure used in linear regression analysis.
3. Data Description
The data contain 2,109 observations on empty urban land lots located in the city of Aracaju,
capital of the state of Sergipe (SE), Brazil, and comes from two sources: (i) data collected
by the authors from real estate agencies, advertisements on newspapers and research on
location (land lots for sale or already sold); (ii) data obtained from the Departamento de
Cadastro Imobilirio da Prefeitura de Aracaju. Observations cover the years 2005, 2006
and 2007. Each land lot price was recorded only once during that period. It is also note-
worthy that the land lots in the sample are geographically referenced relative to the South
American Datum
1
and have their geographical positions (latitude, longitude) projected
onto the Universal Transverse Mercator (UTM) coordinate system
2
.
The sample used to estimate the hedonic prices equation (i.e., the equation of hedonic
prices of urban land lots in Aracaju-SE) contains, besides the year of reference, information
on the physical (area, front, topography, pavement and block position), location (neighbor-
hood, geographical coordinates, utilization coecient and type of street in which the land
lot is located) and economic (nature of the information that generated the observation,
average income of the head of household of the censitary system, where the land is located
and the land lot price) characteristics of the land lots. In particular, we use the variables
YEAR (YR): qualitative ordinal variable that identies the year in which the information
was obtained. It assumes the values 2005, 2006 (YR06) and 2007 (YR07). It enters the
model through dummy variables;
AREA (AR): continuous quantitative variable, measured in m
2
(square meters), relative to
the projection on a horizontal plane of the land surface;
FRONT (FR): continuous quantitative variable, measured in m (meters), concerning the
projection of the land lot front over a line which is perpendicular to one of the lot
boundaries, when both are oblique in the same sense, or to the chord, in the case of
curved fronts;
TOPOGRAPHY (TO): nominal qualitative variable that relates to the topographical confor-
mations of the land lot. It is classied as plain when the land acclivity is smaller than
10% or its declivity is smaller than 5%, and as rough otherwise. It is a dummy variable
that equals 1 for plain and 0 rough;
1
The South American Datum (SAD) is the regional geodesic system for South America and refers to the mathematical
representation of the Earth surface at sea level.
2
Cilindrical cartographic projection of the terrestrial spheroid in 60 secant cylinders at Earth level alongside the
meridians in multiple zones of 6 degrees longitude and stretching out 80 degrees South latitude to 84 degrees North
latitude.
PAVEMENT (PA): nominal qualitative variable that indicates the presence or absence of
pavement (concrete, asphalt, among others) on the street in which the main land lot
front is located. It enters the model as a dummy variable that equals 1 when the land
lot is located on a paved street and 0 otherwise;
SITUATION (SI): nominal qualitative variable used to dierentiate the disposition of the
land lot on the block. It is classied as corner lot or middle lot. It is a dummy variable
that assumes value 1 for corner lots and 0 for all other land lots;
NEIGHBORHOOD (NB): nominal qualitative variable referring to the name of the neigh-
borhood where the land lot is located. It was categorized as valuable (highly priced)
neighborhoods and other neighborhoods, with the variable shown as VN and regarded
as a dummy (1 for valuable neighborhoods). The neighborhoods were also grouped as
belonging or not belonging to the city South Zone, dummy denoted by SZ (1 for South
Zone);
LATITUDE (LAT) and LONGITUDE (LON): continuous quantitative variables corresponding to
the geographical position of the land lot at the point z = (LAT, LON), where LAT and LON
are the coordinates measured in UTM;
UTILIZATION COEFFICIENT (UC): discrete variable given by a number that, when multiplied
by the area of the land lot, yields the maximal area (in square meters) available for
construction. UC is dened in an ocial urban development document. It assumes the
values 3.0, 3.5, . . . , 5.5, 6.0;
STREET (STR): ordinal qualitative variable used to dierentiate the land lot location rel-
ative to streets and avenues. It is classied as minor arterial (STR1), collector street
(STR2) and local street according to the importance of the street where the land lot
is located. It enters the model as dummy variables;
NATURE OF THE INFORMATION (NI): nominal qualitative variable that indicates whether
the observation is derived from oer, transaction or from the Aracaju register oce
(real state sale taxes). It enters the model through dummy variables;
SECTOR (ST): discrete quantitative proxy variable of macrolocation used to socioeconom-
ically distinguish the various neighborhoods, represented by the average income of the
head of household, in minimum wages, according to the IBGE census (2000). The neigh-
borhood average income functions as a proxy to other characteristics, such as urban
amenities. It assumes the values: 1, . . . , 18;
FRONT IN HIGHLY VALUED NEIGHBORHOODS (FRVN): continuous quantitative variable that
assumes strictly positive values and corresponds to the interaction between FR and VN
variables. It is included in the model to capture the inuence of land lots front dimensions
in valuable neighborhoods;
UNIT PRICE (UP): continuous quantitative variable that assumes strictly positive values
and corresponds to the land lot price divided by its area, measured in R$/m
2
(reais per
square meter).
In real estate appraisals (and specically in land lots valuations), the interest typically lies
in modeling the unit price as a function of the underlying structural, location and economic
characteristics of the real estate. We then use UP as the dependent variable (response). The
independent variables relate to the location (NB, VN, SZ, LAT, LON, ST, UC and STR), physical
(AR, FR, TO, SI and FRVN) and economic (NI) land lot characteristics; we also account for
the year of data collection.
Figure 1 presents box-plots of UP, AR and FR and Table 1 displays summary statistics
on those variables. The box-plot of UP shows that its distribution is skewed and that there
are several extreme observations. Notice from Table 1 that the sample values of UP range
from R$ 2.36/m
2
to R$ 800.00/m
2
and that 75% of the land lots have unit prices smaller
than R$ 82.82/m
2
.
We note that 263 extreme observations have been identied from the box-plot of AR (see
Figure 1). These observations are not in error, they appear as outlying data points in the
plot because the variable assumes a quite wide range of values: from 41 m
2
to 91, 780 m
2
,
that is, the largest land lot is nearly two thousand times larger than the smallest one.
0
2
0
0
4
0
0
6
0
0
8
0
0
UP
5
e
+
0
1
5
e
+
0
2
5
e
+
0
3
5
e
+
0
4
AR
5
1
0
2
0
5
0
1
0
0
2
0
0
5
0
0
FR
Figure 1. Box-plots of UP, AR and FR.
Table 1. Descriptive statistics.
Variable Mean Median Standard error Minimum Maximum Range
UP 72.82 55.56 70.28 2.36 800.00 797.64
LAT 710100.00 710300.00 2722.34 701500.00 714600.00 13100.00
LON 8787000.00 8786000.00 6638.77 8769000.00 8798000.00 29000.00
AR 1355.00 300.00 6063.53 48.00 91780.00 91732.00
FR 18.13 10.00 30.54 2.60 516.00 513.40
In order to investigate how UP relates to some explanatory variables, we produced dis-
persion plots. Figure 2 contains the pairwise plots: (i) UP LAT; (ii) UP LON; (iii) log(UP)
log(AR); (iv) log(UP) log(FR); (v) UP ST and (vi) UP UC. It shows that there is
a direct relationship between UP and the corresponding regressor in (i), (ii), (v) and (vi),
whereas in (iii) and (iv) the relationship is inverse. Thus, there is a tendency for the land
lot unit price to increase with latitude, longitude, sector and also with the utilization co-
ecient, and to decrease as the area and the front size increase. We note that the inverse
relationship between unit price and front size was not expected. It motivated the inclusion
of the covariate FRVN in our analysis.
It is not clear from Figure 2 whether the usual assumptions of normality and homoskedas-
ticity are reasonable. As noted by Rigby and Stasinopoulos (2007), transformations of the
response variable and/or of the explanatory variables are usually made in order to mini-
mize deviations from the underlying assumptions. However, this practice may not deliver
the expected results. Additionally, the resulting model parameters are not typically easily
interpretable in terms of the untransformed variables. A more general modeling strategy is
thus called for.
4. Empirical Modeling
In what follows, we estimate the hedonic price function of land lots located in Aracaju
using the highly exible class of GAMLSS models. At the outset, however, we estimate
standard linear regression and generalized linear models. We use these ts as benchmarks
for our estimated GAMLSS hedonic price function.
702000 704000 706000 708000 710000 712000 714000
0
2
0
0
4
0
0
6
0
0
8
0
0
(i) UP x LAT
LAT
U
P
8770000 8775000 8780000 8785000 8790000 8795000
0
2
0
0
4
0
0
6
0
0
8
0
0
(ii) UP x LON
LON
U
P
4 6 8 10
1
2
3
4
5
6
(iii) log(UP) x log(AR)
log(AR)
lo
g
(
U
P
)
1 2 3 4 5 6
1
2
3
4
5
6
(iv) log(UP) x log(FR)
log(FR)
lo
g
(
U
P
)
5 10 15
0
2
0
0
4
0
0
6
0
0
8
0
0
(v) UP x ST
ST
U
P
3.0 3.5 4.0 4.5 5.0 5.5 6.0
0
2
0
0
4
0
0
6
0
0
8
0
0
(vi) UP x UC
UC
U
P
Figure 2. Dispersion plots.
4.1 Data modeling based on the CNLRM
Table 2 lists the classical normal linear regressions that were estimated. The transformation
parameter of the Box-Cox model was estimated by maximizing the prole log-likelihood
function:

= 0.1010. All four models are heteroskedastic and there is strong evidence of
nonnormality for the rst two models. The coecients of determination range from 0.54
to 0.66. Since the error variances are not constant, we present in Table 3 the estimated
parameters of Model E, which yields the best t, along with heteroskedasticity-robust
HC3 standard errors; see Davidson and MacKinnon (1993). Notice that all covariates are
statistically signicant at the 5% nominal level, except for LAT (p-value = 0.1263), which
suggests that pricing dierentiation mostly takes place as we move in the North-South
direction.
4.2 Hedonic GLM function
Table 4 displays the maximum likelihood t of the generalized linear model that we call
Model A given by
g(UP
) =
0
+
2
LON +
3
log(AR) +
4
UC +
5
log(ST) +
6
STR1 +
7
STR2
+
8
SI +
9
PA +
10
TO +
11
NIO +
12
NIT +
13
YR06 +
14
YR07
+
15
SZ +
16
log(FRVN), (2)
where UP
= IE[UP] = , UP Gamma(, ) and = log(). We try a number of

dierent models, and this one (gamma response and log link) yielded the best t. We also
note that all regressors are statistically signicant at the 1% nominal level, except for LAT
(p-value = 0.5295), which is why we dropped this covariate from the model.
Table 2. Fitted models via CNLRM.
Model Equation Considerations
B UP =
0
+
1
LAT +
2
LON +
3
AR +
4
UC +
5
ST+
6
STR1+
7
STR2+
8
SI+
9
PA+
10
TO+
11
NIO+
12
NIT+
13
YR06+
14
YR07+
15
SZ+
16
FRVN +
The null hypotheses that the errors are ho-
moskedastic and normal are rejected at the 1%
nominal level by the Breusch-Pagan and Jarque-
Bera tests, respectively. The explanatory variables
proved to be statistically signicant at the 1%
nominal level (z-tests). Also, R
2
= 0.539, AIC =
22304 and BIC = 22406.
C log(UP) =
0
+
1
LAT+
2
LON+
3
AR+
4
UC+
5
ST+
6
STR1+
7
STR2+
8
SI+
9
PA+
10
TO+
11
NIO+
12
NIT+
13
YR06+
14
YR07+
15
SZ+
16
FRVN +
The null hypotheses that the errors are ho-
moskedastic and normal are rejected at the 1%
nominal level by the Breusch-Pagan and Jarque-
Bera tests, respectively. All explanatory variables
proved to be statistically signicant at 1% the
nominal level (z-tests). Also, R
2
= 0.599, AIC =
2912 and BIC = 3014.
D log(UP) =
0
+
1
LAT +
2
LON +
3
log(AR) +
4
UC +
5
log(ST) +
6
STR1 +
7
STR2 +
8
SI +
9
PA +
10
TO +
11
NIO +
12
NIT +
13
YR06 +
14
YR07 +
15
SZ +
16
log(FRVN) +
The Jarque-Bera test does not reject the null hy-
pothesis of normality at the usual nominal lev-
els, but the Breusch-Pagan test rejects the null
hypothesis of homoskedasticity at the 1% nomi-
nal level. All explanatory variables are statistically
signicant at the 1% nominal level, except for the
LAT variable (p-value = 0.0190). Also, R
2
= 0.651,
AIC = 2619 and BIC = 2721.
E
UP
=
0
+
1
LAT +
2
LON +
3
log(AR) +
4
UC +
5
log(ST) +
6
STR1 +
7
STR2 +
8
PA +
9
TO+
10
NIO+
11
NIT+
12
YR06 +
13
YR07 +
14
log(FRVN) +
Normality is not rejected by the Jarque-Bera test,
but the Breusch-Pagan test rejects the null hy-
pothesis of homoskedasticity at the 1% nominal
level. All covariates proved to be statistically sig-
nicant at the 1% nominal level, except for the
LAT variable (p-value = 0.0881). Also, R
2
= 0.657,
AIC = 4290 and BIC = 4392.
Table 3. Hedonic price function estimated via CNLRM Model E.
Estimate Standard error z-statistic p-value
(Intercept) 162.6307 34.1920 4.756 0.0000
LAT 1.85e-05 1.21e-05 1.529 0.1263
LON 1.74e-05 4.60e-06 3.798 0.0001
log(AR) 0.3507 0.0192 18.236 0.0000
log(ST) 0.4423 0.0332 13.297 0.0000
UC 0.2651 0.0412 6.429 0.0000
STR1 0.4874 0.0717 6.789 0.0000
STR2 0.1678 0.0675 2.485 0.0130
SI 0.1119 0.0405 2.757 0.0058
PA 0.3853 0.0302 12.767 0.0000
TO 0.4905 0.0798 6.145 0.0000
NIO 0.5994 0.0592 10.131 0.0000
NIT 0.5111 0.0131 3.886 0.0000
YR06 0.2560 0.0351 7.289 0.0000
YR07 0.6450 0.0345 18.645 0.0000
SZ 0.7221 0.0474 15.239 0.0000
log(FRVN) 1.2041 0.0137 8.797 0.0000
4.3 GAMLSS hedonic fit
4.3.1 Location parameter modeling ()
Since UP (the response) only assumes positive values, we consider the distributions log-
normal (LOGNO), inverse Gaussian (IG), Weibull (WEI) and gamma (GA). As noted
earlier, we use pseudo-R
2
given by
pseudo-R
2
= [correlation (observed values of UP, predicted values of UP)]
2
, (3)
to measure the overall goodness-of-t.
Table 4. Hedonic price function estimated via GLM Model A given in Equation (2).
Estimate Standard error z-statistic p-value
(Intercept) 151.8019 15.7792 9.620 0.0000
LON 1.77e-05 1.80e-06 9.851 0.0000
log(AR) 0.2276 0.0108 21.120 0.0000
UC 0.1272 0.0231 5.515 0.0000
log(ST) 0.2880 0.0193 14.954 0.0000
STR1 0.3562 0.0395 9.021 0.0000
STR2 0.1419 0.0408 3.482 0.0005
SI 0.0945 0.0255 3.707 0.0002
PA 0.2324 0.0220 10.556 0.0000
TO 0.3139 0.0503 6.236 0.0000
NIO 0.4208 0.0348 12.087 0.0000
NIT 0.3779 0.0642 5.884 0.0000
YR06 0.1947 0.0242 8.035 0.0000
YR07 0.4551 0.0242 18.780 0.0000
SZ 0.4716 0.0310 15.220 0.0000
log(FRVN) 0.7467 0.0622 11.997 0.0000
Table 5. Fitted models via GAMLSS.
Model D G Equation Considerations
E LOGNO logarithmic UP =
0
+ cs(LAT) + cs(LON) +
cs(log(AR)) + cs(UC) + cs(ST) +
1
STR1 +
2
STR2 +
3
SI +
4
PA +
5
TO +
6
NIO +
7
NIT +
8
YR06 +
9
YR07 +
10
SZ + cs(log(FRVN))
All regressors are signicant at the
level 1% signicance level (z-tests).
Also, AIC = 19155, BIC = 19359
and GD = 19083. Pseudo-R
2
=
0.739.
F IG logarithmic UP =
0
1
STR1 +
2
STR2 +
3
SI +
4
PA +
5
TO +
6
NIO +
7
NIT +
8
YR06 +
9
YR07 +
10
SZ + cs(log(FRVN))
1% signicance level (z-test). Also,
AIC = 19845, BIC = 20048 and
GD = 19773. Pseudo-R
2
= 0.678.
G WEI logarithmic UP =
0
1
STR1 +
2
STR2 +
3
SI +
4
PA +
5
TO +
6
NIO +
7
NIT +
8
YR06 +
9
YR07 +
10
SZ + cs(log(FRVN))
All regressors proved to be signi-
cant at the 1% signicance level (z-
tests). Also, AIC = 19260, BIC =
19463 and GD = 19188. Pseudo-R
2
=
0.748.
H GA logarithmic UP =
0
1
STR1 +
2
STR2 +
3
SI +
4
PA +
5
TO +
6
NIO +
7
NIT +
8
YR06 +
9
YR07 +
10
SZ + cs(log(FRVN))
1% signicance level (z-tests). Also,
AIC = 19062, BIC = 19337 and
GD = 19134. Pseudo-R
2
= 0.746.
Table 6. Hedonic price function estimated via GAMLSS Model I.
Estimative Standard error z-statistic p-value
(Intercept) 165.4000 16.1300 10.251 0.0000
cs(LAT) 5.17e-05 6.22e-06 8.307 0.0000
cs(LON) 1.51e-05 2.13e-06 7.071 0.0000
cs(log(AR)) 0.2317 0.0096 24.074 0.0000
cs(ST) 0.0465 0.0037 12.416 0.0000
cs(UC) 0.1223 0.0206 5.947 0.0000
STR1 0.3133 0.0349 8.963 0.0000
STR2 0.0926 0.0364 2.545 0.0100
SI 0.0920 0.0227 4.054 0.0000
PA 0.1891 0.0195 9.670 0.0000
TO 0.2662 0.0474 5.951 0.0000
NIO 0.4135 0.0395 13.362 0.0000
NIT 0.3485 0.0571 6.102 0.0000
YR06 0.1645 0.0215 7.632 0.0000
YR07 0.4358 0.0215 20.235 0.0000
cs(log(FRVN)) 0.6513 0.0569 11.443 0.0000
SZ 0.3875 0.0299 12.935 0.0000
The models listed in Table 5 include smoothing cubic splines (cs) with 3 eective df
for the covariates LAT, LON, log(AR), UC, ST and log(FRVN). Other smoothers (such as loess
and penalized splines), as well as dierent combinations of D (see Rigby and Stasinopoulos,
2007), such as BCPE, BCCG, LNO, BCT, exGAUSS, among others, and G, such as identity,
inverse, reciprocal, among others, were considered. However, they did not yield superior
ts. We also note that Model I yields the smallest values of the three model selection
criteria. Table 6 contains the a summary of the model t.
The use of three eective df in the smoothing functions delivered a good model t.
However, in order to determine whether a dierent number of eective df delivers superior
t, we used two criteria, namely: the AIC (objective) and visual inspection of the smoothed
curves (subjective); visual inspection aimed at avoiding overtting. We then arrived at
Model J. It also uses cubic spline smoothing (cs), but with a dierent number of eective
df in the smoothing functions; see Table 7. Notice that there was a considerable reduction
relative to Model I in the AIC, BIC and GD values (18822, 19212 and 18684, respectively)
and that there is a better agreement between observed and predicted response values.
Table 7. Hedonic price function estimated via GAMLSS Model J.
(Intercept) 130.1000 14.8100 8.787 0.0000
cs(LAT, df = 10) 5.92e-05 5.71e-06 10.354 0.0000
cs(LON, df = 10) 1.05e-05 1.96e-06 5.352 0.0000
cs(log(AR), df = 10) 0.2559 8.83e-03 28.963 0.0000
cs(ST, df = 8) 0.0373 3.44e-03 10.831 0.0000
cs(UC, df = 3) 0.1769 0.0188 9.370 0.0000
STR1 0.2571 0.0320 8.012 0.0000
STR2 0.0728 0.0334 2.180 0.0293
SI 0.1029 0.0208 4.940 0.0000
PA 0.1436 0.0179 7.999 0.0000
TO 0.1822 0.0410 4.436 0.0000
NIO 0.4173 0.0284 14.690 0.0000
NIT 0.3388 0.0524 6.462 0.0000
YR06 0.1373 0.0198 6.941 0.0000
YR07 0.4190 0.0197 21.190 0.0000
cs(log(FRVN), df = 10) 0.6599 0.0522 12.630 0.0000
SZ 0.5119 0.0275 18.613 0.0000
Figure 3 contains plots of the smoothed curves from Model J. The dashed lines are con-
dence bands based on pointwise standard errors. Panels (I), (II), (III), (IV), (V) and
(VI) reveal that the eects/impacts of LAT, LON, log(AR), ST, UC and log(FRVN) are typ-
ically increasing, increasing/decreasing, decreasing, increasing, increasing and increasing,
respectively, with increases in latitude, longitude, log area, socioeconomic indicator, utiliza-
tion coecient and log land front in highly priced neighborhoods. (Panel (II) alternately
shows local increasing and decreasing trends.) Some of these eects were also suggested by
the estimated coecients of the CNLRM and GLM models. Here, however, one obtains a
somewhat more exible global picture, as we see.
In Panel (I), one notices that as the latitude increases the contribution of the LAT
covariate between the 702000 and 709000 latitudes (approximately) neighborhoods that
belong to the expansion zone of the city is negative, whereas starting from position 709000
(approximately) South Zone and downtown area the price eect is positive. Additionally,
we note that, in certain ranges, increases in latitude lead to drastic changes in the slope of
the smoothed curve, e.g., between the 708000 and 710000 positions, whereas in other areas,
for instance between the 706000 and 708000 latitudes the Mosqueiro neighborhood, an
increase in latitude leads to an uniform negative eect.
Panel (II) shows that as longitude increases to position 8780000 the contribution of
the LON covariate is positive and nearly uniform, which almost exclusively covers observa-
tions relative to the Mosqueiro neighborhood. Starting at the 8785000 position there is a
remarkable change in the slope of the tted curve, which is triggered by the location of the
most upper class neighborhoods: from 8785000 to 8794000. After the 8794000 position, the
eect remains positive, but is decreasing and, eventually, it becomes negative.
We see in Panel (III) that as the area (in logs) increases the contribution of the log(AR)
covariate, for land lots with log areas between 4 and 5 (respectively), is clearly positive.
702000 704000 706000 708000 710000 712000 714000
1
.
0
0
.
6
0
.
2
0
.
2
Panel (I)
LAT
c
s
(
L
A
T
,

d
f
=
1
0
)
8770000 8775000 8780000 8785000 8790000 8795000
0
.
8
0
.
4
0
.
0
0
.
4
Panel (II)
LON
c
s
(
L
O
N
,

d
f
=
1
0
)
4 6 8 10
1
.
5
0
.
5
0
.
5
Panel (III)
log(AR)
c
s
(
l
o
g
(
A
R
)
,

d
f
=
1
0
)
5 10 15
0
.
0
0
.
5
1
.
0
Panel (IV)
ST
c
s
(
S
T
,

d
f
=
8
)
3.0 3.5 4.0 4.5 5.0 5.5 6.0
0
.
1
0
.
1
0
.
3
Panel (V)
UC
c
s
(
U
C
,

d
f
=
3
)
0.0 0.5 1.0 1.5 2.0
0
.
0
0
.
5
1
.
0
Panel (VI)
log(FRVN)
c
s
(
l
o
g
(
F
R
V
N
)
,

d
f
=
1
0
)
Figure 3. Smoothed additive terms Model J.
The eect is negative for land lots with log areas in excess of 5.
In Panel (IV), it is possible to notice that as we move up in the socioeconomic scale the
contribution of the ST covariate, in the range from 1 to 4 minimum wages, is negative,
even though the there is an increasing trend. For land lots located in neighborhoods that
correspond to more than 4 minimum wages, the eect is always positive; from 10 to 15
minimum wages the eect is uniform.
We note from Panel (V) that, contrary to what one would expect, the contribution of
the UC covariate is not positive. In the range from 3.0 to 5.0, the tted curve displays small
oscillations, alternating in the positive and negative regions. The positive eect only holds
for utilization coecients greater than 5.0.
Notice from Panel (VI) that as the front land lot (in logs) increases in highly priced neigh-
borhoods the contribution of the log(FRVN) covariate is mostly increasing and positive.
However, in the 1.5 to 2.0 interval the positive eect is approximately uniform.
4.4 Comparing models
In order to compare the best estimated models via CNLRM (Model E), GLM (Model A
given in Equation (2)) and GAMLSS (Model J) we use the AIC and BIC. Note the criteria
only is used to compare models that use the response (UP) in the same measurement scale,
i.e., Models A and J. We also compare the dierent models using the pseudo-R
2
given in
Equation (3).
We present in Table 8 a comparative summary of the three models. We note that Model
J is superior to the two competing models. Not only it has the smallest AIC and BIC
values (in comparison to Model A, but it also has a much larger pseudo-R
2
. The GAMLSS
pseudo-R
2
exceeds 0.80, which is notable.
Table 8. Comparative summary of the CNLRM, GLM and GAMLSS estimated models.
Model Class AIC BIC Pseudo-R
2
E (CNLRM) 4290 4392 0.667
A (GLM) 19486 19581 0.672
J (GAMLSS) 18822 19212 0.811
4.5 Dispersion parameter modeling ()
After a suitable model for the prediction of was selected, we carried out a likelihood ratio
test to determine whether the GAMLSS scale parameter is constant for all observations.
The null hypothesis that is constant was rejected at the usual nominal levels. We then
built a regression model for such a parameter. To that end, we used stepwise covariate
selection, considered dierent link functions (such as identity, inverse, reciprocal, etc.) and
included smoothing functions (such as cubic splines, loess and penalized splines) in the
linear predictor, just as we had done for the location parameter. We used the AIC for
selecting the smoothers and for choosing the number of df of the smoothing functions
together with visual inspection of the smoothed curves.
We present in Table 9 the GAMLSS hedonic price function parameter estimates obtained
by jointly modeling the location () and dispersion () eects; Model K. The model uses
the gamma distribution for the response and the log link function for both and . We
note that Model K contains parametric and nonparametric terms, and for that reason it is
said to be a linear additive semiparametric GAMLSS.
We note from Table 9 that the parameter estimates of the location submodel in Model K
are similar to the corresponding estimates from Model J, in which was taken to be con-
stant; see Table 7. It is noteworthy, nonetheless, that there was a sizeable reduction in the
AIC, BIC and GD values (18607, 19065 and 18445, respectively) and also an improvement
in the residuals as evidenced by the worm plot; see Figures 4 and 5.
Only two covariates were selected for the regression submodel in Model K, namely: ST
and log(AR). The former (ST) entered the model in the usual parametric fashion whereas
the latter (log(AR)) entered the model nonparametrically through a cubic spline smoothing
function with ten eective df. We note that the positive sign of the log(AR) coecient indi-
cates that the UP dispersion is larger for land lots with larger areas whereas the negative sign
of the ST coecient indicates that the dispersion is inversely related to the socioeconomic
neighborhood indicator.
It is noteworthy that the pseudo-R
2
of Model K is quite high (0.817) and that all of
explanatory variables are statistically signicant at the 1% nominal level which is not all
that common in large sample cross sectional analyses, especially in real estate appraisals.
Overall, the variable dispersion GAMLSS model is clearly superior to the alternative mod-
els. The good t of Model K can be seen in Figure 6 where we plot the observed response
values against the predicted values from the estimated model. Note that the 45
o
line in
this plot indicates perfect agreement between predicted and observed values.
Table 9. Hedonic price function estimated via GAMLSS Model K.
Coecients
(Intercept) 95.1300 14.2700 6.665 0.0000
cs(LAT, df = 10) 5.94e-05 5.37e-06 11.053 0.0000
cs(LON, df = 10) 6.45e-06 1.86e-06 3.460 0.0000
cs(log(AR), df = 10) 0.2087 0.0104 20.138 0.0000
cs(ST, df = 8) 0.0321 0.0030 10.666 0.0000
cs(UC, df = 3) 0.2095 0.0161 13.006 0.0000
STR1 0.2039 0.0298 6.838 0.0000
STR2 0.0729 0.0276 2.635 0.0084
SI 0.7136 0.0192 3.705 0.0000
PA 0.1653 0.0157 10.465 0.0000
TO 0.1778 0.0370 4.799 0.0000
NIO 0.3722 0.0251 14.799 0.0000
NIT 0.2790 0.0468 5.957 0.0000
YR06 0.1255 0.0175 7.144 0.0000
YR07 0.4195 0.0177 23.622 0.00
cs(log(FRVN), df = 10) 0.6809 0.0403 16.88 0.0000
SZ 0.4824 0.0241 20.001 0.0000
coecients
(Intercept) 1.6838 0.0839 20.072 0.0000
cs(log(AR), df = 10) 0.1370 0.0143 9.593 0.0000
ST 0.0391 0.0040 9.632 0.0000
q
qq
q
q q q q q
q
q
q
q q
q q
q
q qq q
q
q
q
q
q
qq q
q
q q
q q
q q
q
q
q
q q q qqq q qq
q qq q q
q q
q
q q q q q qq q q q q q q
q q
q
q
qq
q
q q qq
q
q
q q
q
q
q q
qq q q
q q
q q
q
q
q q
q q q
q q q q
q
q
q
q
q q
q
q q q q q
q
q q
q q
q
q
q
q q
q
q qq
q
q
q
q q
q q q q
q
qqq q qq q q
q q
q
qqq
q q
q
q
q
q
q
q
q
q
q
q
q
q
q q q
q
q
q
q
q
q
q
q
q q
q
q
q q
q
q q q q q q q
q
q q q
q
q
q
q
q
q
q
q q
q qq q
q
q
q
q
q q q q
qq
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q
q
q
q q q
q
q q
q q
q
qq q
q
q
q
q q
q q
q q
q
q q q q
q q q q q q
q q
q
q q q q q
q q
q
q q qqq q
qq
q
q q
q q q
q
q
q q q q q
q
q q
q
q q q
q
q
q
q
q
q
q
q q
qq
q q
q
q
q
q qq
q
q
q q
q
q q
q
q
q
q
q
q
q
q
q
q
q q
q
q q
q
q qq q q q q q
q q
q
q q q q q
q
q
q
qq
q
q
q
q
q
q q
q
q
q
q q
q
q
q
q qq
q
q q
q
q q
q
qq q
q
q
qq qq
q
q
q
qq
q
q q q
q q
q
qqqq
qq
q
q
q
q qq q
q
qq
q
q q q q
q q
q
q qq qqq q q qq q q q q q q
q
q
q q
qq
q
q qq
q
q
q q
q
q
q
q q q q
q q
q
q q qq qqq
q
qq q qqq q
q q q
q
q
q q
q
q
q
q qq q
q
q
qq q q q
q
q q
q
q
q
q q
q
qq
q
q q q
q q
q
q
q q q q q q q q q q
q
q q q q q q
qqqq
q q q q q qq
q q
q q qq q
q
q
q
q
q q
q
q q q
q q
q
q
q
q
q q q
q
q
q qqqq q q
q q q
q
q
q q
q
q
q
q q
q
q
q
q
q q q
q
qq q
q
q
q
q
q
q
q
q
q
q q q
q
q
q q q q q
qq
q q q
qq
q
q
qq
q q
q q
q
q
q qq qq q
q
q
q
q
q
qq q q
qq q
q
q
q
q
q
q
q
q q
q
q
q q
q q
qq
q
q
qq
qq q qq q
q
q
q q
q q q q q q q
q
q
q
q
q
q q qqq
q
q
qq
qq
q
q
q
q
q
qq
qq
q
q
q
q q
q q q
q
q
q
q
qq
q q
q
q
q
q
q
q q q
q
q q q
q
q
q q
q q q q
q
qq
q q q
q
q q
q
q
q q
q q
q
q
q
q
q
q q
q
q q
q q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q q
q
q qq
q
q
q
q
q
q
q
q q q
qq q q
q
q
q
q q
q
q q q q
q
q
q
q
q
q
q
q
q
q
q q
qq
q
q q
q
qq qq q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q
q
q
q q
q q
q q
q
q
q
q
q
qq
q
q q
q
q
q
q
q
q
q
q
q q
q q
q
q
q
q
q
q
q
q qq q
q
q q
q
q
q
q
qq
q
q
q q q q q
q
q
q q
q q q
q
q q
q
q
q q
q q
q
q
q
q
q
qqq q
q
q
q
q q
q
q
q q q
q
q
q q
q
q
q
q
q
q
q
q qq
q
q
q q
q
q
q
q
q
q q
q
q
q q
q
q q
q q
q
q
q
q
q
q
q q
q q
q q q
q
q q
q
qq
q
q q q
q q q q q q
q
q
qq q q q
q q qq q q q q q q qq qq q q q q q q q
q
q q
q
q q q qq q q q q q q qq q q q q
q q
q
q
q
q
q
q
q
q qq
q
q
q
q
q
q
q q
q
q q
q
q
q
q q q
q
q
q
q
q
q
q
q
q
q
q q q q
q
q
q q
q q
q
q q
q q q q q q
q q
q
q
q
q qq
q
q
q
q
q
q
q
qqq
q
q
q
q
q
q q
q qq q q q
q q q q q q
q
q
q
q
q
q q
q
q
q q
q
q q q q
q
q q qq
q
q
q
q q
q q
q
q
q
q q q
q
q
q
q
q
q q q qq
q q
q q q q q
q
q
q q
q
q
q q q q
q
q q
q
q
q qq
q q q q q q q q
q
q
q
q
q
q
q
q
q
q
q q q q
q
qq q q
q q q q q
q q q q
q q q
q
q q q
q
qq q q
q
q
q q
q
q q
q q
q
q
q q
q
q
q
q
q q
q
q
q
q
q
q q q q
q
q q q q q q
q
q q
q
q
q q q
q q
q
q q
q
q
q qq
q
q q
q
q q
q
q
q
q q q q
q q
q q
q
q
q q
q
q q q q
q q q qq qq
q
q
q
q
q
q q
q
q
q
q
q
q q
q
q
q
q qq q q q
q
q
q
q q
q
q
q q
q q
q
q
q q q q q q
q
q
q
q
qq
q q
q
q
q
q
q q
q
q
qq
q
q
q
q
q q
q q q
q q q
qqq
qq q q q
q q
q q
q
q qq qq q q
q
q
q
q
q q
q
q
q
qq
q
q
q
q
q
q q
q q q q q qq q
q
q q
q q q q q
qq
q
q
q
q
q qq
q q
q
q q
q q q q q
q q q q q q q
q q q
q
q
q
q q q
q
q
q
q
q
q
q qq
q
q
q
q q q q
qq q
q
q q
qq
q qq
q qq
q
q
q
q q q
q
q q q q q
q
q
qq
qq
qq q
qq q
q q
q q
q q qq
q
q
q
q
q q
q q q
q
q
qq q
q q qq
q
qq
q
q
q
q
q
q
q
q
q q q
q
q
q
q
q
q q
q
q
q q
q
q q q q q
q
qqq qq
qqq
q
q
qqq qq
q
q qq
qqqqqqqqqqqqq
q
q
q q
q
q
q
q q
q
q q
q
q
q q q
q
q q
q
q
q
q
q
q
q
q q q q
q q
q
q qq
q q q q
q
q q
q q q qq q q
q
q q
q
qq q
q q
q q
q
q
q q
q
q q
q
q q
q
q qqq q
q
q
qq
q q
q
q
q
q q
q q q q
qq
q
q
q
q q
q
q
q q
qq qq
q q
q
qq
q
q
q q q q
q q
q
q
q
q q q q q
q
q
q
q q qq qq
q
qq
q
q q q q
q
qq q q
q q
q
q qqq
q
q q q q q q q
q
qq
qq
qq qq qq
q
q
q
q q
q
qqqq qqqq
q
q
qq
q
q
q
q
q
q
q
q
q
q q
q q
q
q
q
qq
q
q q q q q q
q
q
q
q
q
q
qq q
q
q q q
q
q
q
q
q q
q
q q q q q q
q
q
q q
q qq
q
q
q
q
q q q q q
q q
q
q
q q
q
q
q
q
q
q
q q q q
q
q
q q
q
q
q q q
q q
q
q
q
q
q
q
q
q
q q qq qq q qq q q q q q q
q
q q q q q
qq
q q q
q
q qq
qq
q
q
q
q
q
q
q
q
q
q
q
q
q q q q
q
q
q q
q
q
q
q
q
q
q
q q
q q
q
q
q
q q qq q
q q q
4 2 0 2 4
1
.5
1
.0
0
.5
0
.0
0
.5
1
.0
1
.5
Unit normal quantile
D
e
v
ia
tio
n
Figure 4. Worm plot Model J.
Model K is given by
log() =
0
+ cs(LAT, df = 10) + cs(LON, df = 10) + cs(log(AR), df = 10) +
cs(UC, df = 3) + cs(ST, df = 8) +
1
STR1 +
2
STR2 +
3
SI +
4
PA +
5
TO +
6
NIO +
7
NIT +
8
YR06 +
9
YR07 +
10
SZ +
cs(log(FRVN), df = 10),
log() =
0
+
1
ST + cs(log(AR), df = 10),
in which the response (UP) follows a GA distribution with location and scale parameters
and , respectively. This model proved to be the best model for hedonic prices equation
estimation of urban land lots in Aracaju.
q
q
q
q
qq q q q q
q
q
q
q
q q qq qq q
q
q
q
q q
qq q
q
q q
q
q
qq
q
q
q
q q qqqq qq q q qq q q
q
q
q
q
q
q
q
q
qq
q
qq
q
q
q qq q
q qq
q q q
q
q
q q
q q
q
q qq
qq
q qq q
q
q
q
q
q
q q
q
q
q q q q
q
q q
q q q q q
q
q q q
q
q
q
q q q q
q
q q q q
q
q
q q
q q q q q q
q
q
qqq q q q q q
q
q q qq
q qq q q
q
q q
q
q
q
q q
q
q
q q q
q
q
q
q q
q
q
q
q
q q q
q
q
q q q q q q q q
q
q q
q
q
q
q q
q
q
q
q
q q qq
q
q
q
q q
q
q q
q
q q
q
q
q
q
q
q
q
q
q
qq q
q q
q q q q
q
q q
q
q
q
q
q
q
q q q q
q
q q q
q
q q
q
q
q
q q q
q
q
q
q q
q
q
q q
q
q q q q
q q q q q q
q q q
q q qq q
q q
q
q q
qqq q
qq
q
q
q
q q q
q
q q
q
q q q
q
q q q q q
q
q
q
q
q
q
q
q
q
q qq q qq q
q q qq
q
q
q q
q
q q q
q q q q
q q q
q q
q q
q
q q
q
q q
q
qq q qq q
q q
q q
q
qq
q
q
q
q q
q
q
q
q
q
q q
q
q q
q
q
q
q
q
q qq
q
q q
q
q
q q q q
q q
q
qq qq q q q
qq
q
q q q q q q
qqq q
q q
q
q
qq qq q
q
qq
q
q
q
q
q
q q
q
q
qq qqq q
q
qq
q
q
q q q q q
q
q
q
qq
q
q qq q
q
q q
q
q
q
q q
q
q
q q
q
q q
qq
qqq
q
qq q qqq q
q
q q qq q
q
q
q
q
q
qq
q
q q q q q q
q
q q q
q
q
q
qq
q q
q
q q
q
q
q
q
q
q
q
q q q q
q
q
qq q q q
q
q q q q
qqqq q
q q q q qq
q
q
q
q qq
q
q q
q
q
q q
q
q
q
q q q
q
q
q
q
qq q
q
q
q q qqq q q
q
q
q
qq q q
q
q
q
q q
q
q
q q
q
q
q
q
qq
q q
q
q
q
q
q
q
q
q
q q
q q
q
q q q q q
qq
q qq
qq q
q
qq
q
q
q
q
q
q
q q q q q q
q
q
q
q q qq
q
q q q q
q q q
q q
q
q q
q q q q
q
q
q
qq
q
q
q
q qq q
qq
q
q
q q q
q
q
q
q
q q
q
q
q q q q
q q qqq
q q
qq
q q
q q q
q
q
qq
qq q q q
q
q q q q q q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q q q
q
q q
q
q q q q
q
qq q
q q q q q
q
q q q q q q
q
q
q
q
q q
q q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q q
q q q q
q q
q
q
q
q
q
q
q
q
q q q
q q
qq q q
q
q
q
q q
q qq q q
q q
q
q q q q q q q q q
q
q q
q
q
q
q
q
q q
q q
q q
q
q
q
q
q q
qq
q
q q q
q q
q q q
q
q
q
q
q
q
q q q q q
q
q
q q
q
q
q
q
q
q
q q q q q
q
q
q
q
q
qq
q
q qq q
q
q
qq
q
q
q q q qq
q q q
q q q
q
qq q
q
q q q
q
q
q
qq
q
q
q q
q q q
q
q
q
q
q q
q q q
q
q q q q
q q q
q
q
q
q
qq q q
q
q
q
q
q
q
q q
q q
q
q
q
q
q
q q
q
q
q
q q q
q q q q q q q q
q
q
q q q
q q q q
q q
q q q q
q
q
q q
q
q
q
q q q q
q
q q
q
q
q
q
q
q
q q q q q qq
q
q
q q q q q
q
q qq q
q
q q
q
q qq qq q q q q q q q
q
q q
q
q q q qq q q q q q q qq q q q
q
q q
q q
q
q q q q
q q q q q q q
q
q q q
q
qq q q q q q q
q
q
q
q
q
q q
q
q
q q q q q
q
q
q q
q
q
q q q q q qq q q q q
q
q q q qq
q
q q
q q q q qqq
q
qqq
q
qq q
qq q q q q q q q q q
q
q
q
q q
q q
q q
q q
q
q q q q
q
q q qq
q
q
q
q q q q
q
q
q
q q
q
q
q
q
q
q
q qq qq
q q
q q q q q
q
q
q q
q q
qqq q
q
q q
q
q
q q q
q q q q q q q q
q
q
q q q
q
q
q
q
q q
q q q q q q q
q q q q q q
qq q q
q q q
q
q q q
q
q
q q
q
q q
q q
q
q q
q q
q
q
q
qq
q
q q
q
q
q
q
q
q
q
q q
q q
q q q q q qq
q qq
q
q q q
q
q q
q
qq
q q
q q
q
q
q q
q
q q
q
q
q
q q
q
q
q q
q q
q
q
q
q
q
q q q q
q q
q qq qq
q
q
q
q
q
q
q
q
q
q
q
q q q
q
q q q
q q q q q
q
q
q
q
q
q
q
qq
q qq
q q
q q q q q
q
q
q
q
q q q q
q
q qq q
q
q
q
q q
q
q
q
q
q
q
q q q
q q q qqq q q q
q
q
q
q
q
q q
q q q qq q
q
q q q
q
q
q q
q
qq q
q
q q q
q
q q
q
q
qq q q q q
q q q
q q q q q
q q
q q
q
q
q
q
q
qq q
q q q qq q q q q
q q q q
q
q qq
q q q
q q q q
q
q q
q
q
q qq
q
q
q q q q q qqq
q
q q
qq
q
q q
q
q
q q
q q
q q
q q
q
q q q
q
q
q
qq qq
qq q
q
q
q
q q
q q q q
qq q q q q
q
q
q q q
q
q
qq
q
q q qq
q
qq
q
q
q q
q
q
q
q
q q q
q q q
q
q
q q
q
q q q
q
q q q q q q
qq q q
q qqq
q
q
qq
q qq
q
q qq qqqqqqqqqqqqq
q
q qq
q
q
q
q
q q
q
q
q
q q
q q
q
q q q
q q
q
q
q
q
q
q
q
q q
q q q
q q
q q q q q q q q q
q qq
q
q
q q
q
q qq q
q
q q
q
q
q
q q
q
q q
q
q q
q q
q
qq q
q
q
qq
q q q
q
q
q q
q q qq
qq q
q
q
q q
q
q
q
q
q q
qq
q
q
q
qq
q q q
q q
q q q
q
q
q
q
q q
q
q q
q
q
q qqq qq q q q
q
q q q
q
q qq
q
q
q
q
q q q q q
q
q q q q q q q
q
qq qq qq qq qq
q
q q
q
q q qqqq qqqq q
q
q q
q
q
q
q q
q
q q
q
q
q q q
q
q
q
qq
q
q q q q
q
q
q
q q
q
q
q
qq q q
q q q
q
q
q
q
q q q qq q q q q q q
q
q
q qq q q
q
q
q q q qq
q q
q q
q
q q q
q
q q
q q
q q
q q
q
q q
q
q
q q
q
q q
q
q
q
q
q
q
q
q
q q qq qq q
q q q q q q q q
q
q q q q q
q
q
q
q
q
q
q qq
qq
q
q
q q
q
q
q
q
q
q
q
q
q q q q
q
q
q q q q
q
q q
q
q
q
q
q
q q
q
q
q
q
q
q q q q
q
4 2 0 2 4
1
.5
1
.0
0
.5
0
.0
0
.5
1
.0
1
.5
Unit normal quantile
D
e
v
ia
tio
n
Figure 5. Worm plot Model K.
0 200 400 600 800 1000
0
2
0
0
4
0
0
6
0
0
8
0
0
1
0
0
0
Predicted values of UP
O
b
s
e
r
v
e
d
v
a
lu
e
s
o
f U
P
Figure 6. Observed values predicted values of UP Model K).
5. Concluding Remarks
Real state appraisal is usually performed using the standard linear regression model or the
class of generalized linear models. In this paper, we introduced real state appraisal based
on the class of generalized additive models for location, scale and shape, GAMLSS. Such
a class of regression models provides a exible framework for the estimation of hedonic
price functions. It even allows for some conditioning variables to enter the model in a
nonparametric fashion. The model also accommodates variable dispersion and can be based
on a wide range of response distributions. Our empirical analysis was carried out using a
large sample of land lots located in the city of Aracaju (Brazil). The selected GAMLSS
model displayed a very high pseudo-R
2
(approximately 0.82) and yielded an excellent
t. Moreover, the inclusion of nonparametric additive terms in the model allowed for the
estimation of the hedonic price function in a very exible way. We showed that the GAMLSS
t was clearly superior to those based on the standard linear regression and on a generalized
linear model. We strongly recommend the use of GAMLSS models for real state appraisal.
Acknowledgements
L. Florencio acknowledges funding from Coordenao de Aperfeioamento de Pessoal de
Nvel Superior (CAPES), F. Cribari-Neto and R. Ospina acknowledge funding from Con-
selho Nacional de Desenvolvimento Cientco (CNPq). We thank three anonymous referees
for their comments and suggestions.
References
Akaike, H., 1983. Information measures and model selection. Bulletin of the International
Statistical Institute, 50, 277290.
Akantziliotou, C., Rigby, R.A., Stasinopoulos, D.M., 2002. The R implementation of gener-
alized additive models for location scale and shape. In Stasinopoulos, M., Touloumi, G.,
(eds.). Statistical Modelling in Society: Proceedings of the 17th International Workshop
on Statistical Modelling. Chania, Greece, pp. 7583.
Anglin, P., Gencay, R., 1996. Semiparametric estimation of hedonic price functions. Journal
of Applied Econometrics, 11, 633648.
Atkinson, A.C. (1985) Plots, Transformations and Regression. Oxford University Press,
New York.
Buuren, S., Fredriks, A.M., 2001. Worm plot: a simple diagnostic device for modeling
growth reference curves. Statistics in Medicine, 20, 12591277.
Clapp, J.M., Kim, H.J., Gelfand, A., 2002. Predicting spatial patterns of house prices using
LPR and Bayesian smoothing. Real Estate Economics, 30, 505532.
Cox, D.R., Snell, E.J., 1989. Analysis of Binary Data. Chapman and Hall, London.
Cribari-Neto, F., Zarkos, S.G., 1999. R: yet another econometric programming environ-
ment. Journal of Applied Econometrics, 14, 319329.
Davidson, R., MacKinnon, J.G., 1993. Estimation and Inference in Econometrics. Oxford
University Press, New York.
Dunn, P.K., Smyth, G.K. (1996). Randomised quantile residuals. Journal of Computational
and Graphical Statistics. 5, 236-244.
Eubank, R., 1999. Nonparametric Regression and Spline Smoothing. Second edition. Marcel
Dekker, New York.
Ferrari, S.L.P., Cribari-Neto, F., 2004. Beta regression for modelling rates and proportions.
Journal of Applied Statistics, 31, 799815.
Gencay, R., Yang, X., 1996. A forecast comparison of residential housing prices by para-
metric and semiparametric conditional mean estimators. Economic Letters, 52, 129135.
Hrdle, W., Mller, M., Sperlich, S., Werwatz, A., 2004. Nonparametric and Semiparamet-
ric Models. Springer-Verlag, Berlin.
Hastie, T.J., Tibshirani, R.J., 1990. Generalizeds Additive Models. Chapman and Hall,
London.
Ihaka, R., Gentleman, R., 1996. R: a language for data analysis and graphics. Journal of
Computational and Graphical Statistics, 5, 299314.
Iwata, S., Murao, H., Wang, Q., 2000. Nonparametric assessment of the eects of neigh-
borhood land uses on the residential house values. In Fomby, T., Carter, H.R., (eds.).
Advances in Econometrics: Applying Kernel and Nonparametric Estimation to Economic
Topics. JAI Press, New York.
Martins-Filho, C., Bin, O., 2005. Estimation of hedonic price functions via additive non-
parametric regression. Empirical Economics, 30, 93114.
McFadden, D., 1974. Conditional logit analysis of qualitative choice behavior. In Zarembka,
P., (ed.). Frontiers in Econometrics. Academic Press, New York, pp. 105142.
Pace, R.K., 1993. Non-parametric methods with applications to hedonic models. Journal
of Real Estate Finance and Economics, 7, 185204.
Rigby, R.A., Stasinopoulos, D.M., 2005. Generalized additive models for location, scale and
shape (with discussion). Applied of Statistics, 54, 507554.
Rigby, R.A., Stasinopoulos, D.M., 2007. Generalized additive models for location scale and
shape (GAMLSS) in R. Journal of Statistical Software, 23, 146.
Rosen, S., 1974. Hedonic prices and implicit markets: product dierentiation perfect com-
petition. Journal of Political Economy, 82, 3455.
Silverman, B.W., 1984. Spline smoothing: the equivalent kernel method. The Annals of
Statistics, 12, 896916.
Thorsnes, P., McMillen, D.P., 1998. Land value and parcel size: a semiparametric analysis.
Journal of Real Estate Finance and Economics, 17, 233244.
Vol. 3, No. 1, April 2012, 93110
Statistical Distributions
Research Paper
Discriminating between the bivariate generalized
exponential and bivariate Weibull distributions
Arabin Kumar Dey
1
and Debasis Kundu
2,
1
Department of Mathematics, IIT Gauhati, Gauhati, India
2
Department of Mathematics and Statistics, IIT Kanpur, Kanpur, India
(Received: 24 July 2011 Accepted in nal form: 03 January 2012)
Abstract
Recently Kundu and Gupta (2009) introduced a bivariate generalized exponential dis-
tribution, whose marginals are generalized exponential distributions. The bivariate gen-
eralized exponential distribution is a singular distribution, similarly as the well known
bivariate Weibull distribution. The corresponding two singular bivariate distributions
functions have very similar joint probability density functions. In this paper, we consider
the discrimination between these two bivariate distribution functions. The dierence of
the maximized log-likelihood functions is used in discriminating between the two dis-
tribution functions. The asymptotic distribution of the test statistic has been obtained
and it can be used to compute the asymptotic probability of correct selection. Monte
Carlo simulations are performed to study the eectiveness of the proposed method. One
data set has been analyzed for illustrative purposes.
Keywords: Asymptotic distribution EM algorithm Likelihood ratio test
Maximum likelihood Monte Carlo simulations Probability of correct selection.
Mathematics Subject Classication: Primary 62H30 Secondary 62E20.
1. Introduction
Recently, the two-parameter generalized exponential (GE) distribution proposed by Gupta
and Kundu (1999) has received some attention. The two-parameter GE model, which has
one shape parameter and one scale parameter, is a positively skewed distribution. This
model has several desirable properties and many of them are very similar to the corre-
sponding properties of the well known Weibull distribution. For example, the probability
density functions (PDFs) and the hazard functions (HFs) of the GE and Weibull distribu-
tions are very similar. In addition, both distributions have compact cumulative distribution
functions (CDFs). These distributions contain the exponential distribution as a special
case. Therefore, they are extensions of the exponential distribution but in dierent man-
ners. It is further observed that the GE distribution can also be used quite successfully
in analyzing positively skewed data sets in place of the Weibull distribution. Moreover,
often it is very dicult to distinguish between these two distributions. For some recent
developments on the GE distribution, and for its dierent applications, the readers are
referred to the review article by Gupta and Kundu (2007).
Corresponding author. Debasis Kundu. Department of Mathematics, Indian Institute of Technology Kanpur, Kan-
pur 208016, India. Email: kundu@iitk.ac.in
94 A.K. Dey and D. Kundu
The problem of testing whether some given observations follow one of two (or more)
distributions is quite an old statistical problem. Cox (1961) (see also Cox, 1962) was
the pioneer in considering this problem. He also discussed the eect of choosing a wrong
model. Since then extensive work has been done in discriminating between two or more
distributions; see, e.g., Atkinson (1969, 1970), Bain and Englehardt (1980), Marshall et al.
(2001), Dey and Kundu (2009, 2010) and the references cited therein.
In recent times, it has been observed (see Gupta and Kundu, 2003, 2006) that, due
to the closeness between the Weibull and GE distributions, it is extremely dicult to
discriminate between their two corresponding CDFs. Note that if the shape parameter
is one, the two CDFs are not distinguishable. For small sample sizes, the probability of
correct selection (PCS) can be quite small, even if the shape parameter is not very close to
one. Interestingly, although extensive work has been done in discriminating between two
or more univariate distributions, but no work has been found in discriminating between
two bivariate distributions.
Recently, Kundu and Gupta (2009) introduced a singular bivariate distribution whose
marginals follow GE distributions, which is named as the bivariate generalized exponential
(BGE) distribution. The four-parameter BGE distribution has several desirable properties
and it can be used quite eectively to analyze bivariate data when there are ties. Another
well known four-parameter bivariate singular distribution is the bivariate Marshall-Olkin
Weibull (BMOW) distribution, which has been used quite eectively to analyze bivariate
data when there are ties; see, e.g., Kotz et al. (2000). The BMOW distribution has Weibull
marginals. Therefore, it is clear that for certain range of parameter values, the marginals
of the BGE and BMOW distributions are very similar. In fact, it is observed that the
shapes of the joint PDFs of the BGE and BMOW distributions can also be very similar
in nature.
In this paper, we consider discriminating between BGE and BMOW distributions. We
use the dierence of the values for maximized log-likelihood functions in discriminating
between the two CDFs. The exact distribution of the proposed test statistic is dicult to
obtain, and hence we obtain its asymptotic distribution. It is observed that the asymptotic
distribution of the test statistic is normally distributed and it is used to compute the PCS.
In computing the PCS, one needs to compute the misspecied parameters. Computation
of the misspecied parameters involves solving a four dimensional optimization problem.
We suggest an approximation, which involves solving an one dimensional optimization
problem only, which it computationally becomes very ecient. Monte Carlo simulations
are performed to study the eectiveness of the proposed method, and it is observed that,
even for moderate sample sizes, the asymptotic results match very well with the simulated
results.
Rest of the paper is organized as follows. In Section 2, we briey discuss about the
BGE and BMOW distributions. In Section 3, we present the discrimination procedure.
In Section 4, we provide the asymptotic distribution of the test statistics for both cases.
In Section 5, we discuss the calculation of the misspecied parameters. In Section 6, we
conduct Monte Carlo simulation study. In Section 7, we analyze a data set for illustrative
purposes. Finally, in Section 8 we conclude the paper.
2. BMOW and BGE Distributions
In this section, we briey discuss about the BMOW and BGE distributions. We use the
following notations throughout the paper. It is assumed that the univariate Weibull dis-
tribution with the shape parameter > 0 and the scale parameter > 0 has PDF, CDF
and survival function (SF) given by
f
WE
(x; , ) = x
1
e
x
, F
WE
(x; , ) = 1 e
x
, S
WE
(x; , ) = e
x
, x > 0, (1)
respectively. From now on a Weibull distribution with the PDF as given in Equation (1)
is denoted by WE(, ). The GE distribution, with the shape parameter > 0 and the
scale parameter > 0, has PDF given by
f
GE
(x; , ) = e
x
_
1 e
x
_
1
, x > 0. (2)
The corresponding CDF and SF are
F
GE
(x; , ) =
_
1 e
x
_
, and S
GE
(x; , ) = 1
_
1 e
x
_
,
respectively. A GE distribution with the PDF given in Equation (2) is denoted by GE(, ).
2.1 The BMOW distribution
Suppose U
0
WE(,
0
), U
1
WE(,
1
) and U
2
WE(,
2
) and they are indepen-
dently distributed. Dene X
1
= min{U
0
, U
1
} and X
2
= min{U
0
, U
2
}. Then, the bivariate
vector (X
1
, X
2
) has the BMOW distribution with parameters ,
0
,
1
,
2
, and it is de-
noted by BMOW(), where = (,
0
,
1
,
2
). Then, the (X
1
, X
2
) has joint SF of the
form
S
BMOW
(x
1
, x
2
; ) = P(X
1
> x
1
, X
2
> x
2
) = P(U
1
> x
1
, U
2
> x
2
, U
0
> z)
= S
WE
(x
1
; ,
1
)S
WE
(x
2
; ,
2
)S
WE
(z; ,
0
),
where z = max{x
1
, x
2
}. The joint PDF of (X
1
, X
2
) can be written as
f
BMOW
(x
1
, x
2
; ) =
_
_
f
1W
(x
1
, x
2
; ), if 0 < x
1
< x
2
;
f
2W
(x
1
, x
2
; ), if 0 < x
2
< x
1
;
f
0W
(x; ), if 0 < x
1
= x
2
= x;
where
f
1W
(x
1
, x
2
; ) = f
WE
(x
1
; ,
1
)f
WE
(x
2
; ,
0
+
2
),
f
2W
(x
1
, x
2
; ) = f
WE
(x
1
; ,
0
+
1
)f
WE
(x
2
; ,
2
),
f
0W
(x; ) =

0
0
+
1
+
2
f
WE
(x; ,
0
+
1
+
2
).
Note that the function f
BMOW
() may be considered to be a PDF for the BMOW distribu-
tion if it is understood that the rst two terms are PDFs with respect to two dimensional
Lebesgue measure, and the third term is a PDF with respect to a one dimensional Lebesgue
measure; see, e.g., Bemis et al. (1972). It is clear that the BMOW distribution has an abso-
lute continuous part on {(x
1
, x
2
); 0 < x
1
< , 0 < x
2
< , x
1
= x
2
}, and a singular part
on {(x
1
, x
2
); 0 < x
1
< , 0 < x
2
< , x
1
= x
2
}. The surface plot of the absolute contin-
uous part of the joint PDF has been provided in Figure 1 for dierent parameter values.
It is immediate that the joint BMOW PDF can take variety of shapes, and, therefore, it
can be used quite eectively in analyzing singular bivariate data.
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
(a)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.5
1
1.5
2
2.5
3
3.5
4
(b)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
(c)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
(d)
Figure 1. Surface plots of the absolute continuous part of the joint PDF of BMOW for (,
1
,
2
,
3
): (a) (2.0, 1.0,
1.0, 1.0) (b) (5.0, 1.0, 1.0, 1.0) (c) (2.0, 2.0, 2.0, 2.0) (d) (1.0, 1.0, 1.0, 1.0).
The following probabilities are used later in deriving the asymptotic PCS. If (X
1
, X
2
)
BMOW(), then
p
1W
= P(X
1
< X
2
) =
_

0
_
y
0
f
WE
(x; ,
1
)f
WE
(y; ,
0
+
2
)dxdy
=

1
0
+
1
+
2
,
p
2W
= P(X
1
> X
2
) =
_

0
_

y
f
WE
(x; ,
0
+
1
)f
WE
(y; ,
2
)dxdy
=

2
0
+
1
+
2
,
p
0W
= P(X
1
= X
2
) =

0
0
+
1
+
2
_

0
f
WE
(z; ,
0
+
1
+
2
)dz
=

0
0
+
1
+
2
.
2.2 The BGE distribution
Suppose V
0
GE(
0
, ), V
1
GE(
1
, ) and V
2
GE(
2
, ). Dene Y
1
= max{V
0
, V
1
}
and Y
2
= max{V
0
, V
2
}. Then the bivariate random vector (Y
1
, Y
2
) is said to have the
BGE distribution with parameters
0
,
1
,
2
, , and it is denoted by BGE(), where =
(
0
,
1
,
2
, ). It is immediate that Y
1
GE (
0
+
1
, ) and Y
2
GE(
0
+
2
, ). The
0
0.5
1
1.5
2
2.5
3
3.5
0
0.5
1
1.5
2
2.5
3
3.5
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
(a)
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
0
0.02
0.04
0.06
0.08
0.1
0.12
(b)
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
(c)
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
(d)
Figure 2. Surface plots of the absolute continuous part of the joint BGE PDF for (
1
,
2
,
3
, ): (a) (1.0, 1.0, 2.0,
1.0) (b) (1.0, 1.0, 1.0, 4.0) (c) (5.0, 5.0, 5.0, 1.0) (d) (0.5, 0.5, 0.5, 1.0).
joint CDF of (Y
1
, Y
2
) can be expressed as
F
BGE
(y
1
, y
2
; ) = P(Y
1
y
1
, Y
2
y
2
) = P(V
1
y
1
, V
2
y
2
, V
0
v)
= (1 e
y
1
)
1
(1 e
y
2
)
2
(1 e
v
)
0
,
where v = min{y
1
, y
2
}. In this case, the joint CDF of Y
1
and Y
2
can be written as
f
BGE
(y
1
, y
2
; ) =
_
_
f
1G
(y
1
, y
2
), if 0 < y
1
< y
2
;
f
2G
(y
1
, y
2
), if 0 < y
2
< y
1
;
f
0G
(y), if 0 < y
1
= y
2
= y,
where
f
1G
(y
1
, y
2
; ) = f
GE
(y
1
;
0
+
1
, )f
GE
(y
2
;
2
, ),
f
2G
(y
1
, y
2
; ) = f
GE
(y
1
;
1
, )f
GE
(y
2
;
0
+
2
, ),
f
0G
(y; ) =

0
0
+
1
+
2
f
GE
(y;
0
+
1
+
2
, ).
It is clear that the BGE distribution has also a singular part and an absolute continuous
part similarly as the BMOW distribution. The surface plot of the joint BGE PDF is
provided in Figure 2 for dierent parameter values. It is clear that the shape of the joint
BGE and BMOW PDFs are very similar.
The following probabilities are needed later. If (Y
1
, Y
2
) BGE(), then
p
1G
= P(Y
1
< Y
2
) =
_

0
_
y
0
f
GE
(x;
0
+
1
, )f
GE
(y;
2
, )dxdy
=

2
0
+
1
+
2
,
p
2G
= P(Y
1
> Y
2
) =
_

0
_

y
f
GE
(x;
1
, )f
GE
(y;
0
+
2
, )dxdy
=

1
0
+
1
+
2
,
p
0G
= P(Y
1
= Y
2
) =

0
0
+
1
+
2
_

0
f
WE
(z;
0
+
1
+
2
, )dz
=

0
0
+
1
+
2
.
3. Discrimination Procedure
In this section, we present the discrimination procedure between the distributions. Speci-
cally, suppose {(X
11
, X
21
), . . . , (X
1n
, X
2n
)} is a random bivariate sample of size n generated
either from a BGE() distribution or from a BMOW() distribution. Based on the above
sample, we want to decide from which distribution the data set has been obtained. We use
the following notations and sets for the rest of the paper; I
0
= {(x
1i
, x
2i
), x
1i
= x
2i
= x
i
, i =
1, . . . , n}, I
1
= {(x
1i
, x
2i
), x
1i
< x
2i
, i = 1, . . . , n}, I
2
= {(x
1i
, x
2i
), x
1i
> x
2i
, i = 1, . . . , n},
I = I
0
I
1
I
2
, n
0
= |I
0
|, n
1
= |I
1
| and n
2
= |I
2
|, n
0
+ n
1
+ n
2
= n. It is assumed
that n
0
= 0, n
1
= 0, and n
2
= 0. Let

= (
0
,
1
,
2
,
) be the maximum likelihood

(ML) estimators of , based on the assumption that the data have been obtained from
the BGE() distribution. Similarly, let

= ( ,
0
,
1
,
2
) be the ML estimator of based
on the assumption that the data have been obtained from the BMOW() distribution.
Note that (
0
,
1
,
2
,
) and ( ,
0
,
1
,
2
) are obtained by maximizing the corresponding
log-likelihood function, say L
1
(
0
,
1
,
2
, ) and L
2
(,
0
,
1
,
2
), respectively. Note that
here the log-likelihood function of the BGE distribution can be written as
L
1
() = (n
0
+ 2n
1
+ 2n
2
)log () +n
1
log (
0
+
1
) +n
1
log(
2
) +n
2
log (
1
)
+n
2
log (
0
+
2
) + (
0
+
1
1)
iI1
log (1 e
x
1i
) + (
2
1)
iI1
log (1 e
x
2i
)
+(
1
1)
iI2
log (1 e
x
1i
) + (
0
+
2
1)
iI2
log (1 e
x
2i
) +n
0
log (
0
)
+(
0
+
1
+
2
1)
iI
0
log (1 e
x
i
)
_
iI
0
x
i
+
iI
1
I
2
x
1i
+
iI
1
I
2
x
2i
_
, (3)
and the BMOW log-likelihood function can be written as
L
2
() = (n
0
+ 2n
1
+ 2n
2
)log () +n
1
log (
1
) +n
2
log (
2
) +n
0
log (
0
) +n
1
log (
0
+
2
)
+n
2
log (
0
+
1
) + ( 1)
_
iI0
log (x
1i
) +
iI1I2
log (x
2i
) +
iI0
log (x
i
)
_
1
_

iI
1
I
2
x
1i
+
iI
0
x
i
_
2
_

iI
1
I
2
x
2i
+
iI
0
x
i
_
0
_
iI2
x
1i
+
iI1
x
2i
+
iI0
x
i
_
.
We use the following discrimination procedure. Consider the statistic
T = L
2
( ,
0
,
1
,
2
) L
1
(
0
,
1
,
2
,
). (4)
If T > 0, we choose the BMOW distribution, otherwise we prefer the BGE distribution.
It may be mentioned that (
0
,
1
,
2
,
) and ( ,
0
,
1
,
2
) are obtained by maximizing
Equations (3) and (4) respectively. Computationally both are quite challenging problems.
To maximize directly these problems one needs to solve a four dimensional optimization
problem in each case. In both the cases the EM algorithm can be used quite eectively
to compute the ML estimators of the unknown parameters; see e.g., Kundu and Gupta
(2009) and Kundu and Dey (2009) for the BGE and BMOW distributions, respectively.
In each case, it involves solving just a one-dimensional optimization problem at each E
step, and both the methods work quite well. In the next section we provide the asymptotic
distribution of T, which helps to compute the asymptotic PCS.
4. Asymptotic Distributions
In this section, we provide the asymptotic distributions of the test statistics for both
cases and use the following notations. For any functions, f
1
(U) and f
2
(U), E
BGE
[f
1
(U)],
V
BGE
[f
1
(U)] and Cov
BGE
(f
2
(U), f
1
(U)) denote the mean of f
1
(U), the variance of f
1
(U),
and the covariance of f
1
(U) and f
2
(U) respectively, under the assumption the U
BGE(). Similarly, we dene E
BWE
[f
1
(U)], V
BWE
[f
1
(U)] and Cov
BWE
(f
2
(U), f
1
(U)) as the
mean of f
1
(U), the variance of f
1
(U) and the covariance of f
1
(U) and f
2
(U) respectively,
under the assumption that U BWE() (bivariate Weibull). We have the following two
main results.
Theorem 4.1 Under the assumption that data come from the BMOW(,
0
,
1
,
2
) dis-
tribution, the distribution of T as dened in Equation (4) is approximately normally
distributed with mean E
BMOW
[T] and variance V
BMOW
[T]. The expressions of E
BMOW
[T]
and V
BMOW
[T] are provided below.
Proof It is provided in Appendix.
Now we provide the expressions for E
BMOW
[T] and V
BMOW
[T]. We denote
lim
n
E
BMOW
[T]
n
= AM
BMOW
and lim
n
E
BMOW
[T]
n
= AV
BMOW
.
Therefore,
lim
n
1
n
E
BMOW
[T] = AM
BMOW
= E
BMOW
[log (f
BMOW
(X
1
, X
2
; )) log (f
BGE
(X
1
, X
2
;

))],
lim
n
1
n
V
BMOW
[T] = AV
BMOW
= V
BMOW
[log (f
BMOW
(X
1
, X
2
; )) log (f
BGE
(X
1
, X
2
;

))].
Note that both AM
BMOW
and AV
BMOW
cannot be obtained in explicit form. They have
to be obtained numerically and they are functions of p
1W
, p
2W
, p
3W
, and

. Moreover,
it should be mentioned that the misspecied parameter

as dened in Lemma 8.1 (see
Appendix) also needs to be computed numerically.
Theorem 4.2 Under the assumption that data come from the BGE() distribution, the
distribution of T as dened in Equation (4) is approximately normally distributed with
mean E
BGE
[T] and variance V
BGE
[T]. The expressions of E
BGE
[T] and V
BGE
[T] are provided
below.
Proof It is provided in Appendix.
Now we provide the expressions for E
BGE
[T] and V
BGE
[T]. In this case, we denote
lim
n
E
BGE
[T]
n
= AM
BGE
and lim
n
V
BGE
[T]
n
= AV
BGE
.
Therefore,
lim
n
1
n
E
BGE
[T] = AM
BGE
= E
BGE
[log (f
BMOW
(X
1
, X
2
;
)) log(f
BGE
(X
1
, X
2
; ))],
lim
n
1
n
V
BGE
[T] = AV
BGE
= V
BGE
[log (f
BMOW
(X
1
, X
2
;
)) log (f
BGE
(X
1
, X
2
; ))].
As mentioned, here also both AM
BGE
and AV
BGE
cannot be obtained in explicit form.
They have to be obtained numerically and they are also functions of p
1G
, p
2G
, p
3G
,

and
. The misspecied parameter

as dened in Lemma 8.2 (see Appendix) also needs to be
computed numerically.
Then, based on the corresponding asymptotic distributions, it is possible to compute
the PCS for both the cases.
5. Misspecified Parameter Estimates
In this section, we discuss the estimation of the misspecied parameters.
5.1 Estimation of

In this case, it is assumed that the data have been obtained from the BMOW() dis-
tribution and we would like to compute

, the misspecied BGE parameters, as de-
ned in Lemma 8.1. Suppose (X
1
, X
2
) BMOW(). Consider the following events:
A
1
= {X
1
< X
2
}, A
2
= {X
1
> X
2
} and A
0
= {X
1
= X
2
}. Moreover, 1
A
is the indi-
cator function taking value 1 at the set A and 0 otherwise. Therefore,

can be obtained
as the argument maximum of E
BMOW
[log (f
BGE
(X
1
, X
2
; ))] =
1
() (say), where
1
() = log () +p
1W
log (
0
+
1
) +p
1W
log (
2
)
+(
0
+
1
1)E
BMOW
[log (1 e
X
1
) 1
A1
]
+(
2
1)E
BMOW
[log (1 e
X2
) 1
A
1
] E
BMOW
[(X
1
+X
2
) 1
A
1
]
+p
2W
log(
1
) + (
1
1)E
BMOW
[log (1 e
X
1
) 1
A
2
] +p
2W
log (
0
+
2
)
+(
0
+
2
1)E
BMOW
[log (1 e
X2
) 1
A2
] E
BMOW
[(X
1
+X
2
) I
A2
]
+(
0
+
1
+
2
1)E
BMOW
[log (1 e
X
) 1
A
0
] E
BMOW
[X 1
A
0
] +p
0W
log
0
.
We need to maximize
1
() with respect to for xed , to compute

, numerically.
Clearly,

is a function of , but we do not make it explicit for brevity. Since maximizing
1
() involves a four dimensional optimization process, we suggest to use an approximate
version of it, which can be performed very easily, and works quite well in practice. The
idea basically came from the missing value principle, and it has been used by Kundu and
Gupta (2009) in developing the EM algorithm. We suggest to use the following
1
(), the
pseudo version of
1
()
1
() = (p
0W
+u
2
p
1W
+w
2
p
2W
)log (
0
) + (p
0W
+ 2p
1W
+ 2p
2W
)log ()
+(
0
+
1
+
2
1)E
_
log (1 e
X
1
) 1
A0
_
(E[X
1
1
A0
] + E[(X
1
+X
2
) 1
A1A2
]) + (u
1
p
1W
+p
2W
)log (
1
)
+(w
1
p
2W
+p
1W
)log (
2
) + (
0
+
1
1)E
_
log (1 e
X1
) 1
A1
_
+(
0
+
2
1)E
_
log (1 e
X2
) 1
A
2
_
+ (
2
1)E
_
log(1 e
X2
) 1
A
1
_
+(
1
1)E
_
log (1 e
X
1
) 1
A
2
_
.
Here,
u
1
=

0
0
+
2
, u
2
=

2
0
+
2
, w
1
=

0
0
+
1
, w
2
=

1
0
+
1
, (5)
and p
1W
, p
2W
, p
3W
are same as dened before. The explicit expressions of the expected
values are provided in Appendix. Note that
1
() is actually
1
() = lim
n
1
n
E[l
pseudo
(
0
,
1
,
2
, | (X
1i
, X
2i
; i = 1, . . . , n)] .
Here l
pseudo
() is the pseudo log-likelihood function of the complete data set, as described
in Kundu and Gupta (2009). Moreover, it has the same form as in Kundu and Gupta
(2009), but since here it is assumed that (X
1i
, X
2i
) BMOW(,
1
,
2
,
3
), therefore the
expressions of u
1
, u
2
, w
1
, w
2
are as Equation (5), and they are dierent than Kundu and
Gupta (2009).
Now the maximization of
1
() can be performed as follows. Note that for a given ,
the maximization of
1
() with respect to
0
,
1
and
2
can occur at

0
() =
p
0W
+u
2
p
1W
+w
2
p
2W
E[log (1 e
X1
) 1
A0
] + E[log (1 e
X1
) 1
A1
] + E[log (1 e
X2
) 1
A2
]
,

1
() =
u
1
p
1W
+p
2W
E[log (1 e
X
1
) 1
A0
] + E[log (1 e
X
1
) 1
A1
] + E[log (1 e
X
1
) 1
A2
]
,

2
() =
p
1W
+w
1
p
2W
E[log (1 e
X
2
) 1
A
0
] + E[log (1 e
X
2
) 1
A
2
] + E[log (1 e
X
2
) 1
A
1
]
,
respectively, and nally maximization of
1
() can be obtained by maximizing prole
function, namely,
1
(
0
(),
1
(),
2
(), ) with respect to only. Therefore, it involves
solving an one dimensional optimization problem only.
5.2 Estimation of

In this case, it is assumed that the data have been obtained from the BGE() distribution
and we compute

, the misspecied BMOW parameters, as dened in Lemma 8.2. In
this case,

can be obtained as the argument maximum of E
BGE
[log(f
BMOW
(X
1
, X
2
; ))] =
2
() (say), where
2
() = (p
0G
+ 2p
1G
+ 2p
2G
)log ()
+p
1G
log (
1
) +p
2G
log (
2
) +p
0G
log (
0
) +p
1G
log (
0
+
2
)
+p
2G
log(
0
+
1
) + ( 1) (E
BGE
[log X
1
1
A
1
] + E
BGE
[log X
1
1
A
2
])
+( 1) (E
BGE
[log X
2
1
A
1
] + E
BMOW
[log X
2
1
A
2
] + E
BMOW
[log X
1
1
A
0
])
1
(E
BMOW
[X
1
1
A1
] + E
BMOW
[X
1
1
A2
] + E
BMOW
[X
1
1
A0
])
2
(E
BMOW
[X
2
1
A
1
] + E
BMOW
[X
2
1
A
2
] + E
BMOW
[X
1
1
A
0
])
0
(E
BMOW
[X
1
1
A
2
] + E
BMOW
[X
2
1
A
1
] + E
BMOW
[X
1
1
A
0
]) .
In this case, we need to maximize
2
() with respect to numerically to obtain

, for a
xed . Clearly,

depends on , and we do not make it explicit for brevity.
Similarly, as before since maximization of
2
() involves a four dimensional optimization
problem, we suggest to use the following approximation of
2
(). We suggest to use
2
() = (p
0G
+ 2p
1G
+ 2p
2G
)log + ( 1)E[logX
1
1
A0
+ (log X
1
+ log X
2
) 1
A1A2
]
0
E[X
1
1
A
0
+X
1
1
A
2
+X
2
1
A
1
] + (p
0G
+a
1
p
1G
+b
1
p
2G
) log (
0
)
1
E[X
1
] + (p
1G
+a
2
p
2G
) log (
1
)
2
E[X
2
] + (p
2G
+b
2
p
1G
) log (
1
).
Here
a
1
=

1
0
+
1
, a
2
=

0
0
+
2
, b
1
=

2
0
+
2
, b
2
=

0
0
+
2
,
p
0G
, p
1G
, p
2G
are same as dened before. The expressions of the dierent expectations are
provided in Appendix.
It may be similarly observed as before that
2
() = lim
n
1
n
E[l
pseudo
(,
0
,
1
,
2
| (X
1i
, X
2i
); i = 1, . . . , n)] ,
where (X
1i
, X
2i
) BGE(
0
,
1
,
2
, ). The explicit expression of l
pseudo
() is available in
Kundu and Dey (2009).
The maximization of
2
() with respect to can be performed quite easily. For xed
, the maximization
2
() with respect to
1
,
2
and
0
can be obtained for
1
=
p
1G
+b
2
p
2G
E[X
1
]
,
2
=
p
2G
+a
2
p
1G
E[X
2
]
,
0
=
p
0G
+a
1
p
1G
+b
1
p
2G
E[X
1
1
A0
] + E[X
1
1
A2
] + E[X
2
1
A1
]
,
respectively, and nally the maximization
2
() can be performed by maximizing the
prole function
2
(,
0
(),
1
(),
2
()) with respect to only.
6. Numerical Results
In this section, we perform some numerical experiments to observe how these asymp-
totic results work for dierent sample sizes, and for dierent parameter values. All these
computations are performed at the Indian Institute of Technology Kanpur, using Intel(R)
Core(TM)2 Quad CPU Q9550 2.83GHz, 3.23 GB RAM machines. The programs are writ-
ten in R software (2.8.1), which can be obtained from the authors on request. We compute
the PCS based on Monte Carlo (MC) simulation, and also based on the asymptotic results.
We replicate the process 1000 times and compute the proportion of correct selection. For
computing the PCS based on asymptotic results, rst we compute the misspecied param-
eters and based on those misspecied parameters we compute the PCS.
6.1 Case 1: parent distribution is BMOW
In this case, we consider the following parameter sets:
Set 1: = 2.0,
0
= 1.0,
1
= 1.0,
2
= 1.0; Set 2: = 1.5,
0
= 1.0,
1
= 1.0,
2
= 1.0;
Set 3: = 1.5,
0
= 0.5,
1
= 0.5,
2
= 0.5; Set 4: = 1.5,
0
= 2.0,
1
= 1.0,
2
= 1.5,
and dierent sample sizes namely n = 20, 40, 60, 80, 100. For each parameter set and for
each sample size, we have generated the sample from the BMOW distribution. Then, we
compute the ML estimates of the unknown parameters and the values for the corresponding
maximized log-likelihood functions, assuming that the data are coming from the BMOW
or BGE distribution. In computing the ML estimates of the unknown parameters, we have
used the EM algorithm as suggested in Kundu and Dey (2009) and Kundu and Gupta
(2009), respectively. Finally, based on the values for the corresponding maximized log-
likelihood functions, we decide whether we have made the correct decision or not. We
replicate the process 1000 times, and compute the proportion of correct selection. The
results are reported in the rst rows of Tables 1 to 4.
Now, to compare these results with the corresponding asymptotic results, rst we com-
pute the misspecied parameters for each parameter set, and they are presented in the
Table 1. PCS based on MC simulations and based on asymptotic distribution (AD) for parameter Set 1.
n 20 40 60 80 100
MC 0.9255 0.9808 0.9953 0.9987 0.9997
AD 0.9346 0.9837 0.9956 0.9987 0.9996
Table 2. PCS based on MC simulations and based on AD for parameter Set 2.
n 20 40 60 80 100
MC 0.9255 0.9808 0.9953 0.9987 0.9997
AD 0.9212 0.9772 0.9928 0.9976 0.9992
n 20 40 60 80 100
MC 0.9073 0.9749 0.9914 0.9979 0.9989
AS 0.9204 0.9767 0.9926 0.9975 0.9992
Table 4. PCS selection based on MC simulations and based on AD for parameter Set 4.
n 20 40 60 80 100
MC 0.8834 0.9587 0.9843 0.9952 0.9973
AS 0.8996 0.9648 0.9866 0.9947 0.9979
following Table 5. In each case, we need to compute AM
BMOW
and AV
BMOW
, as dened
in Theorem 4.1. Since the exact expressions of AM
BMOW
and AV
BMOW
are quite compli-
cated, we have used simulation consistent estimates of AM
BMOW
and AV
BMOW
, which can
be obtained very easily. The simulation consistent estimators of AM
BMOW
and AV
BMOW
are
obtained using 10,000 replications, and they are reported in Table 6.
Table 5. Misspecied parameter values

for dierent parameter sets.
Set
1

2

0

1 1.5098 1.5098 1.6228 2.81

2 0.8853 0.8853 0.9458 2.35
3 0.8908 0.8908 0.9600 1.49
4 1.3393 1.0782 0.7362 3.10
Table 6. AM
BMOW
and AV
BMOW
Set AM
BMOW
AV
BMOW
1 0.2346 0.4823
2 0.1982 0.3936
3 0.2297 0.4317
4 0.1762 0.4317
Now, using Theorem 4.1 and based on the asymptotic distribution of T and the dis-
crimination statistic, we compute the PCS, i.e., P(T > 0), for dierent sample sizes. The
results are reported in the second rows of Tables 1 to 4 for all the parameter sets. It is
very interesting to observe that for the bivariate case, even for small sample sizes the PCS
are very high, and the asymptotic results match very well with the simulated results.
6.2 Case 2: parent distribution is BGE
In this case, we consider the parameter sets:
Set 5:
0
= 1.5,
1
= 2.0,
2
= 1.0, = 1.0; Set 6:
0
= 1.0,
1
= 1.0,
2
= 1.0, = 1.0;
Set 7:
0
= 2.0,
1
= 2.0,
2
= 2.0, = 1.0; Set 8:
0
= 1.5,
1
= 1.5,
2
= 1.5, =
1.0, and the same sample sizes as in Case 1. In this case, we generate the sample from the
BGE distribution and using the same procedure as before we compute the proportion of
correct selection. The results are reported in the rst rows of Tables 7 to 10.
n 20 40 60 80 100
MC 0.9195 0.9797 0.9935 0.9986 0.9993
AS 0.9330 0.9830 0.9953 0.9986 0.9996
n 20 40 60 80 100
MC 0.9001 0.9701 0.9892 0.9962 0.9984
AS 0.9153 0.9741 0.9914 0.9970 0.9989
n 20 40 60 80 100
MC 0.9189 0.9811 0.9944 0.9987 0.9994
AS 0.9347 0.9837 0.9955 0.9987 0.9996
n 20 40 60 80 100
MC 0.9096 0.9768 0.9929 0.9975 0.9991
AS 0.9299 0.9816 0.9947 0.9984 0.9995
Now, to compute the asymptotic PCS, rst we compute the misspecied parameters
as suggested in Section 5, and they are reported in Table 11. We also report simulated
consistent estimates of AM
BGE
and AV
BGE
in Table 12.
Table 11. Misspecied parameter values

Set

2
5 1.6199 0.1732 0.1137 0.1992
6 1.4199 0.2575 0.2418 0.2418
3 1.8200 0.1123 0.1050 0.1050
8 1.6199 0.1665 0.1553 0.1553
Table 12. AM
BMOW
and AV
BMOW
Set AM
BMOW
AV
BMOW
5 0.2224 0.4406
6 0.1967 0.4095
3 0.2316 0.4692
8 0.2128 0.4157
Now, similarly as before, based on the asymptotic distribution of T, as provided in
Theorem 4.2, we compute the PCS in this case, i.e., P(T < 0), for dierent sample sizes.
We report the results in the second rows of Tables 7 to 10 for all the parameter sets. In
this case, it observed that the asymptotic results match extremely well with the simulated
results.
7. Data Analysis
In this section, we present the analysis of a real data set for illustrative purposes. These
data are from the National Football League (NFL), American Football, matches played on
three consecutive weekends in 1986. It has been originally published in Washington Post.
In this bivariate data set, the variables are the game time to the rst points scored by
kicking the ball between goal posts (X
1
) and the game time to the rst points scored by
moving the ball into the end zone (X
2
). These times are of interest to a casual spectator
who wants to know how long one has to wait to watch a touchdown or to a spectator who
is interested only at the beginning stages of a game. The data (scoring times in minutes
and seconds) are represented in Table 13. We have analyzed the data by converting the
seconds to the decimal minutes, i.e., 2:03 has been converted to 2.05.
Table 13. American Football League (NFL) data.
X
1
X
2
X
1
X
2
X
1
X
2
2:03 3:59 5:47 25:59 10:24 14:15
9:03 9:03 13:48 49:45 2:59 2:59
0:51 0:51 7:15 7:15 3:53 6:26
3:26 3:26 4:15 4:15 0:45 0:45
7:47 7:47 1:39 1:39 11:38 17:22
10:34 14:17 6:25 15:05 1:23 1:23
7:03 7:03 4:13 9:29 10:21 10:21
2:35 2:35 15:32 15:32 12:08 12:08
7:14 9:41 2:54 2:54 14:35 14:35
6:51 34:35 7:01 7:01 11:49 11:49
32:27 42:21 6:25 6:25 5:31 11:16
8:32 14:34 8:59 8:59 19:39 10:42
31:08 49:53 10:09 10:09 17:50 17:50
14:35 20:34 8:52 8:52 10:51 38:04
The variables X
1
and X
2
have the structure: (i) X
1
< X
2
means that the rst score is
a eld goal, (ii) X
1
= X
2
means the rst score is a converted touchdown, (iii) X
1
> X
2
means the rst score is an unconverted touchdown or safety. In this case, the ties are
exact because no game time elapses between a touchdown and a point-after conversion
attempt. Therefore, it is clear that, in this case, X
1
= X
2
occurs with positive probability,
and some singular distribution should be used to analyze this data set.
If we dene the random variables
U
1
= time to rst eld goal,
U
2
= time to rst safety or unconverted touchdown,
U
0
= time to rst converted touchdown.
Then, X
1
= min{U
0
, U
1
} and X
2
= min{U
0
, U
2
}. Therefore, (X
1
, X
2
) has a similar struc-
ture as the bivariate Marshall-Olkin exponential model. Csorgo and Welsh (1989) analyzed
the data using the bivariate Marshall-Olkin exponential model but concluded that it does
not work well, because X
2
may be exponential but X
1
is not. In fact it is observed that
the empirical HFs of both X
1
and X
2
are increasing functions.
Since both BMOW and BGE distributions can have increasing marginal HFs, we t both
the models to the data set. For the BMOW distribution, using EM algorithm as suggested
in Kundu and Dey (2009), we compute the ML estimates of the unknown parameters as =
0
0.02
0.04
0.06
0.08
0.1
0.12
10 5 0 5 10 15 20
Figure 3. Histogram of the bootstrap sample of the discrimination statistic.
1.2889,

0
= 11.2073,

1
= 8.3572,

2
= 0.4720, and the associated 95% condence intervals
are (1.0372, 1.5406), (5.7213, 16.6932), (2.5312, 14.1831), (-0.4872, 1.4314) respectively.
The value for the corresponding maximized likelihood function is 47.8041. In case of the
BGE distribution using the EM algorithm as suggested in Kundu and Gupta (2009), we
obtained the ML estimates of the unknown parameters as
0
= 1.1628,
1
= 0.0558,
2
= 0.5961,

= 9.5634, and the associated 95% condence intervals are (0.6991, 1.6266),
(-0.0205, 0.1322), (0.2751, 0.9171) and (6.5298, 12.5970) respectively. The value for the
corresponding maximized likelihood function is 38.0042. Therefore, based on the values
for the corresponding maximized likelihood function, we prefer to use the BMOW model
rather than the BGE model to analyze this data set.
Now, to compute the PCS in this case, we perform non-parametric bootstrap. The
histogram of the bootstrap sample of the discrimination statistic is provided in Figure 3.
Based on one thousand bootstrap replications, it is observed that the PCS is 0.98.
8. Conclusion
In this paper, we have considered discrimination between two singular bivariate models,
namely the BMOW and BGE distributions. Both the distributions have singular part and
absolute continuous part. The dierence of the values for the corresponding maximized
likelihood function has been used as the discrimination statistic. We have obtained the
asymptotic distribution of the discrimination statistic, which can be used to compute
the asymptotic PCS. MC simulations are performed to see the behavior of the proposed
method. It is known that the discrimination between Weibull and generalized exponential
distributions is quite dicult (see Gupta and Kundu, 2003), but in this paper it is observed
that the discrimination between the BMOW and BGE distributions is relatively easier.
Even with small sample sizes the PCS quite high. Moreover, the asymptotic PCS matches
very well with the simulated PCS even for moderate sample sizes. We have performed
the analysis of a data set and computed the PCS using non-parametric bootstrap method.
Although we do not have any theoretical results, it seems non-parametric bootstrap method
also can be used quite eectively in computing the PCS in this case. More work is needed
in this direction.
Appendix
To prove Theorem 4.1, we need of Lemma 8.1. Here
a.s.
means converges almost surely.
Lemma 8.1 Under the assumption that data are from the BWE(,
0
,
1
,
2
) distribution,
as n , we have
(i)
a.s.
,

0
a.s.

0
,

1
a.s.

1
and

2
a.s.

2
where for = (,
0
,
1
,
2
),
E
BMOW
[log (f
BMOW
(X
1
, X
2
; ))] = max
E
BMOW
[log(f
BMOW
(X
1
, X
2
;

))];
(ii)
0
a.s.

0
,
1
a.s.

1
,
2
a.s.

2
,

a.s.

, where for = (
0
,
1
,
2
, ),
E
BMOW
[log(f
BGE
(X
1
, X
2
;

))] = max
E
BMOW
[log (f
BGE
(X
1
, X
2
; ))].
It may be noted that

may depend on , but we do not make it explicit for brevity;
(iii) If we denote
T
= L
2
(,
0
,
1
,
2
) L
1
(
0
,
1
,
2
,
),
then n
1
2
(T E
BMOW
[T]) is asymptotically equivalent to n
1
2
(T
E
BMOW
[T
]) .
Proof of Lemma 8.1 It is quite standard and it follows along the same line as the proof
of Lemma 2.2 of White (1982), and it is avoided.
Proof of Theorem 4.1 Using Central limit theorem and part (ii) of Lemma 8.1, it
follows that n
1
2
(T
E
BWE
[T
]) is asymptotically normally distributed with mean zero

and variance V
BMOW
[T
]. Therefore, using part (iii) of Lemma 8.1, the result immediately

follows.
To prove Theorem 4.2 and for dening the misspecied parameter

, we need of Lemma
8.2, whose proof is same as the proof of Lemma 8.1.
Lemma 8.2 Suppose the data follow the BGE(
0
,
1
,
2
, ) distribution, as n , we
have
(i)
0
a.s.

0
,
1
a.s.

1
,
2
a.s.

2
and

where
E
BGE
[log (f
BGE
(X
1
, X
2
; ))] = max
E
BGE
[log (f
BGE
(X
1
, X
2
;

))];
(ii)
a.s.
,

0
a.s.
0
,

1
a.s.
1
,

2
a.s.
2
, where

= ( ,
0
,
1
,
2
),
E
BGE
[log (f
BMOW
(X
1
, X
2
;
))] = max
E
BGE
[log (f
BMOW
(X
1
, X
2
; ))];
here also

depend on , but we do not make it explicit for brevity;
(iii) If we denote
T
= L
2
( ,
0
,
1
,
2
) L
1
(
0
,
1
,
2
, ),
then n
1
2
(T E
BGE
[T]) is asymptotically equivalent to n
1
2
(T
E
BGE
[T
]) .
Proof of Theorem 4.2 Along the same line as the Proof of Lemma 8.1, it also follows
using Lemma 8.2.
The following lemmas are useful in computing the dierent expected values needed in
1
() and in
2
(). Here 1
A0
, 1
A1
and 1
A2
are same as dened before.
Lemma A.1 Let W
0
GE(
0
+
1
+
2
, ), W
1
GE(
0
+
1
, ), W
2
GE(
0
+
2
, )
and (X
1
, X
2
) BGE(
0
,
1
,
2
, ). If g() is any Borel measurable function, then
E[g(X
1
) 1
A
1
] = E[g(W
1
)] +

0
+
1
0
+
1
+
2
E[g(W
0
)].
E[g(X
1
) 1
A
2
] =

1
0
+
1
+
2
E[g(W
0
)].
E[g(X
1
) 1
A0
] = E[g(X
2
) 1
A0
] =

0
0
+
1
+
2
E[g(W
0
)].
E[g(X
2
) 1
A
1
] =

2
0
+
1
+
2
E[g(W
0
)].
E[g(X
2
) 1
A2
] = E[g(W
2
)] +

0
+
2
0
+
1
+
2
E[g(W
0
)].
Proof of Lemma A.1 See Kundu and Gupta (2009).
Lemma A.2 Let Z
0
WE(,
0
+
1
+
2
), Z
1
WE(,
0
+
1
), Z
2
WE(,
0
+
2
)
and (X
1
, X
2
) BMOW(,
0
,
1
,
2
). If g() is any Borel measurable function, then
E[g(X
1
) 1
A1
] =

1
0
+
1
+
2
E[g(Z
1
)].
E[g(X
1
) 1
A2
] = E[g(Z
1
)]

0
+
1
0
+
1
+
2
E[g(Z
0
)].
E[g(X
1
) 1
A0
] = E[g(X
2
) 1
A0
] =

0
0
+
1
+
2
E[g(Z
0
)].
E[g(X
2
) 1
A1
] = E[g(Z
2
)]

0
+
2
0
+
1
+
2
.
E[g(X
2
) 1
A
2
] =

2
0
+
1
+
2
E[g(Z
2
)].
Proof of Lemma A.1 They can be obtained along the same line as in Lemma A.1.
References
Atkinson, A., 1969. A test for discriminating between models. Biometrika, 56, 337341.
Atkinson, A., 1970. A method for discriminating between models (with discussions). Jour-
nal of The Royal Statistical Society Series B - Statistical Methodology, 32, 323353.
Bain, L.J., Englehardt, M., 1980. Probability of correct selection of Weibull versus gamma
based on likelihood ratio test. Communications in Statistics - Theory and Methods, 9,
375381.
Bemis, B., Bain, L.J., Higgins, J.J., 1972. Estimation and hypothesis testing for the pa-
rameters of a bivariate exponential distribution. Journal of the American Statistical
Cox, D.R., 1961. Tests of separate families of hypotheses. In Proceedings of the Fourth
Berkeley Symposium on Mathematical Statistics and Probability. University of Califor-
nia Press, Berkeley, pp. 105123.
Cox, D.R., 1962. Further results on tests of separate families of hypotheses. Journal of The
Royal Statistical Society Series B - Statistical Methodology, 24, 406424.
Csorgo, S., Welsh, A.H., 1989. Testing for exponential and Marshall-Olkin distribution.
Journal of Statistical Planning and Inference, 23, 287300.
Dey, A.K., Kundu, D., 2009. Discriminating among the log-normal, Weibull and generalized
exponential distributions. IEEE Transactions on Reliability, 58, 416424.
Dey, A.K., Kundu, D., 2010. Discriminating between the log-normal and log-logistic dis-
tributions. Communications in Statistics - Theory and Methods, 39, 280292.
Gupta, R.D., Kundu, D., 1999. Generalized exponential distributions. Australian and New
Zealand Journal of Statistics, 41, 173188.
Gupta, R.D., Kundu, D., 2003. Discriminating between Weibull and generalized exponen-
tial distributions. Computational Statistics and Data Analysis, 43, 179196.
Gupta, R.D., Kundu, D., 2006. On the comparison of Fisher information of the Weibull
and GE distributions. Journal of Statistical Planning and Inference, 136, 31303144.
Gupta, R.D., Kundu, D., 2007. Generalized exponential distribution: existing methods and
recent developments. Journal of Statistical Planning and Inference, 137, 35373547.
Kotz, S., Balakrishnan, N., Johnson, N., 2000. Continuous Multivariate Distributions:
Models and Applications. Wiley and Sons, New York.
Kundu, D., Dey, A.K., 2009. Estimating the parameters of the Marshall-Olkin bivariate
Weibull distribution by EM algorithm. Computational Statistics and Data Analysis, 53,
956965.
Kundu, D., Gupta, R.D., 2009. Bivariate generalized exponential distribution. Journal of
Multivariate Analysis, 100, 581593.
Marshall, A.W., Meza, J.C., Olkin, I., 2001. Can data recognize its parent distribution?
Journal of Computational and Graphical Statistics, 10, 555580.
White, H., 1982. Nonlinear regression on cross-section data. Econometrica, 48, 721746.
Call for Papers
The editorial board of the Chilean Journal of Statistics (ChJS) is seeking papers, which will be refereed. We encourage
the authors to submit a PDF electronic version of the manuscript to Victor Leiva, Executive Editor of the ChJS, to
victor.leiva@uv.cl and chjs.editor@uv.cl.
Manuscript Preparation
Submision
Manuscripts to be submitted to the ChJS must be written in English and contain the name and aliation of each
author and a leading abstract followed by keywords and mathematics subject classication (primary and secondary).
AMS classication is available from the ChJS website. Sections must be numbered 1, 2, etc., where Section 1 is the
introduction part. References should be collected at the end of the paper in alphabetical order as in the following
examples:
Rukhin, A.L., 2009. Identities for negative moments of quadratic forms in normal variables. Statistics and Probability
Letters, 79, 1004-1007.
Stein, M.L., 1999. Statistical Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York.
Tsay, R. S., Pe na, D., Pankratz, A. E., 2000. Outliers in multivariate time series. Biometrics, 87, 789-804.
References in the text should be given by the authors name and year of publication, e.g., Gelfand and Smith (1990).
In the case of more than two authors, the citation must be written as Tsay et al. (2000).
Acceptance
Once the manuscript has been accepted for publication in the ChJS, the authors must prepare the nal version
following the above indications and using the Latex format. Latex template and chjs class les for manuscript
preparation are available from the ChJS website.
Copyright
Authors who publish their articles in the Chilean Journal of Statistics automatically transfer their copyright to the
Chilean Statistical Society. This enables full copyright protection and wide dissemination of the articles and the
journal in any format.
The Chilean Journal of Statistics grants permission to use gures, tables and brief extracts from its collection of
articles in scientic and educational works, in which case the source that provides these issues (Chilean Journal of
Statistics) must be clearly acknowledged.

Journal Soche Vol 3.1

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Journal Soche Vol 3.1

Uploaded by

Copyright:

Available Formats

Volume 3 - Number 1 - April 2012

Volume 3 - Number 1 - April 2012

Corresponding author. Departamento de Didactica de la Matematica. Facultad de Ciencias de la Educacion. Uni-

= (3, 0, 0.1, 0.7)

, which are the posterior means of the regression parameters

was assumed diagonal, V

and V , even when we xed a non-diagonal matrix for V , as-

> 0 such that

(h)(x) = h(x + ) h(x),

h)(x) and identically, for N N

arranged in the decreasing order

is always positive, and its unique positive square

. Then, for a xed value of L, the trace

in this case are the solution of

. However, their functional forms are very sophisticated in

) for each unit change in the window length.

and Slvia Maria de Freitas

b, and the marginal residuals:

. These predict respectively conditional error e = y E[y|b] = y X Zb,

b depend on b and e and thus

e has the least confounding fraction possible. The suggested

is the ML estimate when

and the normal curvature in the vicinity of

Corresponding author. Raydonal Ospina. Departamento de Estatstica, Universidade Federal de Pernambuco,

is a p-dimensional parameter vector. It is assumed to belong to a wide

be the vector of independent observations on the response

is a vector of p parameters associated with the explanatory variables

is a vector of parameters of length

= IE[UP] = , UP Gamma(, ) and = log(). We try a number of

) be the maximum likelihood

1 1.5098 1.5098 1.6228 2.81

]) is asymptotically normally distributed with mean zero

]. Therefore, using part (iii) of Lemma 8.1, the result immediately

You might also like